Projects & case studies
Reproducible analyses of publicly available health datasets — NHANES, CMS Medicaid and Medicare, CDC WONDER, SEER, openFDA — written in a mix of R, Python, and SAS, rendered with Quarto so the full code, outputs, and figures sit on a single page. Every project links to its GitHub source.
-
01
Read
Cardiometabolic Risk in NHANES 2017–2018 Live
A polyglot walk through complex survey design, weighted prevalence, design-aware logistic regression, and gradient-boosted risk prediction with SHAP interpretability — on the same cohort, in the same document.
-
02
Read
Outlier Detection in Medicaid Provider Spending Live
Stream the 11 GB HHS Medicaid Provider Spending CSV with DuckDB, join NPPES + NUCC + Census ZCTA for specialty and geography, and compare a MAD-based peer-group z-score to an isolation forest on the top-spending HCPCS codes, plus an interactive leaflet map of the flagged providers.
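A minimal sketch of the MAD-based peer-group z-score idea, not the project's actual code: each provider's spending is centered on its peer group's median and scaled by the group's median absolute deviation. The function name, the tuple layout, the 3.5 cutoff, and the spending figures are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

MAD_SCALE = 1.4826    # scales MAD to match a standard deviation for normal data
FLAG_THRESHOLD = 3.5  # assumed cutoff; a common convention, not taken from the project


def robust_z_by_peer_group(rows):
    """Return {provider_id: robust z-score} computed within each peer group.

    rows: iterable of (provider_id, peer_group, spending) tuples.
    """
    groups = defaultdict(list)
    for provider_id, peer_group, spending in rows:
        groups[peer_group].append((provider_id, spending))

    scores = {}
    for members in groups.values():
        values = [spending for _, spending in members]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        for provider_id, spending in members:
            # MAD of zero means at least half the group equals the median;
            # the score is left undefined rather than dividing by zero.
            scores[provider_id] = (
                None if mad == 0 else (spending - med) / (MAD_SCALE * mad)
            )
    return scores


# Made-up spending figures for one hypothetical peer group.
rows = [
    ("A", "cardiology", 100.0),
    ("B", "cardiology", 110.0),
    ("C", "cardiology", 90.0),
    ("D", "cardiology", 105.0),
    ("E", "cardiology", 1000.0),
]
scores = robust_z_by_peer_group(rows)
flagged = [p for p, z in scores.items() if z is not None and abs(z) > FLAG_THRESHOLD]
print(flagged)
```

Because the median and MAD barely move when one provider bills far more than its peers, the extreme value cannot mask itself the way it would with a mean and standard deviation.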
-
03
Read
CDISC Pilot 01 — SDTM to ADaM in SAS and R In progress
Double-program an FDA-grade analysis package on the publicly redistributable CDISC Pilot 01 Alzheimer's trial: derive ADSL, ADAE, and ADLBC from SDTM twice (once in SAS, once in R/{admiral}), reconcile against CDISC's reference XPTs byte-for-byte, and render an ICH E3 Table 14-2.01.
-
04
Read
The $35 Insulin Cap and Medicare Part D In progress
A difference-in-differences evaluation of the Inflation Reduction Act's $35/month Part D insulin out-of-pocket cap (effective 2023-01-01), using the CMS Medicare Part D Prescribers public use file. Two-way fixed effects with non-insulin antihyperglycemics as the control group, an event-study, parallel-trends F-test, placebo cap years, and prescriber-FE robustness on a DuckDB-streamed panel.
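The core difference-in-differences contrast can be sketched in its simplest two-group, two-period form. This is an illustration only, not the two-way fixed effects model described above: the function name, the record layout, and the outcome values are hypothetical.

```python
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-differences on cell means.

    records: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in records:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Made-up outcomes: treated = insulin, control = non-insulin antihyperglycemics.
records = [
    (True, False, 10.0), (True, False, 10.0),   # treated, before the cap
    (True, True, 14.0), (True, True, 14.0),     # treated, after the cap
    (False, False, 8.0), (False, False, 8.0),   # control, before the cap
    (False, True, 9.0), (False, True, 9.0),     # control, after the cap
]
effect = did_estimate(records)
print(effect)
```

The control group's change stands in for what would have happened to the treated group without the policy, which is why the parallel-trends checks matter for the estimate to be credible.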
-
05
Coming soon
openFDA Pharmacovigilance Signal Detection Planned
Disproportionality analysis (PRR, ROR, IC) on FAERS adverse-event reports, aggregated by RxNorm drug class — with an interactive signal explorer.
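As a rough sketch of what these measures compute, all three can be read off a 2×2 table of report counts for one drug-event pair. The function name, the argument names, and the counts below are assumptions for illustration. The IC shown is the plain log2 observed-to-expected ratio, without the shrinkage that production implementations typically add.

```python
from math import log2


def disproportionality(a, b, c, d):
    """Point estimates from a 2x2 table of spontaneous reports.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))   # proportional reporting ratio
    ror = (a * d) / (b * c)               # reporting odds ratio
    expected = (a + b) * (a + c) / n      # expected count under independence
    ic = log2(a / expected)               # information component, no shrinkage
    return {"PRR": prr, "ROR": ror, "IC": ic}


# Made-up counts for a hypothetical drug-event pair.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result)
```

With small cell counts these point estimates are unstable, so a real signal screen would pair them with confidence or credibility intervals and a minimum report count.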
-
06
Coming soon
TCGA Somatic Mutations & Survival Planned
Integrate TCGA somatic mutation data with clinical outcomes for a solid-tumor cohort; fit Cox proportional hazards models and run a differential expression analysis; stratify Kaplan–Meier curves by mutation status.