Portfolio
Worked examples of the kind of methodological thinking I bring to client engagements.
The pieces below are end-to-end analyses I’ve published openly. Each project notebook holds the executable code, data, and figures; the methodological decisions each analysis turns on are taught in the Handbook, linked from each project below.
The portfolio is organized by methodological surface rather than topic. The skills listed on each card are the kinds of decisions a client would hire me to help with.
Population epidemiology and prediction modeling
Cardiometabolic risk in NHANES 2015–2018. A polyglot R + Python analysis of metabolic-syndrome prevalence and risk in US adults, combining multiple imputation, design-aware logistic regression, gradient-boosted classification with SHAP, and a head-to-head comparison against the Pooled Cohort Equations.
Skills demonstrated. Survey-weighted analysis with complex sample design; multiple imputation by chained equations (MICE) with Rubin’s-rules pooling; calibration over discrimination for clinical portability; head-to-head model comparison with structural-overlap diagnostics; equity-of-measurement framing for case-definition choice; documentation of distributional diagnostics as infrastructure for downstream modelers.
Project notebook → · Methods in the Handbook: populations & sample size, sensitivity & robustness
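As a small illustration of the Rubin’s-rules pooling step named above, here is a minimal sketch in plain Python. It is not the notebook's code: the function name `pool_rubin` and the toy inputs are hypothetical, and it assumes each imputed dataset has already produced one point estimate and its squared standard error.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)          # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the analysis.
    est, var = pool_rubin([0.42, 0.47, 0.39], [0.010, 0.012, 0.011])
    print(f"pooled estimate = {est:.3f}, pooled SE = {var ** 0.5:.3f}")
```

The `(1 + 1/m)` factor inflates the between-imputation variance to account for using a finite number of imputations; a full implementation would also compute the adjusted degrees of freedom for interval estimates.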
Real-world data and outlier detection at scale
Outlier detection in Medicaid provider spending. A laptop-scale analysis of the HHS Medicaid Provider Spending file (11 GB CSV) using DuckDB streaming aggregation, MAD-based robust z-scores with BH-FDR multiplicity control, an isolation-forest second opinion, and a county-level interactive cost atlas across all 12 HCPCS spending categories.
Skills demonstrated. Robust statistics on heavy-tailed data; Benjamini-Hochberg FDR control with explicit calibration caveats (mis-specified null, pre-filtered families); isolation-forest unsupervised anomaly detection with collinearity diagnostics; peer-group construction as the load-bearing methodological choice; ontology-aware claims-data analysis (HCPCS vs ICD); spatial concentration measurement (Gini, top-N share); data-engineering pipeline completeness audits.
Project notebook → · Methods in the Handbook: sensitivity & robustness, Monte Carlo simulation
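The two screening steps named above, MAD-based robust z-scores and Benjamini-Hochberg control, can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the notebook's pipeline: the function names and toy inputs are hypothetical, and the conversion to p-values assumes a standard-normal null, which is exactly the kind of assumption the calibration caveats above are about.

```python
import math
from statistics import median


def robust_z_scores(values):
    """Robust z-scores: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad  # makes MAD consistent with the SD under normality
    return [(v - med) / scale for v in values]


def two_sided_p(z):
    """Two-sided p-value under an assumed standard-normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where BH rejects at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value clears its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Toy spending values for one hypothetical peer group.
    spend = [10.0, 12.0, 11.0, 9.0, 13.0, 95.0]
    z = robust_z_scores(spend)
    flags = benjamini_hochberg([two_sided_p(v) for v in z], q=0.05)
    print(list(zip(spend, flags)))
```

In practice the scores would be computed within each peer group, since the choice of peer group determines what counts as an outlier.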
Clinical-trial data standards
CDISC SDTM/ADaM pilot. A pilot implementation of CDISC clinical-trial data standards, covering SDTM (Study Data Tabulation Model) domain mappings and ADaM (Analysis Data Model) analysis-ready datasets.
Skills demonstrated. Clinical-trial data architecture; CDISC SDTM and ADaM conventions; regulatory data preparation for FDA submissions; SAP-style methods writing oriented to clinical-trial reporting.
Project notebook → · Methods in the Handbook: from research question to study design
Causal inference and policy evaluation
The $35 insulin cap in Medicare Part D. A difference-in-differences evaluation of the Inflation Reduction Act’s Section 11406 out-of-pocket cap on insulin, using six annual releases of the CMS Medicare Part D Prescribers public-use file. The analysis combines two-way fixed-effects DiD with non-insulin antihyperglycemics as the control group, an event study to test parallel pre-trends, placebo and leave-one-out robustness checks, a prescriber-FE specification on the larger panel, and heterogeneity analysis stratified by prescriber specialty.
Skills demonstrated. Difference-in-differences design with simultaneous treatment; two-way fixed-effects estimation and the conditions under which it is unbiased (versus the staggered-adoption literature); parallel-trends defense via event-study and F-test on leads; placebo testing as a design-defense move; control-group composition sensitivity (dropping GLP-1 RAs to absorb concurrent demand shocks); fixed-effects specification choices and how they change the identifying question; heterogeneity analysis by clinical subgroup; cluster-robust inference at the drug level.
Project notebook → · Methods in the Handbook: causal inference toolkit, sensitivity & robustness
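For readers new to the design, the core difference-in-differences contrast can be shown in its simplest two-group, two-period form: the treated group's mean change minus the control group's mean change. The sketch below is a teaching illustration with hypothetical names and toy numbers, not the project's fixed-effects specification or its data.

```python
from statistics import mean


def did_2x2(rows):
    """Two-group, two-period difference-in-differences on cell means.

    rows: iterable of (treated, post, outcome) with boolean treated/post flags.
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((bool(treated), bool(post)), []).append(outcome)
    needed = [(True, True), (True, False), (False, True), (False, False)]
    if any(key not in cells for key in needed):
        raise ValueError("every treated/control x pre/post cell needs data")
    m = {key: mean(vals) for key, vals in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


if __name__ == "__main__":
    # Toy outcomes for illustration only.
    toy = [
        (True, False, 10.0), (True, False, 12.0),    # treated, pre
        (True, True, 7.0), (True, True, 9.0),        # treated, post
        (False, False, 5.0), (False, False, 7.0),    # control, pre
        (False, True, 6.0), (False, True, 6.0),      # control, post
    ]
    print(did_2x2(toy))
```

The estimate is only interpretable as a treatment effect if the control group's change is a credible stand-in for what the treated group would have experienced without the policy, which is what the event-study and placebo checks listed above are meant to probe.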
Methodology frameworks
A separate strand of work develops methodological frameworks at the boundary of evidence synthesis and clinical AI evaluation. These are public drafts intended to be redlined.
Risk-of-bias appraisal for AI training corpora (v0.1). Adapts Cochrane RoB 2 / ROBINS-I logic to the text an LLM was actually trained on, across six bias domains with inline signaling questions, plus a stylized end-to-end worked example.
Additional frameworks (GRADE for AI-synthesized claims, a PRISMA-style reporting checklist for clinical AI as evidence synthesizer) are in progress and will be published as v0.1 drafts when ready.