CDISC Pilot 01 — SDTM to ADaM in SAS and R

Double-programming an FDA-grade analysis package on a publicly redistributable clinical-trial dataset

Author

Paulina Del Mundo Del Fierro, MD, MPH

Published

May 12, 2026

TipAbstract

Every clinical trial submitted to the FDA arrives as a precisely-structured package of datasets, and pharma companies typically write that package twice — once in SAS (the industry-standard tool) and once independently in another language — then compare the two outputs to catch programmer errors before the trial database is locked. This notebook demonstrates that double-programming workflow on a publicly-shared Alzheimer’s-trial dataset that CDISC (the industry standards body) releases specifically so the workflow can be practiced and audited in public. The SAS half is written like a real submission package; the R half uses an open-source package the pharma industry maintains together ({admiral}). Both halves read the same raw clinical data and produce the three analysis datasets that drive most submission tables: one subject-level summary, one adverse-event table, and one lab-chemistry table.

The notebook ends with the comparison step: the SAS-derived and R-derived outputs are diffed against each other and against CDISC’s reference answer key. If they don’t match byte-for-byte, the build fails. As shipped, the SAS package and the R derivation specifications are complete, but the R chunks don’t currently execute on render — they need the raw data downloaded and a few R packages added to the lockfile before the reconciliation step actually runs. Read this as a complete SAS submission package plus an R double-programming design specification, not an end-to-end executed reconciliation — that last 10% is the work that converts this from a portfolio sample into a defensible regulated-industry deliverable.

WarningStatus — design + SAS complete; R implementation pending execution

What’s complete. The double-programming design, the full SAS submission package (sas/adsl.sas, sas/adae.sas, sas/adlbc.sas, sas/t_14_2_01_demog.sas, the demographic table program), the YAML extract of the define-XML value-level metadata, and the R/{admiral} derivation scaffolding are all written and committed.

What’s pending. The R chunks below carry eval: false and do not execute on render. To make this notebook self-running you need to: (1) make data to pull the CDISC Pilot 01 SDTM XPTs; (2) add {admiral}, {xportr}, {metacore}, {metatools} to renv.lock; (3) remove the chunk-level eval: false. The reconciliation step that compares the SAS-derived, R-derived, and CDISC-reference ADSLs byte-for-byte — the point of the analysis — is the chunk that would fire pass/fail on render. As shipped, it shows what would happen, not what does happen.

Read this notebook as. A complete SAS submission package + an R double-programming design specification, not an executed cross-language reconciliation. The latter is the next step before this becomes a defensible portfolio sample for a regulated-industry role.

1 Why this analysis

Every interventional clinical trial submitted to the FDA, EMA, or PMDA arrives as a CDISC package: collected data conforms to SDTM (Study Data Tabulation Model, IG v3.4), and analysis-ready data conforms to ADaM (Analysis Data Model, IG v1.3). Submission programmers typically write the same derivations twice — primary in SAS, secondary in either SAS or R — and the two outputs must reconcile before database lock.

This notebook is that double-programming exercise on a public dataset:

  1. Read the CDISC Pilot 01 SDTM domains (DM, AE, EX, LB, SV, VS) directly from SAS V5 transport (XPT) files.
  2. Derive ADSL, ADAE, and ADLBC using ADaM IG v1.3 conventions — twice, once in SAS, once in R/{admiral}.
  3. Reconcile the two derivations against each other and against CDISC’s published reference ADaM.
  4. Report an ICH E3 §14.2.1 demographics table and the reconciliation result.

CDISC Pilot 01 is a fictional 254-subject Alzheimer’s trial (placebo / donepezil 5 mg / donepezil 10 mg) released by CDISC under a permissive license expressly so the standards can be taught and audited in public — see https://github.com/cdisc-org/sdtm-adam-pilot-project.

Note

The .qmd source for this page, the SAS programs, and the YAML define-XML extract live on GitHub.


2 Setup

Show code
suppressPackageStartupMessages({
  library(admiral)      # ADaM derivations
  library(xportr)       # writes v5 transport, validates against CDISC CT
  library(haven)        # reads/writes XPT
  library(metacore)     # reads define-xml -> R object
  library(metatools)    # applies metacore to a dataset
  library(dplyr)
  library(tidyr)
  library(gtsummary)
  library(here)
})

DATA_SDTM <- here::here("projects", "03-cdisc-adam-pilot", "data", "sdtm")
DATA_REF  <- here::here("projects", "03-cdisc-adam-pilot", "data", "adam_reference")
DATA_OUT  <- here::here("projects", "03-cdisc-adam-pilot", "data", "adam_r")
SAS_OUT   <- here::here("projects", "03-cdisc-adam-pilot", "sas", "output")
dir.create(DATA_OUT, showWarnings = FALSE, recursive = TRUE)

3 Read SDTM

Show code
dm <- haven::read_xpt(file.path(DATA_SDTM, "dm.xpt"))
ae <- haven::read_xpt(file.path(DATA_SDTM, "ae.xpt"))
ex <- haven::read_xpt(file.path(DATA_SDTM, "ex.xpt"))
lb <- haven::read_xpt(file.path(DATA_SDTM, "lb.xpt"))
sv <- haven::read_xpt(file.path(DATA_SDTM, "sv.xpt"))
vs <- haven::read_xpt(file.path(DATA_SDTM, "vs.xpt"))

list(DM = dim(dm), AE = dim(ae), EX = dim(ex), LB = dim(lb), SV = dim(sv), VS = dim(vs))

4 Derive ADSL (R / {admiral})

ADSL is the subject-level analysis dataset — exactly one record per USUBJID — and the parent of every other ADaM. The derivations below follow ADaM IG v1.3 §3.1 and the CDISC Pilot 01 ADSL spec.

A clinical trial doesn’t have one denominator — it has four, each suited to a different question, and every analysis must say which population it’s run on:

Population Flag Who’s in Answers
Intent-to-Treat (ITT) ITTFL Every randomised subject, analysed as randomised “What is the effect of being assigned the treatment?” — the policy-relevant estimand
Safety SAFFL Every subject who received ≥ 1 dose, analysed as treated “What harms occurred in people actually exposed?” — the AE denominator
Per-Protocol PPROTFL Subjects with no major protocol deviations “What is the effect of receiving the treatment as intended?”
Modified ITT MITTFL Pre-specified narrower ITT — e.g., ≥ 1 dose AND a baseline measurement “ITT with a minimum-quality data filter applied”

ITT is the regulatory default for efficacy because it preserves randomisation. Excluding non-compliers introduces a non-random comparison: adherent subjects are systematically different from non-adherent ones (the “healthy-adherer” effect), so dropping them biases the treatment effect toward whatever adherence is correlated with. ITT keeps the comparison clean at the cost of diluting the treatment effect when adherence is poor — a feature for regulators (it’s conservative), a frustration for sponsors. Safety is the default for harms for the symmetric reason: you can’t have a drug-related AE from a drug you never took, so the AE denominator has to be the actually-dosed cohort. In a 254-subject trial, 250 might receive at least one dose, 238 of those complete the protocol cleanly, and 3 randomised subjects might withdraw before dosing — so efficacy reports \(n = 254\) (ITT), the AE table reports \(n = 250\) (Safety), and the dose-response analysis reports \(n = 238\) (PP). All three appear in the Clinical Study Report, each in the population that answers the question being asked.

Show code
adsl <- dm |>
  # Treatment epoch from EX
  derive_vars_merged(
    dataset_add = ex,
    by_vars     = exprs(STUDYID, USUBJID),
    new_vars    = exprs(TRTSDT = convert_dtc_to_dt(EXSTDTC)),
    filter_add  = EXDOSE > 0,
    order       = exprs(EXSTDTC),
    mode        = "first"
  ) |>
  derive_vars_merged(
    dataset_add = ex,
    by_vars     = exprs(STUDYID, USUBJID),
    new_vars    = exprs(TRTEDT = convert_dtc_to_dt(EXENDTC)),
    filter_add  = EXDOSE > 0,
    order       = exprs(EXENDTC),
    mode        = "last"
  ) |>
  # Treatment duration
  derive_vars_duration(
    new_var     = TRTDURD,
    start_date  = TRTSDT,
    end_date    = TRTEDT
  ) |>
  # Population flags
  mutate(
    SAFFL = if_else(!is.na(TRTSDT), "Y", "N"),
    ITTFL = if_else(!is.na(ARMCD) & ARMCD != "Scrnfail", "Y", "N")
  ) |>
  # Age groupings per the Pilot SAP
  mutate(
    AGEGR1 = case_when(
      AGE <  65 ~ "<65",
      AGE <= 80 ~ "65-80",
      TRUE      ~ ">80"
    ),
    AGEGR1N = case_when(AGEGR1 == "<65" ~ 1, AGEGR1 == "65-80" ~ 2, TRUE ~ 3)
  )
Note

The full derivation (~25 variables) lives in the source .qmd; the chunk above shows the patterns. {admiral} ships unit tests against the CDISC reference ADSL — every helper used here is covered.

4.1 Same derivation in SAS

The primary submission program — same logic, expressed in SAS.

sas/adsl.sas
/******************************************************************************
 * Program  : adsl.sas
 * Purpose  : Derive ADSL (Subject-Level Analysis Dataset) per ADaM IG v1.3 §3.1
 *            and the CDISC Pilot 01 ADSL specification.
 * Inputs   : SDTM.DM, SDTM.EX, SDTM.SV, SDTM.SUPPDM
 * Output   : ADAM.ADSL
 * Notes    : One record per USUBJID. TRT01P / TRT01A from DM.ARM / DM.ACTARM.
 *            Treatment dates come from EX (first non-zero dose / last non-zero
 *            dose). Population flags follow the Pilot SAP §6.
 *****************************************************************************/

%include "setup.sas";

/*----- Treatment epoch from EX -------------------------------------------*/
proc sql;
  create table _trtdt as
    select STUDYID, USUBJID,
           min(input(EXSTDTC, e8601da.)) as TRTSDT format=date9.,
           max(input(EXENDTC, e8601da.)) as TRTEDT format=date9.
    from sdtm.ex
    where EXDOSE > 0
    group by STUDYID, USUBJID;
quit;

/*----- Build ADSL --------------------------------------------------------*/
data adam.adsl (label="Subject-Level Analysis Dataset");
  merge sdtm.dm (in=in_dm)
        _trtdt;
  by STUDYID USUBJID;
  if in_dm;

  /* Treatment variables — IG v1.3 §3.1.1 */
  length TRT01P TRT01A $40;
  TRT01P = ARM;
  TRT01A = ACTARM;
  TRT01PN = case
              when ARM = "Placebo" then 0
              when ARM = "Xanomeline Low Dose"  then 54
              when ARM = "Xanomeline High Dose" then 81
              else .
            end;
  TRT01AN = TRT01PN;

  /* Treatment duration in days (inclusive) */
  if not missing(TRTSDT) and not missing(TRTEDT) then
    TRTDURD = TRTEDT - TRTSDT + 1;

  /* Population flags — IG v1.3 §3.1.4 */
  length SAFFL ITTFL $1;
  SAFFL = ifc(not missing(TRTSDT), "Y", "N");
  ITTFL = ifc(not missing(ARMCD) and ARMCD ne "Scrnfail", "Y", "N");

  /* Age groupings per Pilot SAP */
  length AGEGR1 $5;
  if      AGE <  65 then do; AGEGR1 = "<65";   AGEGR1N = 1; end;
  else if AGE <= 80 then do; AGEGR1 = "65-80"; AGEGR1N = 2; end;
  else if not missing(AGE) then do; AGEGR1 = ">80";  AGEGR1N = 3; end;

  format TRTSDT TRTEDT date9.;
  keep STUDYID USUBJID SUBJID SITEID
       AGE AGEGR1 AGEGR1N SEX RACE ETHNIC COUNTRY
       ARM ARMCD ACTARM ACTARMCD
       TRT01P TRT01PN TRT01A TRT01AN
       TRTSDT TRTEDT TRTDURD
       SAFFL ITTFL;
run;

proc sort data=adam.adsl; by STUDYID USUBJID; run;

/*----- Quick sanity log --------------------------------------------------*/
proc freq data=adam.adsl;
  tables TRT01A * SAFFL / nocum nopercent;
run;

5 Derive ADAE and ADLBC

ADAE is occurrence-level (one record per AE per subject). The submission-critical derivation is TRTEMFL — treatment-emergent flag — gated on AE start date relative to TRTSDT.

ADLBC is the canonical Basic Data Structure: one record per parameter per visit per subject, with PARAMCD, AVAL, AVISIT, ABLFL (baseline flag), and CHG (change from baseline). The lab unit conversions key off the CDISC LB controlled terminology.

A patient reports a headache on day 14 of dosing. Did the drug cause it? For any single event we can never really know — that’s why randomised trials compare groups. But for the safety table, we still need a deterministic rule for counting, and “treatment-emergent” is that rule: \[\text{AE start date} \geq \text{TRTSDT}, \quad \text{and} \quad \text{AE start date} \leq \text{TRTEDT} + \Delta,\] where \(\Delta\) is a protocol-specified post-treatment follow-up window. Events before TRTSDT are pre-existing; events after TRTEDT + Δ are post-treatment and analysed separately. The choice of \(\Delta\) follows a pharmacokinetic principle: it should cover roughly 5 drug half-lives, by which point plasma concentration is under 5% of peak. For donepezil (\(t_{1/2} \approx 70\) h), 5 half-lives is ~14 days, so a 14- or 28-day window is justified; for drugs with long half-lives (monoclonal antibodies, \(t_{1/2}\) = weeks), \(\Delta\) stretches to 60–90 days. Without the upper bound, every AE the subject ever reports for the rest of their life would count as treatment-emergent — clearly wrong. Concrete example: Subject 101 with TRTSDT = 2008-04-01, TRTEDT = 2008-09-30, and \(\Delta = 28\) days reports a headache on 2008-03-15 (pre-treatment, TRTEMFL = "N"), insomnia on 2008-05-20 (during treatment, TRTEMFL = "Y"), and nausea on 2008-11-15 (46 days post-treatment, TRTEMFL = "N", POSTFL = "Y"). Only the insomnia row contributes to the standard TEAE table; the other two appear in the listings but not the headline counts.

A clinical lab value like serum creatinine has a baseline distribution spanning roughly 0.5–1.5 mg/dL across subjects, and most of that between-subject variance is constitutional (kidney size, muscle mass, age) — nothing to do with the drug. Comparing raw Week-12 values between arms drowns the treatment signal in constitutional noise. The fix is to compare each patient’s change from their own baseline: \(\text{CHG} = \text{AVAL} - \text{BASE}\) (and \(\text{PCHG} = 100 \cdot \text{CHG}/\text{BASE}\) for percent change). Under bivariate normality with within-subject correlation \(\rho\), the variance reduction is exactly: \[\text{Var}(\text{CHG}) = \text{Var}(\text{AVAL}) + \text{Var}(\text{BASE}) - 2\rho \cdot \text{SD}(\text{AVAL}) \cdot \text{SD}(\text{BASE}).\] For lab markers with typical \(\rho \approx 0.7\), change-from-baseline has 30–40% of the raw-value variance, so the same trial detects ~0.6× the effect size at the same power. Concrete example: two arms with \(n = 100\) each, Week-12 creatinine placebo mean 0.95 (SD 0.20) vs active 0.85 (SD 0.20). The raw two-sample t has SE 0.028 and standardised effect 3.5; with \(\rho = 0.7\), Var(CHG) = 0.024, SE = 0.022, standardised effect 4.5. Same data, sharper conclusion. The Analysis Baseline Flag (ABLFL = "Y") marks exactly one pre-treatment record per (USUBJID, PARAMCD) — usually the latest pre-dose value. ANCOVA is the gold-standard refinement: instead of using CHG as the outcome directly, model \(\text{AVAL} = \beta_0 + \beta_1 \cdot \text{TRT} + \beta_2 \cdot \text{BASE}\), which is at least as efficient as change-from-baseline and unbiased even under small baseline imbalances. CHG/PCHG remain in ADLBC as the standard display metric, even when the formal analysis uses ANCOVA.

Show code
adae <- ae |>
  derive_vars_merged(
    dataset_add = adsl,
    by_vars     = exprs(STUDYID, USUBJID),
    new_vars    = exprs(TRTSDT, TRTEDT, TRT01A = TRT01A)
  ) |>
  derive_var_trtemfl(
    start_date = AESTDT,
    end_date   = AEENDT,
    trt_start_date = TRTSDT,
    trt_end_date   = TRTEDT
  )

adlbc <- lb |>
  filter(LBCAT == "CHEMISTRY") |>
  derive_vars_merged(adsl, by_vars = exprs(STUDYID, USUBJID),
                     new_vars = exprs(TRTSDT, TRT01A)) |>
  mutate(PARAMCD = LBTESTCD, AVAL = LBSTRESN, AVISIT = VISIT) |>
  derive_var_base(by_vars = exprs(STUDYID, USUBJID, PARAMCD)) |>
  derive_var_chg() |>
  derive_var_pchg()

5.1 Same derivations in SAS

sas/adae.sas
/******************************************************************************
 * Program  : adae.sas
 * Purpose  : Derive ADAE (Adverse Events analysis dataset) per
 *            ADaMIG-OCCDS v1.1.
 * Inputs   : SDTM.AE, ADAM.ADSL
 * Output   : ADAM.ADAE
 * Notes    : Occurrence-level structure (one record per AE per subject).
 *            TRTEMFL is the treatment-emergent flag — gated on AESTDT
 *            relative to TRTSDT/TRTEDT.
 *****************************************************************************/

%include "setup.sas";

proc sql;
  create table _ae as
    select a.*,
           input(a.AESTDTC, ?? e8601da.) as AESTDT format=date9.,
           input(a.AEENDTC, ?? e8601da.) as AEENDT format=date9.,
           s.TRTSDT, s.TRTEDT, s.TRT01A, s.SAFFL
    from sdtm.ae a
         left join adam.adsl s
           on a.STUDYID = s.STUDYID and a.USUBJID = s.USUBJID;
quit;

data adam.adae (label="Adverse Events Analysis Dataset");
  set _ae;

  /* Treatment-emergent: AE start on/after TRTSDT and on/before TRTEDT+30d */
  length TRTEMFL $1;
  if not missing(AESTDT) and not missing(TRTSDT) then do;
    if AESTDT >= TRTSDT and (missing(TRTEDT) or AESTDT <= TRTEDT + 30)
      then TRTEMFL = "Y";
    else TRTEMFL = "N";
  end;

  /* Analysis study day */
  if not missing(AESTDT) and not missing(TRTSDT) then
    ASTDY = AESTDT - TRTSDT + (AESTDT >= TRTSDT);
  if not missing(AEENDT) and not missing(TRTSDT) then
    AENDY = AEENDT - TRTSDT + (AEENDT >= TRTSDT);

  keep STUDYID USUBJID AESEQ
       AETERM AEDECOD AEBODSYS
       AESEV AESER AEREL AEOUT
       AESTDT AEENDT ASTDY AENDY
       TRT01A TRTEMFL SAFFL;
run;

proc sort data=adam.adae; by STUDYID USUBJID AESEQ; run;
sas/adlbc.sas
/******************************************************************************
 * Program  : adlbc.sas
 * Purpose  : Derive ADLBC (Laboratory Chemistry, Basic Data Structure)
 *            per ADaM IG v1.3 §4 (BDS).
 * Inputs   : SDTM.LB, ADAM.ADSL
 * Output   : ADAM.ADLBC
 * Notes    : One record per USUBJID per PARAMCD per AVISIT.
 *            ABLFL = "Y" on the last non-missing record on/before TRTSDT.
 *            CHG and PCHG are derived from the baseline AVAL.
 *****************************************************************************/

%include "setup.sas";

proc sql;
  create table _lb as
    select l.STUDYID, l.USUBJID, l.LBSEQ,
           l.LBTESTCD as PARAMCD,
           l.LBTEST   as PARAM,
           l.LBSTRESN as AVAL,
           l.LBSTRESC as AVALC,
           l.LBSTRESU as AVALU,
           l.VISIT    as AVISIT,
           l.VISITNUM as AVISITN,
           input(l.LBDTC, ?? e8601da.) as ADT format=date9.,
           s.TRTSDT, s.TRTEDT, s.TRT01A, s.SAFFL
    from sdtm.lb l
         left join adam.adsl s
           on l.STUDYID = s.STUDYID and l.USUBJID = s.USUBJID
    where l.LBCAT = "CHEMISTRY";
quit;

proc sort data=_lb; by STUDYID USUBJID PARAMCD ADT; run;

/*----- Baseline flag: last record on/before TRTSDT ----------------------*/
data _bl;
  set _lb;
  by STUDYID USUBJID PARAMCD;
  retain _last_pre _last_aval;
  if first.PARAMCD then do; _last_pre = .; _last_aval = .; end;
  if not missing(ADT) and not missing(TRTSDT) and ADT <= TRTSDT
     and not missing(AVAL) then do;
    _last_pre  = ADT;
    _last_aval = AVAL;
  end;
  if last.PARAMCD then output;
  keep STUDYID USUBJID PARAMCD _last_pre _last_aval;
run;

data adam.adlbc (label="Laboratory Test Results - Chemistry");
  merge _lb (in=in_lb)
        _bl (rename=(_last_pre=BLDT _last_aval=BASE));
  by STUDYID USUBJID PARAMCD;

  length ABLFL $1;
  if not missing(ADT) and not missing(BLDT) and ADT = BLDT and AVAL = BASE
    then ABLFL = "Y";

  if not missing(AVAL) and not missing(BASE) then do;
    CHG  = AVAL - BASE;
    if BASE ne 0 then PCHG = (CHG / BASE) * 100;
  end;

  keep STUDYID USUBJID PARAMCD PARAM AVAL AVALC AVALU
       AVISIT AVISITN ADT ABLFL BASE CHG PCHG
       TRTSDT TRT01A SAFFL;
run;

proc sort data=adam.adlbc; by STUDYID USUBJID PARAMCD AVISITN; run;

6 Write conformant XPTs

Show code
adsl |>
  xportr_type()    |>
  xportr_length(metacore = adsl_meta) |>
  xportr_label(metacore = adsl_meta)  |>
  xportr_format(metacore = adsl_meta) |>
  xportr_write(file.path(DATA_OUT, "adsl.xpt"),
               label = "Subject-Level Analysis Dataset")

7 Demographics — ICH E3 §14.2.1

ICH E3 — Structure and Content of Clinical Study Reports (CPMP/ICH/137/95) is the harmonised template FDA, EMA, and PMDA all expect for the Clinical Study Report. Section 14.2.1 is “Demographic and Other Baseline Characteristics” — the first inferential-adjacent table reviewers read, used to verify that randomisation produced comparable arms. The layout always has treatment arms as columns and patient characteristics as rows because clinical reviewers scan across a row to spot imbalance on each characteristic; stacking arms as rows would force a mental pivot per characteristic. The standardised format is optimised for that specific cognitive task. Statistics chosen by variable type: continuous and roughly symmetric → mean (SD) (preserves both location and spread); continuous and skewed → median (Q1–Q3) (mean/SD on heavily-skewed data exaggerate both); categorical → n (%) (percentage alone hides rare-cell instability, raw count alone hides relative magnitude). No formal hypothesis tests are run — randomisation already guarantees baseline balance in expectation, so testing “are they similar?” is testing whether randomisation worked, which is circular; and testing across ~15 characteristics guarantees spurious p-values that invite ad-hoc covariate adjustments. The proper way to handle baseline imbalance is to pre-specify covariate adjustment in the SAP, not test post-hoc. The denominator is the safety population because subjects who were never dosed have nothing to contribute to “are the treated arms comparable?” A typical layout for the donepezil trial would show, say, Age (mean SD) 75.2 (8.1) / 74.8 (7.9) / 75.5 (8.4) across arms — well within trivial noise; reviewer scans, confirms randomisation worked, and moves to the efficacy tables.

Show code
adsl |>
  filter(SAFFL == "Y") |>
  select(TRT01A, AGE, AGEGR1, SEX, RACE, ETHNIC, HEIGHTBL, WEIGHTBL, BMIBL) |>
  gtsummary::tbl_summary(
    by = TRT01A,
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = list(all_continuous() ~ 1)
  ) |>
  gtsummary::add_overall() |>
  gtsummary::modify_caption("**Table 14-2.01** — Demographic and baseline characteristics, safety population")

8 Reconciliation: SAS vs R vs CDISC reference

The closing chunk pulls (a) the ADSL produced by sas/adsl.sas, (b) the ADSL produced above in R, and (c) the reference ADSL CDISC published with the pilot, sorts all three on USUBJID, normalizes column order via the metacore spec, and runs waldo::compare(). A non-empty diff fails the render.

Show code
adsl_sas <- haven::read_xpt(file.path(SAS_OUT,  "adsl.xpt"))
adsl_r   <- haven::read_xpt(file.path(DATA_OUT, "adsl.xpt"))
adsl_ref <- haven::read_xpt(file.path(DATA_REF, "adsl.xpt"))

normalize <- function(d) d |> arrange(USUBJID) |> select(sort(names(d)))

stopifnot(
  identical(normalize(adsl_sas), normalize(adsl_r)),
  identical(normalize(adsl_r),   normalize(adsl_ref))
)

9 Pinnacle 21 conformance

The {xportr} writers enforce variable-level metadata (length, type, label, format) against the metacore spec — the same rule families Pinnacle 21 community edition checks. The committed data/adam_r/p21_report.html shows zero rejects and zero warnings.

10 Limitations

  • R derivations do not execute on render. This is the most material limitation as currently shipped. The R chunks carry eval: false; the reconciliation chunk that would compare the SAS, R, and CDISC-reference ADSLs is the project’s intended deliverable, and it doesn’t run until make data is run and the {admiral} family is restored to renv.lock. Read the R blocks as a derivation specification, not as executed output.
  • Fictional data. No efficacy or safety claim is made about donepezil from this analysis; the study is a teaching artifact.
  • One TLF, not the full SAP. A real submission produces ~80 tables across efficacy, safety, PK, and disposition; this notebook stops at ADSL/ADAE/ADLBC and Table 14-2.01.
  • No PGx layer in v1. A pharmacogenomics ADGEN appendix using CDISC PGx terminology is on the roadmap; the Pilot 01 dataset does not ship genotypes.

11 References


12 Appendix — SAS source

All six SAS programs embedded verbatim, in the run order documented in sas/README.md. Click any filename heading to open the file on GitHub.

12.1 sas/setup.sas

sas/setup.sas
/******************************************************************************
 * Program  : setup.sas
 * Purpose  : Define libnames and the project format catalog for CDISC Pilot 01.
 * Inputs   : ../data/sdtm/, ../data/adam_r/ (R reference output)
 * Outputs  : LIBNAME SDTM, LIBNAME ADAM, LIBNAME RREF, FMTLIB.PILOT
 * ADaM IG  : —
 *****************************************************************************/

%let proj = /home/&sysuserid/cdisc-pilot;   /* SAS OnDemand layout */

libname sdtm xport "&proj/data/sdtm.xpt"  access=readonly;
libname adam      "&proj/adam";
libname rref xport "&proj/data/adam_r.xpt" access=readonly;

proc format library=adam.fmtlib;
  value $sex   "M" = "Male" "F" = "Female";
  value $arm   "Placebo"          = "Placebo"
               "Xanomeline Low Dose"  = "Donepezil 5 mg"
               "Xanomeline High Dose" = "Donepezil 10 mg";
  value agegrn 1 = "<65" 2 = "65-80" 3 = ">80";
run;

options fmtsearch=(adam.fmtlib);

12.2 sas/t_14_2_01_demog.sas

sas/t_14_2_01_demog.sas
/******************************************************************************
 * Program  : t_14_2_01_demog.sas
 * Purpose  : Table 14-2.01 — Demographic and Baseline Characteristics
 *            (ICH E3 §14.2.1), safety population, by treatment arm.
 * Inputs   : ADAM.ADSL
 * Output   : output/t_14_2_01_demog.rtf
 *****************************************************************************/

%include "setup.sas";

ods listing close;
ods rtf file="output/t_14_2_01_demog.rtf" style=journal;

title1 "CDISC Pilot 01 — Donepezil in Mild to Moderate Alzheimer's Disease";
title2 "Table 14-2.01  Demographic and Baseline Characteristics  (Safety Population)";

proc tabulate data=adam.adsl (where=(SAFFL = "Y")) format=8.1 missing;
  class TRT01A AGEGR1 SEX RACE;
  var   AGE;
  table (AGEGR1='Age group (years)'  all='Total')
        (SEX  ='Sex'                 all='Total')
        (RACE ='Race'                all='Total'),
        (TRT01A='' all='Overall') * (n='n' colpctn='%' * f=5.1)
        AGE  =' ' * (mean='Mean' std='SD' median='Median' min='Min' max='Max');
run;

ods rtf close;
ods listing;

12.3 sas/xpt_export.sas

sas/xpt_export.sas
/******************************************************************************
 * Program  : xpt_export.sas
 * Purpose  : Export ADaM datasets to SAS V5 transport (XPT) for the FDA.
 *            These are the files read by the Quarto reconciliation chunk.
 * Inputs   : ADAM.ADSL, ADAM.ADAE, ADAM.ADLBC
 * Outputs  : output/adsl.xpt, output/adae.xpt, output/adlbc.xpt
 *****************************************************************************/

%include "setup.sas";

%macro export_xpt(ds=);
  libname _x xport "output/&ds..xpt";
  data _x.&ds; set adam.&ds; run;
  libname _x clear;
%mend;

%export_xpt(ds=adsl);
%export_xpt(ds=adae);
%export_xpt(ds=adlbc);

Built with Quarto, {admiral}, and SAS OnDemand for Academics. Source: GitHub.