| Title: | Principal Stratification Analysis in R |
|---|---|
| Description: | Estimating causal effects in the presence of post-treatment confounding using principal stratification. 'PStrata' allows for customized monotonicity assumptions and exclusion restriction assumptions, with automatic full Bayesian inference supported by 'Stan'. The main workflow is PStrataModel() to specify the model, fit() to run MCMC sampling, estimate() to extract potential outcomes, and contrast() to compute causal effects. Visualization tools are provided for diagnosis and interpretation. See Liu and Li (2023) <doi:10.48550/arXiv.2304.02740> for details. |
| Authors: | Bo Liu [aut, cre], Fan Li [ctb] |
| Maintainer: | Bo Liu <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0.2 |
| Built: | 2026-05-17 18:18:30 UTC |
| Source: | https://github.com/laubok/pstrata |
Compute pairwise differences of posterior potential outcomes along strata, treatment arms, or time points.
contrast(object, ...) ## S3 method for class 'PSEstimate' contrast( object, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle"), ... ) ## S3 method for class 'PStrataFit' contrast( object, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle"), ... )contrast(object, ...) ## S3 method for class 'PSEstimate' contrast( object, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle"), ... ) ## S3 method for class 'PStrataFit' contrast( object, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle"), ... )
object |
A |
... |
Additional arguments passed to |
S |
Strata to contrast. |
Z |
Treatment arms to contrast. |
T |
Time points to contrast (survival only). |
type |
|
A PSContrast object (inherits from PSEstimate).
data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) # Treatment effect (Z contrast) for each stratum ctr_z <- contrast(ps_fit, Z = TRUE) summary(ctr_z) plot(ctr_z) # Stratum contrasts for each treatment arm ctr_s <- contrast(ps_fit, S = TRUE) summary(ctr_s)data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) # Treatment effect (Z contrast) for each stratum ctr_z <- contrast(ps_fit, Z = TRUE) summary(ctr_z) plot(ctr_z) # Stratum contrasts for each treatment arm ctr_s <- contrast(ps_fit, S = TRUE) summary(ctr_s)
Reports the rank-normalized split-Rhat together with the bulk and tail
effective sample sizes (Vehtari et al., 2021), as computed by
monitor.
diagnostics(object, ...) ## S3 method for class 'PStrataFit' diagnostics(object, pars = NULL, ...)diagnostics(object, ...) ## S3 method for class 'PStrataFit' diagnostics(object, pars = NULL, ...)
object |
A |
... |
Currently unused. |
pars |
Character vector of parameter names. If NULL, key parameters are selected automatically. |
Invisibly returns the rstan::monitor summary.
Maps posterior samples of outcome-group means back to the (stratum x treatment) grid via the SZDG table.
estimate(object, ...) ## S3 method for class 'PStrataFit' estimate(object, type = c("probability", "RACE"), ...)estimate(object, ...) ## S3 method for class 'PStrataFit' estimate(object, type = c("probability", "RACE"), ...)
object |
A |
... |
Additional arguments (unused). |
type |
For survival models: |
A PSEstimate object containing
outcome_array |
[S, Z, iter] array for non-survival, or [S, Z, T, iter] for survival. |
is_survival |
Logical. |
time_points |
Numeric vector (survival only). |
model_info |
List of strata names, treatment info, family, ER flags, outcome groups. |
data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) est <- estimate(ps_fit) summary(est) plot(est)data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) est <- estimate(ps_fit) summary(est) plot(est)
Estimate a principal stratification model by running MCMC via Stan.
The model specification (a PStrataModel) is combined with
observed data to produce posterior samples.
fit(model, ...) ## S3 method for class 'PStrataModel' fit( model, data, chains = 4, iter = 2000, warmup = floor(iter/2), cores = 1, seed = NULL, .debug = FALSE, ... )fit(model, ...) ## S3 method for class 'PStrataModel' fit( model, data, chains = 4, iter = 2000, warmup = floor(iter/2), cores = 1, seed = NULL, .debug = FALSE, ... )
model |
A |
... |
Additional arguments passed to |
data |
A data frame containing all variables referenced in the model. |
chains, iter, warmup, cores, seed
|
MCMC settings passed to
|
.debug |
Logical. If TRUE, read Stan helper files from |
An object of class PStrataFit
(or PStrataFitSurvival).
data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) summary(ps_fit) diagnostics(ps_fit) plot(ps_fit) cat(ps_fit$stancode)data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) summary(ps_fit) diagnostics(ps_fit) plot(ps_fit) cat(ps_fit$stancode)
Generate the Stan code corresponding to the model.
This is called internally by fit; use
stancode.PStrataFit to retrieve the code
from a fitted model.
make_stancode(object, filename = NULL, debug = FALSE)make_stancode(object, filename = NULL, debug = FALSE)
object |
An internal model specification object. |
filename |
(optional) string. If not |
debug |
only for development/testing use. |
A string containing the Stan program.
data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) cat(ps_fit$stancode)data(sim_data_normal) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) cat(ps_fit$stancode)
By default, shows violin plots of posterior strata proportions
and potential outcomes (non-survival).
Use what = "trace" or what = "dens" for MCMC diagnostics.
## S3 method for class 'PStrataFit' plot(x, what = c("default", "trace", "dens"), pars = "strata_prob", ...)## S3 method for class 'PStrataFit' plot(x, what = c("default", "trace", "dens"), pars = "strata_prob", ...)
x |
A |
what |
|
pars |
Parameter names for trace/dens plots (ignored for default). |
... |
Additional arguments passed to plotting functions. |
A ggplot object.
Define prior functions used in PStrata.
prior_flat() prior_normal(mu = 0, sigma = 1) prior_t(mu = 0, sigma = 1, df = 1) prior_cauchy(mu = 0, sigma = 1) prior_lasso(mu = 0, sigma = 1) prior_logistic(mu = 0, sigma = 1) prior_chisq(df = 1) prior_inv_chisq(df = 1) prior_exponential(beta = 1) prior_gamma(alpha = 1, beta = 1) prior_inv_gamma(alpha = 1, beta = 1) prior_weibull(alpha = 1, sigma = 1)prior_flat() prior_normal(mu = 0, sigma = 1) prior_t(mu = 0, sigma = 1, df = 1) prior_cauchy(mu = 0, sigma = 1) prior_lasso(mu = 0, sigma = 1) prior_logistic(mu = 0, sigma = 1) prior_chisq(df = 1) prior_inv_chisq(df = 1) prior_exponential(beta = 1) prior_gamma(alpha = 1, beta = 1) prior_inv_gamma(alpha = 1, beta = 1) prior_weibull(alpha = 1, sigma = 1)
mu, sigma, df, alpha, beta
|
parameters for the prior distribution |
A list, including the following items.
name of the distribution
type of the distribution, one character string of "real" or "positive"
a named list of all the input parameters
a function call object of the prior distribution on the parameters
prior_flat() prior_normal(0, 10) prior_t(0, 2.5, df = 3) prior_cauchy(0, 5) prior_inv_gamma(2, 1)prior_flat() prior_normal(0, 10) prior_t(0, 2.5, df = 3) prior_cauchy(0, 5) prior_inv_gamma(2, 1)
The PStrata package is designed for estimating causal effects in the presence of post-treatment confounding using principal stratification.
It provides an interface to fit the Bayesian principal stratification model, which is a complex mixture model, using Stan, a C++ package for obtaining full Bayesian inference.
The formula syntax is an extended version of the syntax applied in many regression functions and packages, such as lm, glm and lme4-package, to provide a simple interface.
A wide variety of distributions and link functions are supported, allowing users to fit linear, binary or count data, and survival models with principal stratification.
Further modeling options include multiple post-treatment confounding variables and cluster random effects.
The monotonicity and exclusion restriction assumptions can be easily applied, and
prior specifications are flexible and encourage users to reflect their prior belief.
In addition, all parameters can be inferred from the posterior distribution, which enables further analysis other than provided by the package.
The Bayesian principal stratification analysis relies on two models, the principal stratum model and the outcome model. The main workflow is:
PStrataModel: specify the model (formulas, strata, priors).
fit: compile and run MCMC sampling via Stan.
estimate: extract posterior potential outcomes.
contrast: compute causal effect contrasts.
Based on the supplied formulas, data and additional information,
it automatically generates the Stan code via make_stancode and fits the model using Stan.
The estimated probability for each principal stratum and the estimated mean response are calculated with Stan as it is faster and more space-efficient.
Post-processing methods summary and plot provide overviews and visualization.
Because PStrata heavily relies on Stan for posterior sampling, a C++ compiler is required. The program Rtools (available on https://cran.r-project.org/bin/windows/Rtools/) comes with a C++ compiler for Windows. On Mac, Xcode is suggested. For further instructions on how to get the compilers running, please refer to the prerequisites section at the RStan-Getting-Started page.
The Stan Development Team. Stan Modeling Language User's Guide and Reference Manual. https://mc-stan.org/users/documentation/
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org/
PStrataModel, fit, estimate, contrast
Creates a model specification for principal stratification analysis.
No data is required at this stage – the specification is purely symbolic.
Use fit to estimate the model with data.
PStrataModel( S.formula, Y.formula, Y.family, strata, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50 )PStrataModel( S.formula, Y.formula, Y.family, strata, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50 )
S.formula |
formula for the stratum model (e.g., |
Y.formula |
formula for the outcome model (e.g., |
Y.family |
a |
strata |
strata definition: character vector, list of vectors, or list of lists. |
ER |
exclusion restriction: character vector of strata names or logical vector. |
prior_intercept, prior_coefficient, prior_sigma, prior_alpha, prior_lambda, prior_theta
|
prior distributions for model parameters. |
survival.time.points |
number of time points for survival outcomes. |
An object of class PStrataModel.
# Non-compliance with three strata (never-taker, complier, always-taker) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) print(model) summary(model) # Fit the model (requires rstan and C++ compiler) data(sim_data_normal) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) summary(ps_fit) plot(ps_fit) # Extract potential outcomes and contrasts est <- estimate(ps_fit) summary(est) plot(est) ctr <- contrast(ps_fit) summary(ctr) plot(ctr)# Non-compliance with three strata (never-taker, complier, always-taker) model <- PStrataModel( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian(), strata = c(n = "00", c = "01", a = "11"), ER = c("n", "a") ) print(model) summary(model) # Fit the model (requires rstan and C++ compiler) data(sim_data_normal) ps_fit <- fit(model, data = sim_data_normal, chains = 2, iter = 500) summary(ps_fit) plot(ps_fit) # Extract potential outcomes and contrasts est <- estimate(ps_fit) summary(est) plot(est) ctr <- contrast(ps_fit) summary(ctr) plot(ctr)
A dataset generated for illustration of the principal stratification analysis. This dataset represents the common case of non-compliance.
sim_data_Coxsim_data_Cox
## 'sim_data_Cox' A data frame with 1,000 rows and 7 columns:
Principal Strata: "never taker", "complier" or "always taker"
Randomized treatment arm: 0 = control, 1 = treatment
Actual treatment arm: 0 = control, 1 = treatment
True outcome: event time
Censor time
Event indicator. 1 means true outcome is observed; 0 means otherwise
The observed event time or censor time
The dataset represents the scenario where actual treatment might not be in compliance
with the randomized (assigned) treatment. Defiers and always-takers are ruled out, leaving two
strata, "never-taker" and "complier" randomly sampled with
probability 0.3, 0.7 respectively. The assigned treatment is randomized
with 0.5 probability for either arm. The true event time is given by the following
Weibull-Cox distribution
and the censor time is uniformly drawn between 0.5 and 2.
The exclusion restriction assumption holds for never-takers in this generated dataset.
data(sim_data_Cox) head(sim_data_Cox) table(sim_data_Cox$D, sim_data_Cox$Z)data(sim_data_Cox) head(sim_data_Cox) table(sim_data_Cox$D, sim_data_Cox$Z)
A dataset generated for illustration of the principal stratification analysis. This dataset represents the common case of non-compliance.
sim_data_normalsim_data_normal
## 'sim_data_normal' A data frame with 1,000 rows and 4 columns:
Principal Strata: "never taker", "complier" or "always taker"
Randomized treatment arm: 0 = control, 1 = treatment
Actual treatment arm: 0 = control, 1 = treatment
Outcome
The dataset represents the scenario where actual treatment might not be in compliance
with the randomized (assigned) treatment. Defiers are ruled out, leaving three
strata, "never taker", "complier" and "always taker" randomly sampled with
probability 0.3, 0.2 and 0.5 respectively. The assigned treatment is randomized
with 0.5 probability for either arm. The outcome is given by the following.
The exclusion restriction assumption holds for never takers and always takers in this generated dataset.
data(sim_data_normal) head(sim_data_normal) table(sim_data_normal$D, sim_data_normal$Z)data(sim_data_normal) head(sim_data_normal) table(sim_data_normal$D, sim_data_normal$Z)
Extract the Generated Stan Code
stancode.PStrataFit(object, ...)stancode.PStrataFit(object, ...)
object |
A |
... |
Currently unused. |
The Stan code as a character string.
Construct a family object for survival data
survival(method = c("Cox", "AFT"), link = "identity")survival(method = c("Cox", "AFT"), link = "identity")
method |
the parametric method used for survival data. Can be Cox or AFT. |
link |
a link function, currently only identity is implemented and used |
A family object
survival("Cox") survival("AFT")survival("Cox") survival("AFT")