Title: | Principal Stratification Analysis in R |
---|---|
Description: | Estimating causal effects in the presence of post-treatment confounding using principal stratification. 'PStrata' allows for customized monotonicity assumptions and exclusion restriction assumptions, with automatic full Bayesian inference supported by 'Stan'. The main function to use in this package is PStrata(), which provides posterior estimates of principal causal effect with uncertainty quantification. Visualization tools are also provided for diagnosis and interpretation. See Liu and Li (2023) <arXiv:2304.02740> for details. |
Authors: | Bo Liu [aut, cre], Fan Li [ctb] |
Maintainer: | Bo Liu <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.3 |
Built: | 2024-10-29 03:16:19 UTC |
Source: | https://github.com/laubok/pstrata |
The PStrata package is designed for estimating causal effects in the presense of post-treatment confounding using principal stratification.
It provides an interface to fit the Bayesian principal stratification model, which is a complex mixture model, using Stan, a C++ package for obtaining full Bayesian inference.
The formula syntax is an extended version of the syntax applied in many regression functions and packages, such as lm
, glm
and lme4, to provide a simple interface.
A wide variety of distributions and link functions are supported, allowing users to fit linear, binary or count data, and survival models with principal stratification.
Further modeling options include multiple post-treatment confounding variables and cluster random effects.
The monotonicity and exclusion restriction assumptions can be easily applied, and
prior specifications are flexible and encourage users to reflect their prior belief.
In addition, all parameters can be inferred from the posterior distribution, which enables further analysis other than provided by the package.
A frequentist weighting-based triply-robust estimator is also implemented for both ordinary outcomes and survival outcomes.
The Bayesian principal stratification analysis relies on two models, the principal stratum model and the outcome model.
The main function of PStrata is PStrata
, which uses formula syntax to specify these models.
Based on the supplied formulas, data and additional information allowing users to specify assumptions and prior distributions,
it automatically generates the Stan code via make_stancode
and make_standata
, and fits the model using Stan.
The estimated probability for each principal stratum and the estimated mean response are calculated with Stan as it is faster and more space-efficient.
However, a large number of post-processing methods can also be applied.
summary
is perfectly suited for an overview of the estimated parameters,
and plot
provides visualization of the principal stratification and the outcome distribution.
Because PStrata heavily relies on Stan for posterior sampling, a C++ compiler is required. The program Rtools (available on https://cran.r-project.org/bin/windows/Rtools/) comes with a C++ compiler for Windows. On Mac, Xcode is suggested. For further instructions on how to get the compilers running, please refer to the prerequisites section at the RStan-Getting-Started page.
The Stan Development Team. Stan Modeling Language User's Guide and Reference Manual. https://mc-stan.org/users/documentation/
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org/
Generate the Stan code corresponding to the model, which is read by Stan to do sampling.
make_stancode(PSobject, filename = NULL, debug = FALSE)
make_stancode(PSobject, filename = NULL, debug = FALSE)
PSobject |
an object of class |
filename |
(optional) string. If not |
debug |
only for testing in development mode. Will be removed in future release. |
A string, which can be printed on screen using cat
.
Generate data for PStrata models to be passed to Stan
make_standata(PSobject)
make_standata(PSobject)
PSobject |
an object of class |
a named list of objects containing the required data to fit a PStrata model with Stan.
Define prior functions used in PStrata
.
prior_flat() prior_normal(mu = 0, sigma = 1) prior_t(mu = 0, sigma = 1, df = 1) prior_cauchy(mu = 0, sigma = 1) prior_lasso(mu = 0, sigma = 1) prior_logistic(mu = 0, sigma = 1) prior_chisq(df = 1) prior_inv_chisq(df = 1) prior_exponential(beta = 1) prior_gamma(alpha = 1, beta = 1) prior_inv_gamma(alpha = 1, beta = 1) prior_weibull(alpha = 1, sigma = 1)
prior_flat() prior_normal(mu = 0, sigma = 1) prior_t(mu = 0, sigma = 1, df = 1) prior_cauchy(mu = 0, sigma = 1) prior_lasso(mu = 0, sigma = 1) prior_logistic(mu = 0, sigma = 1) prior_chisq(df = 1) prior_inv_chisq(df = 1) prior_exponential(beta = 1) prior_gamma(alpha = 1, beta = 1) prior_inv_gamma(alpha = 1, beta = 1) prior_weibull(alpha = 1, sigma = 1)
mu , sigma , df , alpha , beta
|
parameters for the prior distribution |
A list, including the following items.
name of the distribution
type of the distribution, one character string of "real" or "positive"
a named list of all the input parameters
a function call object of the prior distribution on the parameters
Create an object that represents contrast of potential outcomes by treatment arms, strata or time points.
PSContrast( outcome, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle") )
PSContrast( outcome, S = NULL, Z = NULL, T = NULL, type = c("all", "sequential", "cycle") )
outcome |
an object of class |
S |
a vector denoting which strata to take contrasts. Default is |
Z |
a vector denoting which treatment arms to take contrasts. Default is |
T |
a vector denoting which time points to take contrasts. Default is |
type |
Either |
An S3 object of class PSContrast
and PSOutcome
, containing
outcome_array |
A num_strata * num_treatment * num_iter array of contrast if the outcome type is non-survival or a num_strata * num_treatment * num_time_points * num_iter array of contrast if the outcome type is survival. |
is.survival |
A boolean value, whether the outcome type is survival. |
time_points |
The time points at which the outcome is evaluated, if the outcome type is survival. |
The S3 method summary
and plot
can be applied to the returned object.
Set up a model formula for use in PStrata package allowing users to specify the treatment indicator, the post-randomization confounding variables, the outcome variable, and possibly the covariates. For survival outcome, a censoring indicator is also specified. Users can also define (potentially non-linear) transforms of the covariates and include random effects for clusters.
PSFormula(formula, data)
PSFormula(formula, data)
formula |
an object of class |
data |
a data frame containing the variables named in |
Two models are required for the principal stratification analysis: the principal stratum model and the outcome model.
For the principal stratum model, the formula
argument accepts formulas of the following syntax:
treatment + postrand ~ terms
The treatment
variable refers to the name of the binary treatment indicator.
The postrand
variable refers to the name of the binary post-randomization confounding variable.
The terms
part includes all of the predictors used for the principal stratum model.
For the outcome model, the formula
argument accepts formulas of the similar syntax:
response [+ observed] ~ terms
The response
variable refers to the name of the outcome variable.
The terms
part includes all of the predictors used for the outcome model.
The observed
variable shall not be used for ordinary response.
When the true response is subject to right censoring (also called survival outcome in relevant literature),
the response
variable should refer to the observed or censored response, and the observed
variable should
be an indicator of whether the true response is observed.
For example, suppose the true time for an event is and the time of censoring is
,
Then, the
response
variable should refer to , the actual time of the event or censoring, whichever comes earlier,
and the indicator
observed
is 1 if and 0 otherwise.
The terms
specified in the principal stratum model and the outcome model can be different.
If multiple post-randomization confounding variables exist, one can specify all of them using the following syntax:
treatment + postrand_1 + postrand_2 + ... + postrand_n ~ terms
The post-randomization confounding variables are provided in place of postrand_1
to
postrand_n
. Up to this version, all of these variables should be binary indicators.
Note that the order of these post-randomization confounding variables will not
affect the result of the estimation of the parameters, but it will be important
in specifying other parameters, such as strata
and ER
(see PStrata
).
The syntax for the predictors follow the conventions as used in link{formula}
.
The part terms
consists of a series of terms concatenated by +
,
each term being the name of a variable, or the interaction of several variables separated by :
.
Apart from +
and :
, a number of other operators are also useful.
The *
operator is a short-hand for factor crossing:
a*b
is interpreted as a + b + a:b
.
The ^
operator means factor crossing to a specific degree. For example,
(a + b + c)^2
is interpreted as (a + b + c) * (a + b + c)
,
which is identical to a + b + c + a:b + a:c + b:c
.
The -
operator removes specified terms, so that (a + b + c)^2 - a:b
is
identical to a + b + c + a:c + b:c
.
The -
operator can be also used to remove the intercept term, such as
x - 1
. One can also use x + 0
to remove the intercept term.
Arithmetic expressions such as a + log(b)
are also legal.
However, arithmetic expressions may contain special symbols that are defined for other use, such as +
, *
, ^
and -
.
To avoid confusion, the function I()
can be used to bracket portions where the operators should be interpreted in arithmetic sense.
For example, in x + I(y + z)
, the term y + z
is interpreted as the sum of y
and z
.
When effects assumed to vary across grouping variables are considered, one can
specify such effects by adding terms in the form of gterms | group
, where
group
refers to the group indicator (usually a factor
), and
gterms
specifies the terms whose coefficients are group-specific, drawn
from a population normal distribution.
The most common situation for group level random effect is to include group-specific
intercepts to account for unmeasured confounding.
For example, x + y + (1 | g)
specifies a model with population predictors
x
and y
, as well as random intercept for each level of g
.
For more complex random effect structures, refer to lme4::lmer
.
However, structures other than simple random intercepts and slopes may lead to unexpected behaviors.
PSFormula
returns an object of class PSFormula
,
which is a list
containing for following components.
full_formula
input formula as is
data
input data frame
fixed_eff_formula
input formula with only fixed effects
response_names
character vector with names of variables that appear on the left hand side of input formula
has_random_effect
logical indicating whether random effects are specified in the input formula
has_intercept
logical indicating whether the input formula has an intercept
fixed_eff_names
character vector with names of all variables included as fixed effects
fixed_eff_count
integer indicating the number of variables (factors are converted to and counted as dummy variables)
fixed_eff_matrix
fixed-effect design matrix
random_eff_list
a list containing information for each random effect. Such information is a list with the corresponding design matrix, the term names and the factor levels.
df <- data.frame( X = 1:10, Z = c(0,0,0,0,0,1,1,1,1,1), D = c(0,0,0,1,1,1,0,0,1,1), R = c(1,1,1,1,2,2,2,3,3,3) ) PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)
df <- data.frame( X = 1:10, Z = c(0,0,0,0,0,1,1,1,1,1), D = c(0,0,0,1,1,1,0,0,1,1), R = c(1,1,1,1,2,2,2,3,3,3) ) PSFormula(Z + D ~ X + I(X^2) + (1 | R), df)
Create an object containing essential information to create the Stan file and data for Stan to draw posterior samples. Such information includes the specified model for principal stratum and outcome, the type of outcome, assumptions, and prior specification, etc.
PSObject( S.formula, Y.formula, Y.family, data = NULL, strata = NULL, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50 )
PSObject( S.formula, Y.formula, Y.family, data = NULL, strata = NULL, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50 )
S.formula , Y.formula
|
an object of class " |
Y.family |
an object of class " |
data |
(optional) a data frame object. This is required when either
|
strata , ER
|
arguments to define the principal strata. See Alternatively, one can pass an object of class |
prior_intercept , prior_coefficient , prior_sigma , prior_alpha , prior_lambda , prior_theta
|
prior distribution for corresponding parameters in the model. |
survival.time.points |
a vector of time points at which the estimated survival probability is evaluated (only used when the type of outcome is survival), or an integer specifying the number of time points to be chosen. By default, the time points are chosen with equal distance from 0 to the 90% quantile of the observed outcome. |
The supported family
objects include two types: native families for ordinary outcome and
survival
family for survival outcome.
For ordinary outcome, the below families and links are supported. See family
for more details.
family | link |
binomial |
logit , probit , cauchit , log , cloglog |
gaussian |
identity , log , inverse |
Gamma |
inverse , identity , log |
poisson |
log , identity , log |
inverse.gamma |
1/mu^2 , inverse , identity , log
|
The quasi
family is not supported for the current version of the package.
For survival outcome, the family
object is created by
survival(method = "Cox", link = "identity")
, where method
can be
either "Cox"
for Weibull-Cox model or "AFT"
for accelerated
failure time model. See survival
for more details. For the current
version, only "identity"
is used as the link function.
The gaussian
family and the survival
family with method = "AFT"
introduce an additional parameter sigma
for the standard deviation, whose
prior distribution is specified by prior_sigma
. Similarly, prior_alpha
specifies the prior distribution of alpha
for Gamma
family,
prior_lambda
specifies the prior distribution of theta
for inverse.gaussian
family,
and prior_theta
specifies the prior distribution of theta
for survival
family with method = "Cox"
.
The models for principal stratum S.formula
and response Y.formula
also involve a linear combination of terms, where the prior distribution of
the intercept and coefficients are specified by prior_intercept
and
prior_coefficient
respectively.
A list, containing important information describing the principal stratification model.
S.formula , Y.formula
|
A |
Y.family |
Same as input. |
is.survival |
A boolean value. |
strata_info |
A |
prior_intercept , prior_coefficient , prior_sigma , prior_alpha , prior_lambda , prior_theta
|
Same as input. |
survival.time.points |
A list of time points at which the estimated survival probability is evaluated. |
SZDG_table |
A matrix. Each row corresponds to a valid (stratum, treatment, confounder, group) combination. |
Z_names |
A character vector. The names of the levels of the treatment. |
df <- data.frame( Z = rbinom(10, 1, 0.5), D = rbinom(10, 1, 0.5), Y = rnorm(10), X = 1:10 ) PSObject( S.formula = Z + D ~ X, Y.formula = Y ~ X, Y.family = gaussian("identity"), data = df, strata = c(n = "00*", c = "01", a = "11*") ) #------------------------------ PSObject( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian("identity"), data = sim_data_normal, strata = c(n = "00*", c = "01", a = "11*") )
df <- data.frame( Z = rbinom(10, 1, 0.5), D = rbinom(10, 1, 0.5), Y = rnorm(10), X = 1:10 ) PSObject( S.formula = Z + D ~ X, Y.formula = Y ~ X, Y.family = gaussian("identity"), data = df, strata = c(n = "00*", c = "01", a = "11*") ) #------------------------------ PSObject( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian("identity"), data = sim_data_normal, strata = c(n = "00*", c = "01", a = "11*") )
Create an object useful to present the potential outcomes under each treatment arm for each principal stratum. Contrasts between treatment arms or principal strata are easy to obtain from this object.
PSOutcome(PStrataObj, type = c("survival_prob", "RACE"))
PSOutcome(PStrataObj, type = c("survival_prob", "RACE"))
PStrataObj |
an object of class |
An S3 object of type PSOutcome
, containing
outcome_array |
A num_strata * num_treatment * num_iter array of mean outcome if the outcome type is non-survival or a num_strata * num_treatment * num_time_points * num_iter array of mean outcome if the outcome type is survival. |
is.survival |
A boolean value, whether the outcome type is survival. |
time_points |
The time points at which the outcome is evaluated, if the outcome type is survival. |
The S3 method summary
and plot
can be applied to the returned object.
Stan
Sample from the posterior distribution by calling stan
.
Check stan
for details of the arguments.
PSSample( file, model_name = "anon_model", model_code = "", fit = NA, data = list(), pars = NA, chains = 4, iter = 2000, warmup = floor(iter/2), thin = 1, init = "random", seed = sample.int(.Machine$integer.max, 1), algorithm = c("NUTS", "HMC", "Fixed_param"), control = NULL, sample_file = NULL, diagnostic_file = NULL, save_dso = TRUE, verbose = FALSE, include = TRUE, cores = getOption("mc.cores", 1L), open_progress = interactive() && !isatty(stdout()) && !identical(Sys.getenv("RSTUDIO"), "1"), ..., boost_lib = NULL, eigen_lib = NULL )
PSSample( file, model_name = "anon_model", model_code = "", fit = NA, data = list(), pars = NA, chains = 4, iter = 2000, warmup = floor(iter/2), thin = 1, init = "random", seed = sample.int(.Machine$integer.max, 1), algorithm = c("NUTS", "HMC", "Fixed_param"), control = NULL, sample_file = NULL, diagnostic_file = NULL, save_dso = TRUE, verbose = FALSE, include = TRUE, cores = getOption("mc.cores", 1L), open_progress = interactive() && !isatty(stdout()) && !identical(Sys.getenv("RSTUDIO"), "1"), ..., boost_lib = NULL, eigen_lib = NULL )
file |
The path to the Stan program to use.
A model may also be specified directly as a character string using the
The |
model_name |
A string to use as the name of the model; defaults
to |
model_code |
A character string either containing the model definition or the name of
a character string object in the workspace. This argument is used only
if arguments |
fit |
An instance of S4 class |
data |
A named |
pars |
A character vector specifying parameters of interest to be saved.
The default is to save all parameters from the model.
If |
chains |
A positive integer specifying the number of Markov chains. The default is 4. |
iter |
A positive integer specifying the number of iterations for each chain (including warmup). The default is 2000. |
warmup |
A positive integer specifying the number of warmup (aka burnin)
iterations per chain. If step-size adaptation is on (which it is by default),
this also controls the number of iterations for which adaptation is run (and
hence these warmup samples should not be used for inference). The number of
warmup iterations should be smaller than |
thin |
A positive integer specifying the period for saving samples. The default is 1, which is usually the recommended value. Unless your posterior distribution takes up too much memory we do not recommend thinning as it throws away information. The tradition of thinning when running MCMC stems primarily from the use of samplers that require a large number of iterations to achieve the desired effective sample size. Because of the efficiency (effective samples per second) of Hamiltonian Monte Carlo, rarely should this be necessary when using Stan. |
init |
Specification of initial values for all or some parameters.
Can be the digit
When specifying initial values via a |
seed |
The seed for random number generation. The default is generated
from 1 to the maximum integer supported by R on the machine. Even if
multiple chains are used, only one seed is needed, with other chains having
seeds derived from that of the first chain to avoid dependent samples.
When a seed is specified by a number, Using R's |
algorithm |
One of the sampling algorithms that are implemented in Stan.
The default and preferred algorithm is |
control |
A named
In addition, algorithm HMC (called 'static HMC' in Stan) and NUTS share the following parameters:
For algorithm NUTS, we can also set:
For algorithm HMC, we can also set:
For
|
sample_file |
An optional character string providing the name of a file.
If specified the draws for all parameters and other saved quantities
will be written to the file. If not provided, files are not created.
When the folder specified is not writable, |
diagnostic_file |
An optional character string providing the name of a file.
If specified the diagnostics data for all parameters will be written
to the file. If not provided, files are not created. When the folder specified
is not writable, |
save_dso |
Logical, with default |
verbose |
|
include |
Logical scalar defaulting to |
cores |
The number of cores to use when executing the Markov chains in parallel.
The default is to use the value of the |
open_progress |
Logical scalar that only takes effect if
|
... |
Other optional parameters:
Deprecated:
|
boost_lib |
The path for an alternative version of the Boost C++ to use instead of the one in the BH package. |
eigen_lib |
The path for an alternative version of the Eigen C++ library to the one in RcppEigen. |
An object of S4 class rstan::stanfit
.
Perform pincipal stratification analysis when there are confounding variables after randomization
PStrata( PSobject = NULL, S.formula, Y.formula, Y.family, data = NULL, strata = NULL, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50, filename = NULL, ... )
PStrata( PSobject = NULL, S.formula, Y.formula, Y.family, data = NULL, strata = NULL, ER = NULL, prior_intercept = prior_flat(), prior_coefficient = prior_normal(), prior_sigma = prior_inv_gamma(), prior_alpha = prior_inv_gamma(), prior_lambda = prior_inv_gamma(), prior_theta = prior_normal(), survival.time.points = 50, filename = NULL, ... )
PSobject |
an object of class |
S.formula |
an object of class " |
Y.formula |
an object of class " |
Y.family |
an object of class " |
data |
(optional) a data frame object. This is required when either
|
strata |
arguments to define the principal strata. See Alternatively, one can pass an object of class |
ER |
arguments to define the principal strata. See Alternatively, one can pass an object of class |
prior_intercept |
prior distribution for corresponding parameters in the model. |
prior_coefficient |
prior distribution for corresponding parameters in the model. |
prior_sigma |
prior distribution for corresponding parameters in the model. |
prior_alpha |
prior distribution for corresponding parameters in the model. |
prior_lambda |
prior distribution for corresponding parameters in the model. |
prior_theta |
prior distribution for corresponding parameters in the model. |
survival.time.points |
a vector of time points at which the estimated survival probability is evaluated (only used when the type of outcome is survival), or an integer specifying the number of time points to be chosen. By default, the time points are chosen with equal distance from 0 to the 90% quantile of the observed outcome. |
filename |
(optional) string. If not |
... |
additional parameters to be passed into |
An object of class PStrata
or PStrata_survival
,
which is a list containing
PSobject |
An object of |
post_samples |
An object of class |
require(abind) PSobj <- PSObject( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian("identity"), data = sim_data_normal, strata = c(n = "00*", c = "01", a = "11*") ) PStrata(PSobj, cores = 2, chains = 2, iter = 200) # Another example for survival data PSobj <- PSObject( S.formula = Z + D ~ 1, Y.formula = Y + delta ~ 1, Y.family = survival("Cox"), data = sim_data_Cox, strata = c(`never-taker` = "00*", complier = "01", `always-taker` = "11*") ) PStrata(PSobj, cores = 2, chains = 2, iter = 200)
require(abind) PSobj <- PSObject( S.formula = Z + D ~ 1, Y.formula = Y ~ 1, Y.family = gaussian("identity"), data = sim_data_normal, strata = c(n = "00*", c = "01", a = "11*") ) PStrata(PSobj, cores = 2, chains = 2, iter = 200) # Another example for survival data PSobj <- PSObject( S.formula = Z + D ~ 1, Y.formula = Y + delta ~ 1, Y.family = survival("Cox"), data = sim_data_Cox, strata = c(`never-taker` = "00*", complier = "01", `always-taker` = "11*") ) PStrata(PSobj, cores = 2, chains = 2, iter = 200)
PStrataInfo
is a class of object that defines all principal strata
to be considered, by specifying the potential value of each post-randomization
confounding variable under each treatment arm.
PStrataInfo(strata, ER = NULL)
PStrataInfo(strata, ER = NULL)
strata |
a list or a vector defining all principal strata. Details of the syntax are given in 'Details' below. |
ER |
a vector indicating on which strata exclusion restriction is assumed. Details are given in 'Details' below. |
Since definition of the principal strata appears fundamental and essential in
principal stratification analyses, the creation of such an object is designed
to be user-friendly - various ways are accommodated to create a PStrataInfo
object, some possibly preferable over others under different settings.
There are mainly two ways to easily create a PStrataInfo
object.
To define the principal strata by strings, the strata
argument
should receive a named vector, each component being the description of one strata
with the name of that strata. The naming does not affect the actual inference,
but informative names can be helpful for users to distinguish among strata.
Each stratum is defined by the potential values of the post-randomization confounding
variable under each treatment arm. By convention, assume that the K treatment arms are
numbered from 0 to K-1. Then, each stratum is defined by the tuple
,
which can be written compactly as a string. For example, under binary treatment,
the never-takers (i.e.
) can be represented by string
"00"
and
the compliers (i.e. ) can be represented by string
"01"
.
Note that the value that the post-randomization confounding variable can take is limited between 0 to 9
for the string to be parsed correctly. This should be more than enough in most of the applications, and
in cases where a number above 10 is needed, please create the PStrataInfo
object by matrix (see below).
When multiple post-randomization confounding variables exist, the string for each confounding variable
is concatenated with the symbol "|
". For example, if and
are both binary
post-randomization confounding variables, the stratum defined by
can be represented by string
"00|11"
. The order of these confounding variables should be the same
as they appear in the S.formula
parameter in PSObject
.
A common assumption in practice is the exclusion restriction (ER) assumption, which assumes that
the causal effect of the treatment on the outcome is totally realized through the post-randomization
confounding variables. For example, the ER assumption on the stratum of never-takers can be interpreted
as the outcome is identically distributed across the treated and control group, because all causal effect
of the treatment is realized through the post-randomization variable, which is the same (0) under both
treatment arms. To assume ER for some stratum, simply put an asterisk "*
" at the end of the string,
such as "00*" for the never-taker stratum. Note that under the context of multiple post-randomization
variables, the package treats all such variables as a unity. The outcome is assumed to be identical under
different treatment arms only when all post-randomization variables remain the same under these treatment arms.
Another way to specify the stratum where ER is assumed is to use the ER
argument. It either takes
a logical vector of the same length of strata
with TRUE
indicating ER is assumed and FALSE
otherwise, or takes a character vector with the names of all strata where ER is to be assumed upon.
When names to the strata are not provided in strata
, the strata can be referred to by their
canonical name, which is the string used to define the stratum with asterisks removed. For example,
the strata "00|11*"
can be referred to with name "00|11".
To define the principal strata by matrices, the strata
argument
should receive a named list, each component being a matrix. The number of rows matches the number
of post-randomization variables, and the number of columns matches that of possible treatment arms.
For any fixed row , column
stores the potential value of the
-th post-randomization
variable under treatment arm
.
When this approach is used, there is no shorthand to specify ER assumption. The ER
argument is
required to do this.
Warning: When ER assumption is specified in both strata
and ER
argument, the shorthand
notation for ER in strata
is ignored, and a warning is given regardless of whether the specification
given by strata
and ER
actually match.
an object of class PSStrataInfo
, which is a list of the following components.
number of principal strata defined
number of treatment arms
number of post-randomization variables
integer vector, the biggest number used by each post-randomization variable
integer matrix, each row corresponding to one stratum and each column corresponding to one treatment arm. The matrix is designed only for internal use.
logical vector, each component corresponding to one stratum, indicating whether ER is assumed for the specific stratum
character vector, the names of all strata
PStrataInfo(strata = c(n = "00*", c = "01", a = "11")) PStrataInfo( strata = list(n = c(0, 0), c = c(0, 1), a = c(1, 1)), ER = c(TRUE, FALSE, FALSE) ) PStrataInfo( strata = list(n = c(0, 0), c = c(0, 1), a = c(1, 1)), ER = c("n") )
PStrataInfo(strata = c(n = "00*", c = "01", a = "11")) PStrataInfo( strata = list(n = c(0, 0), c = c(0, 1), a = c(1, 1)), ER = c(TRUE, FALSE, FALSE) ) PStrataInfo( strata = list(n = c(0, 0), c = c(0, 1), a = c(1, 1)), ER = c("n") )
A dataset generated for illustration of the principal stratification analysis. This dataset represents the common case of non-compliance.
sim_data_Cox
sim_data_Cox
## 'sim_data_Cox' A data frame with 1,000 rows and 7 columns:
Principal Strata: "never taker", "complier" or "always taker"
Randomized treatment arm: 0 = control, 1 = treatment
Actual treatment arm: 0 = control, 1 = treatment
True outcome: event time
Censor time
Event indicator. 1 means true outcome is observed; 0 means otherwise
The observed event time or censor time
The dataset represents the scenario where actual treatment might not be in compliance
with the randomized (assigned) treatment. Defiers and always-takers are ruled out, leaving two
strata, "never-taker" and "complier" randomly sampled with
probability 0.3, 0.7 respectively. The assigned treatment is randomized
with 0.5 probability for either arm. The true event time
is given by the following
Weibull-Cox distribution
and the censor time is uniformly drawn between 0.5 and 2.
The exclusion restriction assumption holds for never-takers in this generated dataset.
A dataset generated for illustration of the principal stratification analysis. This dataset represents the common case of non-compliance.
sim_data_normal
sim_data_normal
## 'sim_data_normal' A data frame with 1,000 rows and 4 columns:
Principal Strata: "never taker", "complier" or "always taker"
Randomized treatment arm: 0 = control, 1 = treatment
Actual treatment arm: 0 = control, 1 = treatment
Outcome
The dataset represents the scenario where actual treatment might not be in compliance
with the randomized (assigned) treatment. Defiers are ruled out, leaving three
strata, "never taker", "complier" and "always taker" randomly sampled with
probability 0.3, 0.2 and 0.5 respectively. The assigned treatment is randomized
with 0.5 probability for either arm. The outcome
is given by the following.
The exclusion restriction assumption holds for never takers and always takers in this generated dataset.
Construct a family object for survival data
survival(method = "Cox", link = "identity")
survival(method = "Cox", link = "identity")
method |
the parametric method used for survival data. Can be Cox or AFT. |
link |
a link function, currently only identity is implemented and used |
A family
object