Generate survival data with predefined censoring rates for proportional hazards models
surv.simulate.Rd
surv.simulate
simulates one or multiple (right-censored) survival datasets for proportional hazards models by simultaneously incorporating a baseline hazard function from three different survival distributions (exponential, Weibull and Gompertz), a random censoring time generated from a uniform distribution with an known/unknown upper limit, and a set of baseline covariates.
When the upper limit of the uniform censoring time distribution is unknown, surv.simulate
can be used separately to obtain the upper limit with a predefined censoring rate.
Usage
surv.simulate(L = 1, Z, beta, a, b, u1 = 0, u2, cen_rate,
gen_data_from = c("exp", "weibul", "gomp"),
only_u2 = FALSE, n.rep = 100, Trace = FALSE)
Arguments
- L
the number of datasets to be generated. Default is
L = 1
.- Z
a list of
L
design matrices of dimension \(n_\ell \times p\), where \(n_\ell\) is the number of samples observed for the \(\ell^{\text{th}}\) dataset and \(p\) is the number of covariables. WhenL = 1
,Z
can be a matrix.- beta
the vector of the (true) coefficients values, with a length of \(p\) (the number of covariates).
- a
scale parameter, which should be non-negative. See ‘Details’ for the form of the cumulative hazard that can be used.
- b
shape/location parameter, which should be non-negative. It is not used when
gen_data_from =
“exp”. See ‘Details’ for the form of the cumulative hazard that can be used.- u1
a known non-negative lower limit of the uniform distribution for generating random censoring time. Default is
u1 = 0
. Ifcen_rate
is not equal to 0, thenu1
does not need to be defined.- u2
an non-negative upper limit of the uniform random censoring time distribution. The upper limit can be unknown (
u2 = NULL
, the default), or predefined. When this argument is assumed to be unknown,u2 = NULL
, it is calculated by the algorithm withinsurv.simulate()
. However, if the argumentu2
is known, the censoring rate cannot be predefined (meaning there is no control over it) and is calculated based on the generated dataset. See ‘Details’ and ‘References’.- cen_rate
a value representing the proportion of observations in the simulated survival data that are censored. The range of this argument is from 0 to 1. When the upper limit is known,
cen_rate
can nor be predefined. If there is no censoring (cen_rate = 0
), the lower (u1
) and upper (u2
) limits of the uniform distribution do not need to be specified.- gen_data_from
a description of the distribution from which the time to event is generated. This is a character string and can be
exponential
(“exp”),Weibull
(“weibul”), orGompertz
(“gomp”). Can be abbreviated. By default, theexponential
distribution is used.- only_u2
logical flag for calculating only the upper limit of the uniform censoring time distribution. If
only_u2 = TRUE
, the dataset(s) are not generated. Ifonly_u2 = TRUE
, the argumentsZ
andu2
do not need to be specified, andcen_rate
should not be set to \(0\). Default isonly_u2 = FALSE
.- n.rep
a scalar specifying the number of iterations. This argument is exclusively used in the case of the
Gompertz
distribution. Default is 100.- Trace
logical flag indicating whether the output of the desired
u2
and the censoring proportion for different datasets should be produced for each iteration. It worksgen_data_from =
“gomp”.
Details
surv.simulate
function generates \(L\) simulated right-censored survival datasets from exponential, Weibull, or Gompertz distributions, incorporating the covariates, Z
, distributed according to a multivariate normal
distribution, with censoring time generated from a uniform distribution Uniform(u1, u2)
, where u1
is known but u2
can be either known or unknown.
surv.simulate()
can also be used to calculate the unknown upper limit of the uniform distribution, u2
, with a predefined censoring rate. To do this, set u2 = NULL
and only_u2 = TRUE
. In this case, the datasets are not generated; only u2
is.
surv.simulate()
uses a root-finding algorithm to select the censoring parameter that achieves predefined censoring rates in the simulated survival data.
When gen_data_from =
“exp”:
the cumulative baseline hazard function is considered as \(\Lambda_0=a t\),
the event time for the \(\ell^{\text{th}}\) dataset, \(T_\ell\), is computed by \( - log(u) \ exp(- Z_\ell \boldsymbol{\beta}) / a\), where \(u\) follows a standard uniform distribution;
For gen_data_from =
“weibul”:
the cumulative hazard function is as \(\Lambda_0=a t ^ b\),
the event time is computed by \(T_\ell= (- log(u) \ exp(- Z_\ell \boldsymbol{\beta}) / a)^{1/b}\), where \(u\) follows a standard uniform distribution;
For gen_data_from =
“gomp”:
the cumulative hazard function is as \(\Lambda_0=a (exp(b t) - 1) / b\),
the event time is computed by \(T_\ell= \log(1- log(u) \ exp(- Z_\ell \boldsymbol{\beta}) b / a) / b\), where \(u\) follows a standard uniform distribution;
Finally the survival time is obtained by \(\tilde{T}_\ell=\min\{T_\ell , C_\ell \}\).
The function will be updated for gen_data_from =
“gomp”.
Value
surv.simulate
returns a list containing the following components:
- D
a list of \(L\) data frames, with dimension \(n_\ell \times (p+2)\). The first and second columns, named
time
andstatus
, contain the simulated survival time and the censoring indicator, respectively, where \(0\) means censored and \(1\) means uncensored;- censor_propor
the vector of censoring proportions in the simulated datasets
D
, containing \(L\) elements;- u1
the lower limit of the uniform distribution used to generate random censoring times with a predefined censoring rate. Sometimes this output is less than the value entered by the user, as it is adjusted to achieve the desired amount of censoring rate;
- u2
the upper limit of the uniform distribution used to generate random censoring times. If
u2 = NULL
, this output will be the estimated upper limit necessary to achieve the desired censoring rate across the \(L\) datasets.
References
Pazira H., Massa E., Weijers J.A.M., Coolen A.C.C. and Jonker M.A. (2024). Bayesian Federated Inference for Survival Models, arXiv. <https://arxiv.org/abs/2404.17464>
Author
Hassan Pazira
Maintainer: Hassan Pazira hassan.pazira@radboudumc.nl
Examples
# Setting a seed for reproducibility
set.seed(1123)
#-------------------------
# Simulating Survival data
#-------------------------
N <- c(7, 10, 13) # the sample sizes of 3 datasets
beta <- 1:4
p <- length(beta)
L <- 3
# Define a function to generate multivariate normal samples
mvrnorm_new <- function(n, mu, Sigma) {
pp <- length(mu)
e <- matrix(rnorm(n * pp), nrow = n)
return(crossprod(t(e), chol(Sigma)) + matrix(mu, n, pp, byrow = TRUE))
}
Z <- list()
for (z in seq_len(L)) {
Z[[z]] <- mvrnorm_new(n = N[z], mu = rep(0, p),
Sigma = diag(rep(1, p),p))
colnames(Z[[z]]) <- paste0("Z_",seq_len(ncol(Z[[z]])))
}
# One simulated dataset from exponential distribution with no censoring:
surv_data <- surv.simulate(Z = Z[[1]], beta = beta, a = exp(-.9),
cen_rate = 0, gen_data_from = "exp")
surv_data
#> $D
#> $D[[1]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 0.075564227 1 0.70473619 0.94551438 -0.6930954 0.6174432
#> 2 0.732609577 1 1.10710882 0.37164284 -0.1714535 -0.2798152
#> 3 15.148291300 1 0.94005308 -1.14369244 -1.1564422 0.9669025
#> 4 0.003073544 1 -0.91096743 0.31234925 1.9137865 0.3402063
#> 5 2.022124176 1 1.38294723 -0.17384622 -1.2173467 0.3753887
#> 6 0.055636648 1 0.19442801 0.05596198 -0.8097336 1.6147903
#> 7 0.328196224 1 -0.03972739 0.79823340 -0.3073386 0.6104483
#>
#>
#> $censor_propor
#> [1] 0
#>
#> $u1
#> NULL
#>
#> $u2
#> NULL
#>
surv_data$D[[1]][,1:2] # The simulated survival data
#> time status
#> 1 0.075564227 1
#> 2 0.732609577 1
#> 3 15.148291300 1
#> 4 0.003073544 1
#> 5 2.022124176 1
#> 6 0.055636648 1
#> 7 0.328196224 1
# Calculate only 'u2' with a predefined censoring rate of 0.4:
u2_new <- surv.simulate(Z = Z[1:2], beta = beta, a = exp(-.9),
b = exp(1.8), u1 = 0.1, only_u2 = TRUE,
cen_rate = 0.4, gen_data_from = "weibul")$u2
u2_new
#> [1] 3.252022
# Two simulated datasets with a known 'u2':
# Using 'u2_new' to help control over censoring rate (was chosen 0.4)
surv.simulate(Z = Z[1:2], beta = beta, a = exp(-.9), b = exp(1.8),
u1 = 0.05, u2 = u2_new, gen_data_from = "weibul")
#> $D
#> $D[[1]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 0.7399626 1 0.70473619 0.94551438 -0.6930954 0.6174432
#> 2 1.1207633 0 1.10710882 0.37164284 -0.1714535 -0.2798152
#> 3 1.7127411 1 0.94005308 -1.14369244 -1.1564422 0.9669025
#> 4 0.3660419 1 -0.91096743 0.31234925 1.9137865 0.3402063
#> 5 0.8653180 1 1.38294723 -0.17384622 -1.2173467 0.3753887
#> 6 0.3775353 1 0.19442801 0.05596198 -0.8097336 1.6147903
#> 7 0.7019206 1 -0.03972739 0.79823340 -0.3073386 0.6104483
#>
#> $D[[2]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 3.1926945 0 -0.23027404 -1.2690632 -0.2673166 -1.55025861
#> 2 0.3414662 1 -0.32916907 1.6265643 0.2278069 0.91480231
#> 3 1.1314558 1 0.49944094 -0.7696502 0.3879469 -0.02710971
#> 4 1.1982244 1 -0.92867906 0.2380289 -1.2894161 0.70696930
#> 5 1.4326297 1 -0.96832355 0.2955621 -0.4037798 0.42578390
#> 6 0.9277940 1 -0.05840795 0.9128333 -0.4899637 0.34036840
#> 7 0.3113212 0 1.07646211 -0.8527642 0.2666801 1.41542485
#> 8 0.1139205 0 1.68615000 -0.5295145 1.3293412 0.59569653
#> 9 0.6654312 1 1.93659046 1.7751477 -1.2878720 0.26822334
#> 10 1.0432595 0 0.51202240 -1.6117240 0.9368176 -0.75419444
#>
#>
#> $censor_propor
#> [1] 0.1428571 0.4000000
#>
#> $u1
#> [1] 0.05
#>
#> $u2
#> [1] 3.252022
#>
# Three simulated datasets from 'weibul' with an unknown 'u2':
surv.simulate(Z = Z, beta = beta, a = exp(-1), b = exp(1),
u1 = 0.01, cen_rate = 0.3, gen_data_from = "weibul")
#> $D
#> $D[[1]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 0.2823065 1 0.70473619 0.94551438 -0.6930954 0.6174432
#> 2 0.9975437 1 1.10710882 0.37164284 -0.1714535 -0.2798152
#> 3 0.8144260 0 0.94005308 -1.14369244 -1.1564422 0.9669025
#> 4 0.1386992 1 -0.91096743 0.31234925 1.9137865 0.3402063
#> 5 1.6849989 1 1.38294723 -0.17384622 -1.2173467 0.3753887
#> 6 0.1502737 0 0.19442801 0.05596198 -0.8097336 1.6147903
#> 7 0.3864806 1 -0.03972739 0.79823340 -0.3073386 0.6104483
#>
#> $D[[2]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 7.75069014 0 -0.23027404 -1.2690632 -0.2673166 -1.55025861
#> 2 0.07093456 1 -0.32916907 1.6265643 0.2278069 0.91480231
#> 3 1.34441157 1 0.49944094 -0.7696502 0.3879469 -0.02710971
#> 4 1.85809906 1 -0.92867906 0.2380289 -1.2894161 0.70696930
#> 5 0.88334276 1 -0.96832355 0.2955621 -0.4037798 0.42578390
#> 6 0.72544536 0 -0.05840795 0.9128333 -0.4899637 0.34036840
#> 7 0.10815548 1 1.07646211 -0.8527642 0.2666801 1.41542485
#> 8 0.10718719 1 1.68615000 -0.5295145 1.3293412 0.59569653
#> 9 0.33155256 1 1.93659046 1.7751477 -1.2878720 0.26822334
#> 10 1.31690683 1 0.51202240 -1.6117240 0.9368176 -0.75419444
#>
#> $D[[3]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 1.54786085 1 -1.44153951 0.84543041 1.63435118 -1.3325912
#> 2 0.28011636 1 0.42547690 0.91067540 -0.02140968 0.3038766
#> 3 0.07408815 1 0.63158367 0.08532232 0.64390401 1.2853559
#> 4 0.98235600 0 -0.15201586 -0.91096893 0.32015202 -0.1719336
#> 5 0.09308440 0 -1.48957048 0.09515951 1.29546216 -0.1569018
#> 6 0.20262018 1 0.50347542 0.65439581 -0.57036674 1.1085850
#> 7 0.78550533 0 0.95250295 -0.71764421 0.51880688 -1.5608051
#> 8 0.04926487 1 0.04937487 1.07616067 1.37303835 0.3469306
#> 9 0.38080909 1 0.37153942 -0.04582918 -0.45533515 1.2241016
#> 10 3.21427845 0 -0.90718225 -2.49706485 1.40629078 -0.8608085
#> 11 0.11008080 1 2.53099268 -0.03519585 0.12720354 1.4766462
#> 12 0.15020603 1 1.13004355 1.34211808 0.83315557 -0.4462548
#> 13 6.77948385 1 -0.73382185 0.15624687 -1.35326133 0.2320178
#>
#>
#> $censor_propor
#> [1] 0.2857143 0.2000000 0.3076923
#>
#> $u1
#> [1] 0.01
#>
#> $u2
#> [1] 9.967712
#>
# Two simulated datasets from 'gomp' with unknown 'u2' and censoring rate of 0.3:
surv.simulate(Z = Z[2:3], beta = beta, a = exp(1), b = exp(2), u1 = 0.1,
cen_rate = 0.3, gen_data_from = "gomp", Trace = TRUE)
#> censoring proportion for silos are: 0.01 0.01 with u2 = 20
#> censoring proportion for silos are: 0.01 0.01 with u2 = 14.19
#> censoring proportion for silos are: 0.02 0.02 with u2 = 10.10383
#> censoring proportion for silos are: 0.03 0.02 with u2 = 7.28991
#> censoring proportion for silos are: 0.04 0.03 with u2 = 5.275091
#> censoring proportion for silos are: 0.05 0.04 with u2 = 3.895249
#> censoring proportion for silos are: 0.07 0.07 with u2 = 2.896867
#> censoring proportion for silos are: 0.09 0.06 with u2 = 2.225017
#> censoring proportion for silos are: 0.12 0.09 with u2 = 1.72242
#> censoring proportion for silos are: 0.15 0.12 with u2 = 1.39092
#> censoring proportion for silos are: 0.17 0.15 with u2 = 1.162114
#> censoring proportion for silos are: 0.17 0.18 with u2 = 0.9992837
#> censoring proportion for silos are: 0.18 0.2 with u2 = 0.8723362
#> censoring proportion for silos are: 0.2 0.21 with u2 = 0.7788285
#> censoring proportion for silos are: 0.21 0.24 with u2 = 0.7064274
#> censoring proportion for silos are: 0.22 0.24 with u2 = 0.6536355
#> censoring proportion for silos are: 0.23 0.25 with u2 = 0.6087609
#> censoring proportion for silos are: 0.24 0.25 with u2 = 0.5731016
#> censoring proportion for silos are: 0.26 0.26 with u2 = 0.5429697
#> censoring proportion for silos are: 0.26 0.27 with u2 = 0.5212091
#> censoring proportion for silos are: 0.26 0.28 with u2 = 0.5016437
#> censoring proportion for silos are: 0.26 0.27 with u2 = 0.4858998
#> censoring proportion for silos are: 0.27 0.28 with u2 = 0.4684635
#> censoring proportion for silos are: 0.29 0.29 with u2 = 0.4560853
#> censoring proportion for silos are: 0.28 0.28 with u2 = 0.4499632
#> censoring proportion for silos are: 0.27 0.28 with u2 = 0.4408082
#> censoring proportion for silos are: 0.28 0.3 with u2 = 0.4296693
#>
#> Desired u2 is: 0.4296693 ( with u1 = 0.1 )
#>
#> $D
#> $D[[1]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 3.491216e-01 0 -0.23027404 -1.2690632 -0.2673166 -1.55025861
#> 2 8.252691e-04 1 -0.32916907 1.6265643 0.2278069 0.91480231
#> 3 8.251961e-06 1 0.49944094 -0.7696502 0.3879469 -0.02710971
#> 4 3.457653e-01 1 -0.92867906 0.2380289 -1.2894161 0.70696930
#> 5 1.383742e-01 1 -0.96832355 0.2955621 -0.4037798 0.42578390
#> 6 3.500657e-02 1 -0.05840795 0.9128333 -0.4899637 0.34036840
#> 7 3.473370e-04 1 1.07646211 -0.8527642 0.2666801 1.41542485
#> 8 3.788401e-04 1 1.68615000 -0.5295145 1.3293412 0.59569653
#> 9 1.746997e-02 1 1.93659046 1.7751477 -1.2878720 0.26822334
#> 10 1.887190e-01 0 0.51202240 -1.6117240 0.9368176 -0.75419444
#>
#> $D[[2]]
#> time status Z.Z_1 Z.Z_2 Z.Z_3 Z.Z_4
#> 1 1.908368e-01 1 -1.44153951 0.84543041 1.63435118 -1.3325912
#> 2 5.553927e-03 1 0.42547690 0.91067540 -0.02140968 0.3038766
#> 3 2.554467e-05 1 0.63158367 0.08532232 0.64390401 1.2853559
#> 4 2.509140e-01 1 -0.15201586 -0.91096893 0.32015202 -0.1719336
#> 5 7.807589e-02 1 -1.48957048 0.09515951 1.29546216 -0.1569018
#> 6 8.807375e-03 1 0.50347542 0.65439581 -0.57036674 1.1085850
#> 7 2.237849e-01 0 0.95250295 -0.71764421 0.51880688 -1.5608051
#> 8 2.597067e-04 1 0.04937487 1.07616067 1.37303835 0.3469306
#> 9 7.799071e-03 1 0.37153942 -0.04582918 -0.45533515 1.2241016
#> 10 2.913660e-01 0 -0.90718225 -2.49706485 1.40629078 -0.8608085
#> 11 2.111413e-05 1 2.53099268 -0.03519585 0.12720354 1.4766462
#> 12 5.090627e-03 1 1.13004355 1.34211808 0.83315557 -0.4462548
#> 13 4.275830e-01 0 -0.73382185 0.15624687 -1.35326133 0.2320178
#>
#>
#> $censor_propor
#> [1] 0.2000000 0.2307692
#>
#> $u1
#> [1] 0.1
#>
#> $u2
#> [1] 0.4296693
#>