Nonparametric estimation of the lifetime and disease onset distributions for a survival-sacrifice model

Antonio Eduardo Gomes, Piet Groeneboom, Jon A. Wellner

Research output: Contribution to journal › Article › Scientific › peer-review



In carcinogenicity experiments with animals where the tumor is not palpable, it is common to observe only the time of death of the animal, the cause of death (the tumor or another independent cause, such as sacrifice) and whether the tumor was present at the time of death. These last two indicator variables are evaluated after an autopsy. Defining the non-negative variables T1 (time of tumor onset), T2 (time of death from the tumor) and C (time of death from an unrelated cause), we observe (Y, Δ1, Δ2), where Y = min{T2, C}, Δ1 = 1{T1 ≤ C}, and Δ2 = 1{T2 ≤ C}. The random variables T1 and T2 are independent of C and have a joint distribution such that P(T1 ≤ T2) = 1. Some authors call this model a “survival-sacrifice model”. [20] (generally to be denoted by LJP (1997)) proposed a Weighted Least Squares estimator for F1 (the marginal distribution function of T1), using the Kaplan-Meier estimator of F2 (the marginal distribution function of T2). The authors claimed that their estimator is more efficient than the MLE (maximum likelihood estimator) of F1 and that the Kaplan-Meier estimator is more efficient than the MLE of F2. However, we show that the MLE of F1 was not computed correctly, and that the (claimed) MLE of F1 is even undefined in the case of active constraints. In our simulation study we used a primal-dual interior point algorithm to obtain the true MLE of F1. The results showed a better performance of the MLE of F1 over the weighted least squares estimator in LJP (1997) for points where F1 is close to F2. Moreover, application to the model used in the simulation study of LJP (1997) showed smaller variances of the MLEs of the first and second moments for both F1 and F2, for sample sizes from 100 up to 5000, in comparison to the estimates based on the weighted least squares estimator for F1 proposed in LJP (1997) and the Kaplan-Meier estimator for F2.
R scripts are provided for computing the estimates either with the primal-dual interior point method or by the EM algorithm. In spite of the long history of the model in the biometrics literature (since about 1982), basic properties of the real maximum likelihood estimator (MLE) were still unknown. We give necessary and sufficient conditions for the MLE (Theorem 3.1), as an element of a cone, where the number of generators of the cone increases quadratically with sample size. From this and a self-consistency equation, turned into a Volterra integral equation, we derive the consistency of the MLE (Theorem 4.1). We conjecture that (under some natural conditions) the methods used to prove consistency can be extended to prove that the MLE is √n-consistent for F2 and converges at cube-root-n rate for F1, but this has not yet been proved.
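The observation scheme (Y, Δ1, Δ2) described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper or its R scripts: the exponential distributions, their rates, and the function names `simulate_survival_sacrifice` and `kaplan_meier_cdf` are assumptions chosen only for illustration. It draws data satisfying P(T1 ≤ T2) = 1 with C independent of (T1, T2), and computes the standard Kaplan-Meier estimate of F2 from (Y, Δ2), assuming no tied observation times. It does not compute the MLE of F1 or F2.

```python
import random


def simulate_survival_sacrifice(n, seed=0):
    """Draw n observations (Y, delta1, delta2) from a toy survival-sacrifice model.

    The exponential choices below are illustrative assumptions, not the
    simulation design used in the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t1 = rng.expovariate(1.0)        # T1: time of tumor onset
        t2 = t1 + rng.expovariate(2.0)   # T2: death from tumor, so T1 <= T2
        c = rng.expovariate(0.5)         # C: death from an unrelated cause
        y = min(t2, c)                   # Y = min{T2, C}
        delta1 = int(t1 <= c)            # tumor present at time of death
        delta2 = int(t2 <= c)            # death caused by the tumor
        data.append((y, delta1, delta2))
    return data


def kaplan_meier_cdf(data):
    """Kaplan-Meier estimate of F2 at each observed tumor-death time.

    Uses only (Y, delta2); assumes no ties among the observation times.
    Returns a list of (time, F2_hat(time)) pairs in increasing time order.
    """
    ordered = sorted(data, key=lambda obs: obs[0])
    n = len(ordered)
    survival = 1.0
    steps = []
    for i, (y, _delta1, delta2) in enumerate(ordered):
        at_risk = n - i                  # animals still alive just before y
        if delta2 == 1:
            survival *= 1.0 - 1.0 / at_risk
            steps.append((y, 1.0 - survival))
    return steps


if __name__ == "__main__":
    sample = simulate_survival_sacrifice(200, seed=1)
    for time, f2_hat in kaplan_meier_cdf(sample)[:5]:
        print(f"t = {time:.3f}, F2_hat = {f2_hat:.3f}")
```

Because T1 ≤ T2, any observation with Δ2 = 1 also has Δ1 = 1, so only three of the four indicator combinations can occur in the simulated data.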

Original language: English
Pages (from-to): 3195-3242
Number of pages: 48
Journal: Electronic Journal of Statistics
Issue number: 2
Publication status: Published - 2019

