The Statistical Analysis of Doubly Truncated Data. Prof Carla Moreira.

Work information:

Author:	Prof Carla Moreira
Publisher:	John Wiley & Sons Limited
Genre:	Medicine
ISBN:	9781119500476


… $a_U$ and $b_V$ denote respectively the lower and upper endpoints of the supports of $U$ and $V$ (see Chapter 2 for details). This may have important practical consequences, as we will see. On the other hand, in applications with doubly truncated survival data, the estimates refer to the susceptible population, for which the terminal event of interest is certain to occur. This contrasts with the standard analysis of survival times, where a portion of the individuals may belong to the so‐called cured fraction, or immunes. This should be taken into account when interpreting the results of the analysis.

An important difference between double truncation and one‐sided truncation is that, with doubly truncated data, the NPMLE of the probability distribution has no explicit form. In fact, the NPMLE may be non‐unique or may even fail to exist (Xiao and Hudgens, 2019); see Chapter 2. Several iterative algorithms proposed to compute the NPMLE in practice (Efron and Petrosian, 1999; Shen, 2010) will be reviewed in this book, and simulated and real data examples will be analysed with existing R packages. Semiparametric and parametric alternatives to the NPMLE will also be introduced; these approaches avoid some of the potential issues of non‐uniqueness or non‐existence of the NPMLE mentioned above, and they reduce the variance at the price of introducing some bias in estimation. Resampling procedures, testing problems, smoothing methods, regression models and multi‐state data analysis under double truncation will also be presented.
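To give a feel for what "iterative algorithm" means here, the following is a minimal Python sketch of a fixed-point iteration on the self-consistency equations of the NPMLE, in the spirit of Efron and Petrosian (1999). It assumes observed triplets with u[i] ≤ x[i] ≤ v[i]; the helper name `npmle_dt` and its arguments are hypothetical, and this is an illustration, not the implementation used in the DTDA package. The NPMLE places mass f_j at each observed x_j and maximizes the product of f_i / F_i, where F_i is the probability of the i-th truncation window; setting the derivative to zero gives f_j proportional to 1 / Σ_i J_ij / F_i.

```python
def npmle_dt(u, x, v, tol=1e-10, max_iter=10_000):
    """Illustrative NPMLE of F for doubly truncated data (hypothetical helper).

    u, x, v: equal-length sequences of observed triplets with u[i] <= x[i] <= v[i].
    Returns the probability masses f[j] placed at each observed x[j].
    """
    n = len(x)
    # J[i][j] = 1 when x[j] lies inside the truncation window [u[i], v[i]]
    J = [[1.0 if u[i] <= x[j] <= v[i] else 0.0 for j in range(n)]
         for i in range(n)]
    f = [1.0 / n] * n  # start from the naive empirical distribution
    for _ in range(max_iter):
        # F[i]: current estimate of P(u[i] <= X <= v[i]); positive since J[i][i] = 1
        F = [sum(J[i][j] * f[j] for j in range(n)) for i in range(n)]
        # Self-consistency update: f[j] proportional to 1 / sum_i J[i][j] / F[i]
        raw = [1.0 / sum(J[i][j] / F[i] for i in range(n)) for j in range(n)]
        total = sum(raw)
        new_f = [r / total for r in raw]  # renormalize to a probability vector
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f
```

Because the NPMLE may be non‐unique or fail to exist, reaching the tolerance does not by itself certify a unique solution. Each pass costs O(n²) operations, so for samples of a few hundred cases a vectorized implementation would be preferable.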

1.4 Real Data Examples

In this section we introduce the datasets that will be used throughout the book for illustration. All of them are affected by double truncation. These datasets are available in the latest version of the DTDA package (Moreira et al., 2021a).

1.4.1 Childhood Cancer Data

The Childhood Cancer Data were gathered from the IPO (Instituto Português de Oncologia) of Porto, Portugal, by the RORENO (Registro Oncológico do Norte) service. The information corresponds to all children diagnosed with cancer between 1 January 1999 and 31 December 2003 in the region of North Portugal, which includes five districts: Porto, Braga, Bragança, Vila Real and Viana do Castelo. The variable of main interest, $X$, is the age at diagnosis, which, by the definition of childhood cancer, is supported on the interval $[0, 15]$ (time in years). The number of cases was 409. However, for three cases the value of $X$ was not available, so we only consider the $n = 406$ children with complete information.

Because of the interval sampling, the age at diagnosis $X$ is doubly truncated by the pair $(U, V)$, where the right‐truncation variable $V$ is the time in years from birth (date of onset) to 31 December 2003, and the left‐truncation variable is $U = V - 5$. The $n = 406$ observed triplets $(U_i, X_i, V_i)$, $i = 1, \ldots, n$, were reported in Moreira and de Uña‐Álvarez (2010), while de Uña‐Álvarez (2020) included the cancer group in the statistical analysis. Ordinary descriptive statistics can be applied to the information gathered over this 5‐year window to compute, for instance, the average age at cancer diagnosis. However, if the goal is to describe the population of children who eventually develop cancer, the double truncation must be acknowledged and properly corrected to avoid potential biases.
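The way interval sampling produces the triplet $(U, X, V)$ can be made concrete with a short Python sketch. It assumes exact dates of birth and diagnosis are available and converts days to years with a 365.25‐day year; the helper name `truncation_triplet` and the example dates are hypothetical and do not come from the dataset. Following the text, $V$ runs from birth to 31 December 2003 and $U = V - 5$.

```python
from datetime import date

# Registry window described in the text
WINDOW_END = date(2003, 12, 31)
WINDOW_YEARS = 5.0
DAYS_PER_YEAR = 365.25  # assumption: simple day-to-year conversion


def truncation_triplet(birth, diagnosis):
    """Return (u, x, v) in years if the case is sampled, else None (hypothetical helper).

    x: age at diagnosis
    v: time from birth to the end of the window (right-truncation variable)
    u: v - 5, the left-truncation variable
    """
    x = (diagnosis - birth).days / DAYS_PER_YEAR
    v = (WINDOW_END - birth).days / DAYS_PER_YEAR
    u = v - WINDOW_YEARS
    # Interval sampling: the case is recorded only if U <= X <= V
    if u <= x <= v:
        return u, x, v
    return None


# A child born in mid-1995 and diagnosed in 2001 falls inside the window
print(truncation_triplet(date(1995, 6, 1), date(2001, 3, 15)))
```

A case enters the registry only when the function returns a triplet; a child diagnosed before the window opened, for instance, is never observed, which is why the observed ages need not represent the whole population.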

Interestingly, the observed values of $U$ range between $-4.5$ and 14.5 (years); equivalently, the observed values of $V$ range between 0.5 and 19.5. This means that the lower and upper endpoints of $U$ and $V$ satisfy $a_U \le -4.5 < 0 = a_X$ and $b_V \ge 19.5 > 15 = b_X$. Thus, in this case, the target variable $X$ is observable on its whole support $[0, 15]$, and there are no identification issues for $F$, the cdf of $X$. Information on $X$ is summarized in Table 1.1.
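The identification argument can be written as a small check. Observed truncation values always lie inside the supports of $U$ and $V$, so $a_U \le \min_i U_i$ and $b_V \ge \max_i V_i$; it is therefore sufficient that $\min_i U_i \le a_X$ and $\max_i V_i \ge b_X$. A minimal sketch, where the helper name `support_identified` is hypothetical and only the reported extreme values are used as input:

```python
def support_identified(u_obs, v_obs, a_x=0.0, b_x=15.0):
    """Sufficient sample-based check that F is identifiable on [a_x, b_x].

    Since a_U <= min(u_obs) and b_V >= max(v_obs), the conditions below
    imply a_U <= a_X and b_X <= b_V.
    """
    return min(u_obs) <= a_x and max(v_obs) >= b_x


# Extreme values reported for the Childhood Cancer Data
print(support_identified([-4.5, 14.5], [0.5, 19.5]))  # True
```

If the check fails, the condition may still hold for the true supports; the sample simply does not demonstrate it.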

Table 1.1 Descriptive statistics for the Childhood Cancer Data: sample size ($n$) and mean (standard deviation, SD) of the age at diagnosis (years).

Group		n	Mean (SD)
All		406	6.47 (4.50)
By gender	Female	178	6.43 (4.51)
	Male	228	6.51 (4.51)
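The summaries in Table 1.1 appear to be ordinary sample statistics, which give every child the same weight $1/n$. A truncation‐corrected summary would instead weight each observed age $x_j$ by an estimated probability mass $f_j$, such as the NPMLE of $F$. A minimal sketch, assuming the masses have already been computed; the helper name `weighted_mean_sd` is hypothetical:

```python
from math import sqrt


def weighted_mean_sd(x, f):
    """Mean and SD of the discrete distribution with mass f[j] at x[j].

    With equal masses this returns the ordinary mean, but an SD with
    divisor n rather than the sample SD with divisor n - 1.
    """
    total = sum(f)  # guard against masses that do not sum exactly to 1
    mean = sum(fj * xj for fj, xj in zip(f, x)) / total
    var = sum(fj * (xj - mean) ** 2 for fj, xj in zip(f, x)) / total
    return mean, sqrt(var)
```

Comparing the output for equal masses with the output for estimated masses shows how much the double truncation shifts the descriptive picture.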