Note that the number of subjects in a trial usually increases as the phases of the study progress. That is, a phase I trial usually involves fewer subjects than a phase II trial, a phase II trial usually involves fewer subjects than a phase III trial, and a phase III trial usually involves fewer subjects than a phase IV trial. Also, some research studies involving human subjects will have fewer than four phases. For example, it is not unusual for screening, prevention, diagnostic, genetic, and quality-of-life studies to be conducted in only phase I or II trials. However, new drugs and biomedical procedures almost always require phase I, II, and III clinical trials for approval and a phase IV trial to track the safety of the drug after its approval. The development of a new drug may take many years to proceed through the first three phases of the approval process, and following approval, the phase IV trial usually extends over a period of many years.
1.4 Data Set Descriptions
Throughout this book several data sets will be used in the examples and exercises. These data sets are available at www.mtech.edu/math/faculty/rick-rossi/Book-Rossi.html as Excel files and MINITAB worksheets. Because web addresses frequently change at Montana Tech, searching for Rick Rossi on the Montana Tech web page will also allow you to find the data sets on my personal web page.
Permission to use the Birth Weight, Intensive Care Unit, Coronary Heart Disease, UMASS Aids Research Unit, and Prostate Cancer data sets has been granted by John Wiley & Sons, Inc. These data sets were first published in Applied Logistic Regression (Hosmer, 2000). Permission to use the Body Fat data set has been provided by Roger W. Johnson, Department of Mathematics & Computer Science, South Dakota School of Mines & Technology and the Journal of Statistics Education.
1.4.1 Birth Weight Data Set
The Birth Weight data set consists of data collected on 189 women to identify the risk factors associated with the birth of a low birth weight baby. The data set was collected at the Baystate Medical Center in Springfield, Massachusetts. The variables included in this data set are summarized in Table 1.1.
Table 1.1 A Description of the Variables in the Birth Weight Data Set
Variable | Description | Codes/Values | Name |
---|---|---|---|
1 | Identification code | ID number | ID |
2 | Low birth weight | 1 = BWT ≤ 2500 g; 0 = BWT > 2500 g | LOW |
3 | Age of mother | Years | AGE |
4 | Weight of mother at last menstrual period | Pounds | LWT |
5 | Race | 1 = White; 2 = Black; 3 = Other | RACE |
6 | Smoking status during pregnancy | 0 = No; 1 = Yes | SMOKE |
7 | History of premature labor | 0 = None; 1 = One; 2 = Two, etc. | PTL |
8 | History of hypertension | 0 = No; 1 = Yes | HT |
9 | Presence of uterine irritability | 0 = No; 1 = Yes | UI |
10 | Number of physician visits during the first trimester | 0 = None; 1 = One; 2 = Two, etc. | FTV |
11 | Birth weight | Grams | BWT |
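The coding of the LOW variable in Table 1.1 follows directly from the BWT variable, and a minimal sketch in Python may make the rule concrete. The function name and the example weights below are hypothetical and are not taken from the data set; only the 2500 g cutoff comes from Table 1.1.

```python
def low_birth_weight(bwt_grams):
    """Return the LOW code from Table 1.1: 1 if BWT <= 2500 g, otherwise 0."""
    return 1 if bwt_grams <= 2500 else 0


# Hypothetical birth weights in grams (illustration only, not actual data)
example_weights = [2499, 2500, 2501, 3400]

# Apply the coding rule to each weight
codes = [low_birth_weight(w) for w in example_weights]
print(codes)
```

Note that a birth weight of exactly 2500 g is coded as a low birth weight, since the rule in Table 1.1 uses "less than or equal to."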
1.4.2 Body Fat Data Set
The Body Fat data set consists of data collected on 252 adult males. The data were originally collected to build a model relating body density and percentage of body fat in adult males to several body measurement variables. These data were originally used in the article “Generalized body composition prediction equation for men using simple measurement techniques,” published in Medicine and Science in Sports and Exercise (Penrose et al., 1985). The variables included in this data set are summarized in Table 1.2. Two data sets have also been created from the Body Fat data set. These data sets have the same variables as the Body Fat data set and were formed by randomly sampling the Body Fat data set to create a training set of 189 observations called bodyfat-tr.xlsx and a validation set of 63 observations called bodyfat-val.xlsx. These data sets are used in the model validation sections of the text.
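A random split of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure actually used to create the two files; the function name, the use of row indices in place of the actual observations, and the seed value are all assumptions made for the example.

```python
import random


def split_train_validation(n_total=252, n_train=189, seed=0):
    """Randomly partition row indices 0..n_total-1 into two disjoint sets.

    The seed is an arbitrary choice made so the sketch is reproducible.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)                     # random ordering of all rows
    train = sorted(indices[:n_train])        # first n_train rows after shuffling
    validation = sorted(indices[n_train:])   # the remaining rows
    return train, validation


train_rows, validation_rows = split_train_validation()
print(len(train_rows), len(validation_rows))
```

With 252 observations, a training set of 189 corresponds to 75% of the data, leaving the remaining 25% (63 observations) for validation. Every observation falls in exactly one of the two sets.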