Definition 2.1.1
A population is a collection of all elements that possess a characteristic of interest.
Populations can be finite or infinite. A population whose elements can all be readily counted may be considered finite, and a population whose elements cannot all be readily counted may be considered infinite. For example, a production batch of ball bearings may be considered a finite population, whereas all the ball bearings that could be produced on a given manufacturing line are considered conceptually infinite.
Definition 2.1.2
A portion of a population selected for study is called a sample.
Definition 2.1.3
The target population is the population about which we want to make inferences based on the information contained in a sample.
Definition 2.1.4
The population from which a sample is being selected is called a sampled population.
Usually, the sampled population and the target population coincide, since every effort should be made to ensure that the sampled population is the same as the target population. However, because of financial reasons, time constraints, part of the population not being easily accessible, the unexpected loss of part of the population, and so forth, we may have situations where the sampled population is not equivalent to the whole target population. In such cases, conclusions made about the sampled population are not usually applicable to the target population.
In almost all statistical studies, the conclusions about a population are based on the information drawn from a sample. In order to obtain useful information about a population by studying a sample, it is important that the sample be a representative sample; that is, the sample should possess the characteristics of the population under investigation. For example, if we are interested in studying the family incomes in the United States, then our sample must consist of representative families that are very poor, poor, middle class, rich, and very rich. One way to achieve this goal is by taking a random sample.
Definition 2.1.5
A sample is called a simple random sample if each element of the population has the same chance of being included in the sample.
There are several techniques for selecting a random sample, but the concept that each element of the population has the same chance of being included in the sample forms the basis of all random sampling. Four such techniques are simple random sampling, systematic random sampling, stratified random sampling, and cluster random sampling. These four sampling schemes are usually referred to as sample designs.
Since collecting each data point costs time and money, it is important that in taking a sample, some balance be kept between the sample size and the resources available. Too small a sample may not provide much useful information, but too large a sample may result in a waste of resources. Thus, it is very important that in any sampling procedure, an appropriate sample design be selected. In this section, we will review, very briefly, the four sample designs mentioned previously.
Before taking any sample, we need to divide the target population into nonoverlapping units, usually known as sampling units. It is important to recognize that the sampling units in a given population may not always be the same. Sampling units are in fact determined by the sample design chosen. For example, in sampling voters in a metropolitan area, the sampling units might be individual voters, all voters in a family, all voters living in a town block, or all voters in a town. Similarly, in sampling parts from a manufacturing plant, the sampling units might be an individual part or a box containing several parts.
Definition 2.1.6
A list of all sampling units is called the sampling frame.
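The most commonly used sample design is the simple random sampling design, which consists of selecting sampling units in such a way that each sampling unit has the same chance of being included in the sample.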
Example 2.1.1 (Simple random sampling) Suppose that an engineer wants to take a sample of machine parts manufactured during a shift at a given plant. Since the parts from which the engineer wants to take the sample are manufactured during the same shift at the same plant, it is quite safe to assume that all parts are representative. Hence in this case, a simple random sampling design should be appropriate.
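As an illustration of the setting in Example 2.1.1, the following minimal sketch draws a simple random sample, without replacement, from a sampling frame. The frame of 500 part serial numbers, the sample size of 20, and the function name simple_random_sample are all hypothetical and are chosen only for illustration; they are not taken from the example itself.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct sampling units from the frame, without replacement.

    Every unit in the frame has the same chance of being included.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: serial numbers of 500 parts made in one shift.
frame = [f"part-{i:03d}" for i in range(1, 501)]

# Hypothetical sample size of 20 parts.
sample = simple_random_sample(frame, n=20, seed=1)
```

Fixing the seed is only a convenience for reproducing the same sample later; in practice, the essential requirement is a complete sampling frame, since a unit missing from the frame has no chance of being selected.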
The second sample design is the stratified random sampling design, which may give improved results for the same cost as simple random sampling. A stratified random sampling design is appropriate when a population can be divided into nonoverlapping groups called strata, where the sampling units within each stratum are similar but differ from stratum to stratum. Each stratum is treated as a subpopulation, and a simple random sample is taken from each of these subpopulations or strata.
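The procedure just described, taking a simple random sample within each stratum, can be sketched as follows. The three plants, their part counts, and the per-stratum sample sizes are hypothetical. The sizes shown are proportional to the stratum sizes, which is an assumption made here for illustration and not a rule stated in the text.

```python
import random


def stratified_random_sample(strata, sizes, seed=None):
    """Take a simple random sample of sizes[name] units from each stratum.

    strata maps a stratum name to the list of its sampling units;
    sizes maps the same names to the number of units to draw.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = sizes[name]
        if k > len(units):
            raise ValueError(f"sample size for {name} exceeds the stratum size")
        # Each stratum is treated as its own subpopulation.
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical strata: parts grouped by the plant that manufactured them.
strata = {
    "plant_A": [f"A-{i}" for i in range(1, 301)],  # 300 parts
    "plant_B": [f"B-{i}" for i in range(1, 201)],  # 200 parts
    "plant_C": [f"C-{i}" for i in range(1, 101)],  # 100 parts
}

# Assumed allocation: 5% of each stratum (proportional to stratum size).
sizes = {"plant_A": 15, "plant_B": 10, "plant_C": 5}

stratified = stratified_random_sample(strata, sizes, seed=1)
```

Keeping the result grouped by stratum makes it easy to summarize each subpopulation separately before combining the results.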
In the manufacturing world, this type of sampling situation arises quite often. For instance, in Example 2.1.1, if the sample is taken from a population of parts manufactured either in different plants or in different shifts, then stratified random sampling can be more appropriate than simple random sampling. In addition, there is the advantage of administrative convenience. For example, if the machine parts are manufactured in plants located in different parts of the country, then stratified random sampling can be beneficial. Often, each plant (stratum) has a sampling department that can conduct the random sampling within each plant. In order to obtain best results in this case, the sampling departments in all the plants need to communicate with one another