Sampling and Estimation from Finite Populations. Yves Tillé.

Author: Yves Tillé
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119071273
superpopulation model in an estimation procedure is a breach of the principle of impartiality that is part of the ethics of statisticians. This argument comes directly from the current definition of official statistics: impartiality is part of this definition, just as accuracy was in the 19th century. While modeling a population is easily accepted as a research tool or as a predictive tool, it remains fundamentally questionable in the field of official statistics.

      The “superpopulation” approach has led to extremely fruitful research. It gave rise to a hybrid approach, called the model‐assisted approach, which provides valid inferences under the model while remaining robust when the model is wrong. This view was mainly developed by a Swedish school (see Särndal et al., 1992). The model makes it possible to take auxiliary information into account at the estimation stage while preserving the robustness of the estimators if the model is inadequate. Indeed, it is very difficult to construct an estimator that uses a set of auxiliary information after the sample has been selected without making some hypothesis, even a simple one, about the relationship between the auxiliary information and the variable of interest. Modeling makes this kind of presumption explicit. The model‐assisted approach yields useful and practical estimators. It is now clear that introducing a model is necessary for dealing with some problems of nonresponse and of small area estimation. In such problems, whatever the technique used, one always postulates the existence of a model, even if only implicitly. The model should therefore be stated clearly, so that the underlying ideas that justify the method are made explicit.
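To make the model-assisted idea concrete, here is a minimal Python sketch of the generalized regression (GREG) estimator of a population total, the standard model-assisted estimator of Särndal et al. (1992). It assumes a single auxiliary variable x whose population total is known, a known population size N, a linear working model with intercept, and inclusion probabilities supplied by the caller; the function name `greg_total` and the toy data are hypothetical. The Horvitz-Thompson estimate is corrected by the model-predicted discrepancy on the auxiliary totals, so the estimator remains approximately design-unbiased even when the linear model is wrong.

```python
from typing import Sequence


def greg_total(
    y: Sequence[float],
    x: Sequence[float],
    pi: Sequence[float],
    pop_size: int,
    pop_total_x: float,
) -> float:
    """GREG estimate of the population total of y (hypothetical helper).

    Working model (an assumption): y_k = a + b * x_k + error.
    Auxiliary information: the population size and the population total of x.
    """
    w = [1.0 / p for p in pi]  # design weights d_k = 1 / pi_k
    # Horvitz-Thompson estimates of the totals of 1, x and y
    n_ht = sum(w)
    tx_ht = sum(wk * xk for wk, xk in zip(w, x))
    ty_ht = sum(wk * yk for wk, yk in zip(w, y))
    # Design-weighted least-squares fit of the working model (normal equations)
    sxx = sum(wk * xk * xk for wk, xk in zip(w, x))
    sxy = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))
    det = n_ht * sxx - tx_ht ** 2
    b = (n_ht * sxy - tx_ht * ty_ht) / det
    a = (ty_ht - b * tx_ht) / n_ht
    # Correct the HT estimate using the errors made on the known auxiliary totals
    return ty_ht + a * (pop_size - n_ht) + b * (pop_total_x - tx_ht)


if __name__ == "__main__":
    # Hypothetical population: x = 1..10 is known for every unit.
    x_pop = list(range(1, 11))
    # Hypothetical simple random sample without replacement, n = 3 from N = 10.
    sample_x = [2.0, 5.0, 9.0]
    sample_y = [8.1, 16.7, 29.4]
    pi = [0.3, 0.3, 0.3]
    print(greg_total(sample_y, sample_x, pi, len(x_pop), sum(x_pop)))
```

When y is exactly linear in x, the estimate reproduces the true total from any sample; when it is not, the correction term still only uses design weights, which is what gives the approach its robustness.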

      The 1990s were marked by the emergence of the concept of auxiliary information. This relatively general notion covers all information external to the survey itself that is used to increase the accuracy of its results. This information can be the values of one or more variables for all units of the population, or simply a function of these values. Auxiliary information is available for most surveys; it may come from a census or simply from the sampling frame. Examples include the total of a variable over the population, subtotals by subpopulation, means, proportions, variances, and the values of a variable for all units of the sampling frame. The notion of auxiliary information therefore encompasses all data from censuses or administrative sources.

Figure: Block diagram of the flow of auxiliary information, which can enter through the sampling design and through estimation; data collection links the sampling design to estimation.
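Auxiliary information can also enter through the sampling design. A common use, sketched below in Python under stated assumptions, is to make inclusion probabilities proportional to a strictly positive size variable x known for every unit of the sampling frame. Probabilities that would exceed 1 are capped at 1 (those units are taken with certainty) and the remaining ones are rescaled so that all of them sum to the sample size n. The function name is hypothetical, and the sketch only computes the probabilities; actually drawing a sample with them requires an unequal probability sampling algorithm (see Tillé, 2006).

```python
from typing import List, Sequence, Set


def inclusion_probabilities(x: Sequence[float], n: int) -> List[float]:
    """Inclusion probabilities proportional to x, summing to n (hypothetical helper).

    Units whose probability would reach or exceed 1 are selected with certainty,
    and the probabilities of the other units are recomputed on what is left.
    """
    if any(xk <= 0 for xk in x):
        raise ValueError("the auxiliary variable must be strictly positive")
    if not 0 < n <= len(x):
        raise ValueError("n must be between 1 and the population size")
    certain: Set[int] = set()  # indices of units selected with probability 1
    while True:
        remaining = n - len(certain)  # sample size left for non-certain units
        total = sum(xk for k, xk in enumerate(x) if k not in certain)
        pi = [1.0 if k in certain else remaining * xk / total
              for k, xk in enumerate(x)]
        newly_certain = {k for k, p in enumerate(pi) if k not in certain and p >= 1.0}
        if not newly_certain:
            return pi
        certain |= newly_certain


if __name__ == "__main__":
    # Hypothetical frame: the fifth unit is large enough to be taken with certainty.
    print(inclusion_probabilities([1, 2, 3, 4, 10], 2))
```

The loop terminates because each pass either returns or moves at least one more unit into the certainty set.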

      The pioneering books are Yates (1946, 1949, 1960, 1979), Deming (1948, 1950, 1960), Thionet (1953), Sukhatme (1954), Hansen et al. (1953a,b), Cochran (1953, 1963, 1977), Dalenius (1957), Kish (1965, 1989, 1995), Murthy (1967), Raj (1968), Johnson & Smith (1969), Sukhatme & Sukhatme (1970), Konijn (1973), Lanke (1975), Cassel et al. (1977, 1993), Jessen (1978), Hájek (1981), and Kalton (1983). These books are still worth consulting because many modern ideas, especially on calibration and balancing, are already discussed in them.

      Important reference works include Skinner et al. (1989), Särndal et al. (1992), Lohr (1999, 2009b), Thompson (1997), Brewer (2002), Ardilly & Tillé (2006), and Fuller (2011). The Handbook of Statistics series has devoted volumes to sampling at 15‐year intervals: first a volume edited by Krishnaiah & Rao (1994), then two volumes edited by Pfeffermann & Rao (2009a,b). There is also a recent collective work edited by Wolf et al. (2016).

      The works of Thompson (1992, 1997, 2012) and Thompson & Seber (1996) are devoted to spatial sampling. Methods for environmental sampling are developed in Gregoire & Valentine (2007) and for forestry in Mandallaz (2008). Several books are dedicated to unequal probability sampling and sampling algorithms, including Brewer & Hanif (1983), Gabler (1990), and Tillé (2006). The model‐based approach is clearly described in Valliant et al. (2000), Chambers & Clark (2012), and Valliant et al. (2013).

      Many relevant books have been published in French and are still available, including Thionet (1953), Desabie (1966), Deroo & Dussaix (1980), Gouriéroux (1981), Grosbras (1987), Dussaix & Grosbras (1992), Dussaix & Grosbras (1996), Ardilly (1994, 2006), Ardilly & Tillé (2003), and Ardilly & Lavallée (2017). In Italian, one can consult the works of Cicchitelli et al. (1992, 1997), Frosini et al. (2011), and Conti & Marella (2012). In Spanish, there are also the books of Pérez López (2000), Tillé (2010), and Gutiérrez (2009), as well as a translation of Sharon Lohr's book (2000). In German, there are the books of Stenger (1985) and of Kauermann & Küchenhoff (2010). Finally, there is a book in Chinese by Ren & Ma (1996) and one in Korean by Kim (2017).

      Recently, new research fields have opened up. Small area estimation from survey data has become a major research topic (Rao, 2003; Rao & Molina, 2015). Recent developments in survey methodology are described in Groves (2004b) and Groves et al. (2009). Indirect sampling involves selecting samples from a population that is not the population of interest but is linked to it (Lavallée, 2002, 2007). New sampling algorithms have also been developed to select balanced samples (Tillé, 2006). Adaptive sampling consists of supplementing the initial sample on the basis of preliminary results (Thompson, 1992; Thompson & Seber, 1996). Capture–recapture methods are used to estimate the size of animal populations. Variants of these methods can also be used to estimate the size of rare populations or to carry out coverage surveys (Pollock, 2000; Seber, 2002).
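For the simplest two-sample capture-recapture setting, the classical Lincoln-Petersen estimator and Chapman's bias-reduced variant can be sketched as follows. This is an illustration, not the method of any specific reference cited here: it assumes a closed population, marks that are not lost, and equal catchability in both samples, and the function names are hypothetical. More elaborate models are needed when these assumptions fail.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Lincoln-Petersen estimate of population size: n1 * n2 / m."""
    if recaptured == 0:
        raise ValueError("no recaptures: the Lincoln-Petersen estimator is undefined")
    return marked_first * caught_second / recaptured


def chapman(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Chapman's variant, (n1 + 1)(n2 + 1) / (m + 1) - 1, defined even when m = 0."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical survey: 100 animals marked, 80 caught later, 20 of them marked.
    print(lincoln_petersen(100, 80, 20))
    print(chapman(100, 80, 20))
```

The idea is that the proportion of marked animals in the second sample, m / n2, estimates the proportion n1 / N in the whole population, which is solved for N.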

      Resampling methods have been developed for finite populations (Shao & Tu, 1995; Groves, 2004b). Measurement error will, of course, remain a major research topic (Fuller, 1987; Groves, 2004a). Finally, substantial progress has been made in methods for handling nonresponse, such as reweighting and imputation (Särndal & Lundström, 2005; Bethlehem et al., 2011; De Waal et al., 2011; Kim & Shao, 2013).

      One of the challenges now emerging is the integration of data from multiple sources: administrative files, registers, and samples. In a thoughtful article entitled “Big data: are we making a big mistake?”, Tim Harford (2014) reminds us that an abundance of data is never a guarantee of quality. Access to new sources of data should not lead us to repeat the mistakes of the past, such as those made during the 1936 US presidential election (see Section 1.5, page 6).

      Methods for integrating data from different sources have existed for decades. However, the growing number of available sources makes these integration problems increasingly complex. There is still a lot of research and development work