Sampling and Estimation from Finite Populations. Yves Tillé.

Author: Yves Tillé
Publisher: John Wiley & Sons Limited
ISBN: 9781119071273
event, which favored the increase in the practice of sample surveys, was made without reference to the probabilistic theory that had already been developed. The method of Crossley, Roper, and Gallup was indeed not probabilistic but empirical: the adequacy of the method was therefore validated experimentally, not mathematically.

      The theory of survey sampling, which makes abundant use of probability calculus, attracted the attention of university statisticians, who very quickly examined every aspect of the theory that holds mathematical interest. A coherent mathematical theory of survey sampling was constructed. The statisticians soon ran into a problem specific to finite populations: the proposed model postulated the identifiability of the units. This feature of the model renders reduction by sufficiency and the maximum likelihood method inapplicable. Godambe (1955) showed that there is no optimal linear estimator. This result is one of many pieces of evidence for the impossibility of defining optimal estimation procedures for general sampling designs in finite populations. Next, Basu (1969) and Basu & Ghosh (1967) demonstrated that reduction by sufficiency amounts only to discarding the information on the multiplicity of the units, and is therefore of no operational use. Several approaches were examined, including one drawn from decision theory. New properties, such as hyperadmissibility (see Hanurav, 1968), were defined for estimators in finite populations.

      A purely theoretical school of survey sampling developed rapidly. This theory attracted the attention of researchers specializing in mathematical statistics, such as Debabrata Basu, who was interested in the specifics of the theory of survey sampling. However, many of the proposed results were theorems of nonexistence of optimal solutions. Research on the foundations of inference in survey theory was becoming so important that it was the subject of a symposium in Waterloo, Canada, in 1971. At this symposium, the address of Calyampudi Radhakrishna Rao (1971, p. 178) began with a very pessimistic statement.

      This introduction announced the direction of current research.

      In survey sampling theory, there is no theorem establishing the optimality of an estimation procedure for general sampling designs. Optimal estimation methods can be found only by restricting attention to particular classes of procedures. Yet even within a particular class of estimators (such as the class of linear or unbiased estimators), no interesting results can be obtained. One possible way out of this impasse is to change the formalization of the problem, for example by assuming that the population itself is random.

      The absence of tangible general results for certain classes of estimators led to the development of population modeling by means of a so-called “superpopulation” model. In this model-based approach, it is assumed that the values taken by the variable of interest on the observation units of the population are realizations of random variables. The superpopulation model defines a class of distributions to which these random variables are supposed to belong. The sample then results from a double random experiment: a realization of the model that generates the population, followed by the selection of the sample. The idea of modeling the population was already present in Brewer (1963a), but it was developed mainly by Royall (1970b, 1971, 1976b) (see also Valliant et al., 2000; Chambers & Clark, 2012).
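      The double random experiment can be sketched in a few lines of Python. Everything below is an illustrative assumption, not taken from the book: a hypothetical linear superpopulation model generates the finite population, and simple random sampling without replacement plays the role of the design.

```python
import random

random.seed(42)

# --- Stage 1: the superpopulation model generates the finite population.
# Hypothetical working model y_i = 2*x_i + eps_i (parameters are illustrative).
N = 1000
x = [random.uniform(1.0, 10.0) for _ in range(N)]
y = [2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# --- Stage 2: the sampling design selects a sample from that realization.
# Simple random sampling without replacement of size n.
n = 100
sample = random.sample(range(N), n)

# Design-based (Horvitz-Thompson) estimate of the population total:
# under this design every unit has inclusion probability n/N, weight N/n.
ht_total = (N / n) * sum(y[i] for i in sample)
true_total = sum(y)
print(ht_total, true_total)
```

      The two calls to the random number generator mirror the two sources of randomness: the model realization (stage 1) and the sample selection (stage 2). Design-based inference averages over the second source only; model-based inference, discussed next, averages over the first.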

      Drawing on the fact that the random sample is an “ancillary” statistic, Royall proposed working conditionally on it. In other words, he considered that once the sample is selected, the choice of units is no longer random. This new modeling allowed a distinct research school to develop. The model must express a known and previously accepted relationship. According to Royall, if the superpopulation model “adequately” describes the population, inference can be conducted with respect to the model alone, conditionally on the selected sample. The use of the model then makes it possible to determine an optimal estimator.

      One can object that a model is always an approximate representation of the population. However, the model is not built to be tested against the data but to “assist” the estimation. If the model is correct, then Royall's method provides a very efficient estimator. If the model is false, the bias may be so large that the confidence intervals built for the parameter are no longer valid. This is essentially the critique stated by Hansen et al. (1983).
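      This fragility can be illustrated with a small simulation. The sketch below is a hedged, hypothetical example: it assumes a ratio working model (y proportional to x) and a purposive sample of the units with the largest x, which model-based theory would accept since, under the model, the choice of sample does not matter. The population, intercept, and all numerical values are invented for illustration.

```python
import random

random.seed(1)

# Hypothetical population: x sorted so the "largest x" sample is easy to take.
N, n = 1000, 50
x = sorted(random.uniform(1.0, 10.0) for _ in range(N))
eps = [random.gauss(0.0, 0.5) for _ in range(N)]

def ratio_predictor(y):
    """Model-based predictor of the total under the ratio model y_i = B*x_i,
    computed on a purposive sample: the n units with the largest x."""
    s = range(N - n, N)                 # indices of the n largest x
    ys = sum(y[i] for i in s)
    xs = sum(x[i] for i in s)
    # Observed part plus model predictions B*x_i for the unsampled units.
    return ys + (ys / xs) * (sum(x) - xs)

# Case 1: the working model is true (no intercept) -- predictor is nearly exact.
y_true_model = [1.0 * xi + e for xi, e in zip(x, eps)]
# Case 2: the model is false (intercept 5 that the ratio model ignores).
y_false_model = [5.0 + 1.0 * xi + e for xi, e in zip(x, eps)]

for y in (y_true_model, y_false_model):
    print(ratio_predictor(y), sum(y))
```

      Under the correct model the purposive sample does no harm, but under the misspecified one the extrapolation of the fitted slope to the unsampled small-x units is systematically wrong, producing a large bias that no confidence interval built under the model would reflect. This is exactly the kind of failure the design-based critique of Hansen et al. points to.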

      That is not to say that the arguments for or against parametric inference in the usual statistical theory are not of interest in the context of the theory of survey sampling. In our assessment of these arguments, however, we must pay attention to the relevant specifics of the applications.

      According to Dalenius, it is therefore in the discipline in which the theory of survey sampling is applied that useful conclusions should be drawn concerning the adequacy of a superpopulation model.

      The statistical theory of surveys is mainly applied in official statistics institutes. These institutes do not develop a science for its own sake but fulfill a mission entrusted to them by their states. There is a fairly standard argument by the heads of national