Administrative Records for Survey Methodology

Author: Group of authors
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119272069
can be rule-processed. The Labor Force Survey (LFS) estimate of the yearly total of employed is then introduced to define an income threshold in the different subsets of part (III), whereby everyone above the threshold is reclassified as employed, such that the register total of employed coincides with the LFS estimate. As shown by Fosen and Zhang (2011), the resulting adjusted register proxy variable entails smaller mean squared error at the municipality level, compared to the survey estimates where the register proxy is used as an auxiliary variable.
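The threshold step described above can be sketched as follows. This is a minimal illustration, not the procedure of Fosen and Zhang (2011): it assumes a single subset of undecided units rather than several, treats units at or above the threshold as employed, and uses made-up incomes and totals. The function name `reclassify_by_income` and its parameters are hypothetical.

```python
def reclassify_by_income(incomes, n_employed_by_rule, lfs_total):
    """Pick an income threshold so the register total matches the survey total.

    incomes            -- incomes of units whose status the rules leave undecided
    n_employed_by_rule -- number of units already classified as employed by rule
    lfs_total          -- survey estimate of the total employed (as an integer)

    Returns (threshold, flags), where flags[i] is True if unit i is
    reclassified as employed. Ties at the threshold would break exact
    agreement; this sketch does not handle them.
    """
    shortfall = lfs_total - n_employed_by_rule
    if shortfall <= 0:
        # The rules already give at least the survey total: reclassify nobody.
        return float("inf"), [False] * len(incomes)
    if shortfall >= len(incomes):
        # Not enough undecided units: reclassify all of them.
        return float("-inf"), [True] * len(incomes)
    ranked = sorted(incomes, reverse=True)
    threshold = ranked[shortfall - 1]  # income of the last unit needed
    flags = [income >= threshold for income in incomes]
    return threshold, flags


# Illustrative (made-up) figures: 90 employed by rule, survey estimate 93.
incomes = [120, 15, 310, 45, 80, 5, 260]
threshold, flags = reclassify_by_income(incomes, n_employed_by_rule=90, lfs_total=93)
register_total = 90 + sum(flags)
```

With these figures the three highest incomes are reclassified, so the adjusted register total equals the assumed survey estimate of 93.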

      Within the context of combining register and survey data, we consider here multisource estimation methods that make use of two or more proxy variables. Deficiencies in coverage, relevance, and timeliness are often the reason that register-based estimation is not viable. When the lack of coverage can be limited to specific domains or variables, the problem can be remedied by collecting supplementary survey data using the split-population or split-data approach. In that case there is only one value for each variable of interest, since the data sources supplement each other. Multiple proxy variables, by contrast, call for different multisource estimation approaches.

Linked data      One target measure and relevance bias in the others?
                 Yes (asymmetric)                         No (symmetric)
Yes (linked)     Survey weighting; Prediction modeling    Capture–recapture methods; Structural equation modeling
No (unlinked)    Benchmark adjustment                     Constrained optimization

      1.3.1 Asymmetric Setting

      The two most common approaches under the asymmetric-linked setting are survey weighting and prediction modeling, where the register proxy variable is used as an auxiliary variable or a covariate. See e.g. Särndal, Swensson, and Wretman (1992) for the design-based approach to survey weighting that makes use of auxiliary variables; Valliant, Dorfman, and Royall (2000) and Chambers and Clark (2012) for the model-based approach to finite population prediction; and Rao and Molina (2015) for relevant methods of small area estimation. We make two observations. Firstly, when the overlapping survey variable is deemed necessary despite the presence of a register proxy, the latter is typically the most powerful among all the auxiliary variables when it comes to weighting adjustment and regression modeling. See e.g. Djerf (1997) and Thomsen and Zhang (2001) for the use of register economic activity status in the LFS, and the effects on reducing sampling and nonresponse errors. Secondly, applications to remedy Representation errors are much less common. However, see e.g. survey weighting under dependent sampling for the estimation of coverage errors (Nirel and Glickman 2009), mixed-effects models for assessing register coverage errors (Mancini and Toti 2014), and different misclassification models for register NACE (Van Delden et al. 2016) and register household (Zhang 2011).
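As a simple illustration of using the register proxy as an auxiliary variable in survey weighting, the sketch below computes a post-stratified estimate of the total employed, with the register status as the post-stratifying variable. It assumes simple random sampling with full response, population counts by register status known from the register, and made-up numbers; the function name `poststratified_total` is hypothetical and is not taken from the works cited above.

```python
from collections import defaultdict


def poststratified_total(sample, register_counts):
    """Post-stratified estimate of a population total.

    sample          -- list of (register_status, survey_value) pairs for the
                       linked sample units; survey_value is 1 if the survey
                       classifies the unit as employed, else 0
    register_counts -- dict mapping register_status to its population count
    """
    n = defaultdict(int)   # sample size per register status
    y = defaultdict(int)   # sample sum of the survey variable per status
    for status, value in sample:
        n[status] += 1
        y[status] += value

    total = 0.0
    for status, population_count in register_counts.items():
        if n[status] == 0:
            raise ValueError(f"no sample units with register status {status!r}")
        # Scale the sample mean in each post-stratum up to the register count.
        total += population_count * y[status] / n[status]
    return total


# Made-up linked sample: register status paired with survey employment status.
sample = (
    [("employed", 1)] * 18 + [("employed", 0)] * 2
    + [("not employed", 1)] * 1 + [("not employed", 0)] * 9
)
register_counts = {"employed": 6000, "not employed": 4000}
estimate = poststratified_total(sample, register_counts)
```

The closer the register proxy agrees with the survey measure, the more homogeneous the post-strata are, which is one way to see why the proxy tends to be a powerful auxiliary variable.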

      The nature of a proxy variable implies a special use that is beyond what is feasible with a non-proxy auxiliary variable, no matter how good an auxiliary it is: under suitable conditions, it is possible to substitute (or replace) the target measure by the proxy value. However, substitution would only be acceptable for a subset of the units but not all since, had it been acceptable for all the units, one would have had “direct tabulation” instead.

      It follows that adjustment, or imputation in the case of a rejected value, will be necessary. Macro-level survey estimates can be imposed as benchmarks to achieve statistical relevance at the corresponding level. Linked datasets are typically not necessary here – recall the Norwegian register-based employment status described earlier. This yields many methods under what may be referred to as the benchmarked adjustment approach for combining register and survey proxy variables under the asymmetric-unlinked setting.

      Repeated weighting and constrained (mass) imputation are two common approaches of benchmarked adjustment; see e.g. de Waal (2016) for a discussion. Repeated weighting is a technique initially presented for sample reweighting in the presence of overlapping survey estimates (Renssen and Nieuwenbroek 1997). It has been used for the reconciliation of Dutch virtual census output tables (Houbiers 2004). But it can equally be applied to adjust register datasets so that afterward, e.g. the weighted register proxy total agrees with the valid target totals imposed. This does not require linking the register datasets and the external datasets from which the benchmark totals are obtained. An inconvenience arises in cases where there are multiple proxy variables to be benchmarked and the variables are available for different subsets of units. This may be the case due to partial missing data in a single register file or when merging multiple register files. Some imputation will then be necessary if one would like to have a single set of weights for the whole dataset.
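The idea of weighting a register dataset so that its weighted proxy totals agree with imposed benchmarks can be illustrated with plain raking (iterative proportional adjustment of unit weights). This is a stand-in chosen for brevity, not the repeated weighting estimator of Renssen and Nieuwenbroek (1997). The sketch assumes every unit has all benchmarked variables observed, that the benchmark totals are mutually consistent, and that every benchmarked category occurs in the data; the variable names, categories, and totals are made up.

```python
def weighted_totals(units, weights, var):
    """Weighted count of units in each category of variable `var`."""
    totals = {}
    for unit, w in zip(units, weights):
        totals[unit[var]] = totals.get(unit[var], 0.0) + w
    return totals


def rake_weights(units, benchmarks, max_iter=200, tol=1e-10):
    """Adjust unit weights until weighted totals match the benchmarks.

    units      -- list of dicts mapping variable name -> category
    benchmarks -- dict mapping variable name -> {category: target total}
    """
    weights = [1.0] * len(units)
    for _ in range(max_iter):
        largest_shift = 0.0
        for var, targets in benchmarks.items():
            current = weighted_totals(units, weights, var)
            for i, unit in enumerate(units):
                # Scale each unit by target / current total of its category.
                factor = targets[unit[var]] / current[unit[var]]
                weights[i] *= factor
                largest_shift = max(largest_shift, abs(factor - 1.0))
        if largest_shift < tol:
            break  # all adjustment factors are essentially 1
    return weights


# Made-up register file of eight units with a proxy status and a region.
units = (
    [{"status": "employed", "region": "north"}] * 3
    + [{"status": "employed", "region": "south"}] * 1
    + [{"status": "not employed", "region": "north"}] * 2
    + [{"status": "not employed", "region": "south"}] * 2
)
# Hypothetical benchmark totals taken from external sources; no linkage needed.
benchmarks = {
    "status": {"employed": 5.0, "not employed": 3.0},
    "region": {"north": 4.5, "south": 3.5},
}
weights = rake_weights(units, benchmarks)
status_totals = weighted_totals(units, weights, "status")
region_totals = weighted_totals(units, weights, "region")
```

Only the benchmark totals enter the adjustment, which reflects the point that the register and the external datasets need not be linked. The sketch also shows the inconvenience noted above: a unit missing one of the benchmarked variables could not be assigned a factor without some imputation.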

      The one-number census imputation provides an example of the alternative imputation-based benchmarked adjustment methods (Brown et al. 1999). In the case of multiple proxy variables observed on different subsets of units, imputation is applied not only to the units with partially missing data, but also to the units with no observed variables at all, or possibly the units with completely observed data. The result is a complete dataset that guarantees numerical consistency for any tabulation across the variables and population domains. Constrained imputation for population datasets is discussed e.g. by Shlomo, de Waal, and Pannekoek (2009) and Zhang (2009a). Methods that incorporate micro-data edit constraints are studied e.g. in Coutinho, de Waal, and Shlomo (2013), Pannekoek, Shlomo, and de Waal (2013), and Pannekoek and Zhang (2015). Chambers and Ren (2004) consider a method of benchmarked outlier robust imputation. Obviously, it may be difficult to generate a single population dataset that is fit for all possible statistical uses. de Waal (2016) discusses the use of “repeated imputation.” Notice that there are many relevant works on the generation of benchmarked synthetic populations in Spatial Demography, Econometrics, and Sociology.

denote the survey-based estimates of population totals by income class, which are the row and column benchmarks of the target table Y, respectively. Starting with X and by means of iterative proportional fitting (IPF) until convergence, one