Administrative Records for Survey Methodology

Author: Multiple authors
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119272069
using confidential data and protected data, respectively. Table 2.1 shows the distribution of the errors Δr across SIC-division × county cells for accessions A, beginning-of-quarter employment B, full-quarter employment F, net job flows JF, and separations S (for additional tables, see Abowd et al. 2012). The time series properties of the QWI remain largely unaffected by the distortion. The central tendency of the bias, as measured by the median of the Δr distribution, is never greater than 0.001 in absolute value, and the error distribution is tight: the semi-interquartile range of the distortion for B in Table 2.1 is 0.022, which is less than the precision with which estimated serial correlation coefficients are normally displayed. The overall spread of the distribution is slightly higher for two-digit SIC × county and three-digit SIC × county cells (not reported here), because those cells are sparser. The time series properties of the QWI data are unbiased. The small amount of additional noise in the time series statistics is, in general, economically meaningless.
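The summary statistics discussed above (median and semi-interquartile range of the errors Δr) can be illustrated with a minimal Python sketch. It assumes that Δr is the difference between the first-order serial correlation coefficient estimated on the confidential series and the one estimated on the protected series for the same cell; the sign convention, the simple moment estimator, and all function names are illustrative assumptions, not the authors' code.

```python
from statistics import median, quantiles


def lag1_autocorr(series):
    """First-order serial correlation of a numeric sequence (simple moment estimator)."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def semi_interquartile_range(values):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


def summarize_errors(confidential, protected):
    """Median and semi-interquartile range of the per-cell errors.

    Both arguments map a cell identifier to that cell's time series.
    Assumption: error = r(confidential) - r(protected).
    """
    errors = [
        lag1_autocorr(confidential[cell]) - lag1_autocorr(protected[cell])
        for cell in confidential
    ]
    return median(errors), semi_interquartile_range(errors)
```

A median close to zero indicates no systematic shift in the estimated coefficients, and a small semi-interquartile range indicates that the errors are tightly concentrated around that median.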

      Cross-sectional Unbiasedness of the Distorted Data

      The distribution of the infused noise is symmetric, and allocation of the noise factors is random. The data distribution resulting from the noise infusion should thus be unbiased. We compute the bias ΔX in each cell kt, expressed in percentage terms:

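[Equation: definition of the percentage bias ΔX in cell kt; not recoverable from the source text]

Table 2.1 Distribution of the errors Δr across SIC-division × county cells

Variable                           Median       Semi-interquartile range
Accessions                         −0.000542    0.026314
Beginning-of-quarter employment     0.000230    0.021775
Full-quarter employment             0.000279    0.018830
Net job flows                      −0.000025    0.002288
Separations                         0.000797    0.025539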
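As a minimal sketch of the percentage bias described above, the following Python fragment compares distorted and undistorted cell values. The exact form of the published equation is not reproduced in this text, so the sign convention (distorted minus undistorted) and the choice of the undistorted value as denominator are assumptions; the function names are hypothetical.

```python
def percent_bias(true_value, distorted_value):
    """Bias of the distorted value relative to the undistorted one, in percent.

    Assumption: bias = 100 * (distorted - true) / true.
    """
    return 100.0 * (distorted_value - true_value) / true_value


def bias_by_cell(true_cells, distorted_cells):
    """Percentage bias for every cell (k, t) with a nonzero undistorted value.

    Both arguments map a cell key, e.g. (k, t), to the cell's value.
    Cells whose undistorted value is zero are skipped because the
    percentage bias is undefined there.
    """
    return {
        cell: percent_bias(true_cells[cell], distorted_cells[cell])
        for cell in true_cells
        if true_cells[cell] != 0
    }
```

If the noise is symmetric and randomly allocated, the resulting per-cell biases should be centered on zero.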

      Box 2.2 Sidebox: Do-It-Yourself Noise Infusion

      The interested user might consult a simple example (with fake data) at https://github.com/labordynamicsinstitute/rampnoise (Vilhuber 2017) that illustrates this mechanism.
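For readers who prefer to see the idea inline, here is a minimal Python sketch of multiplicative noise infusion. It is not taken from the repository above. It assumes that each contributing unit receives one random factor drawn from a symmetric, ramp-shaped distribution that keeps the factor at least `a` and at most `b` away from 1, and that cell totals are sums of factor-weighted unit values. The parameters `a` and `b`, their values in the usage note, and all function names are hypothetical.

```python
import random


def draw_ramp_factor(a, b, rng):
    """Draw one multiplicative noise factor.

    The distance d of the factor from 1 lies in [a, b], with a density that
    falls linearly from its maximum at a to zero at b (inverse-CDF sampling).
    The direction (above or below 1) is chosen with equal probability, so the
    distribution is symmetric around 1.
    """
    u = rng.random()
    d = b - (b - a) * (1.0 - u) ** 0.5
    sign = 1 if rng.random() < 0.5 else -1
    return 1.0 + sign * d


def distorted_cell_totals(records, factors):
    """Sum factor-weighted values into cells.

    records: iterable of (unit_id, cell, value)
    factors: mapping unit_id -> noise factor (one fixed factor per unit)
    """
    totals = {}
    for unit_id, cell, value in records:
        totals[cell] = totals.get(cell, 0.0) + factors[unit_id] * value
    return totals
```

A possible usage, with illustrative parameter values only: draw `factors = {u: draw_ramp_factor(0.05, 0.15, random.Random(1)) for u in unit_ids}` once, keep the factors confidential, and publish only the totals returned by `distorted_cell_totals`.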

      The provision of very detailed micro-tabulations or public-use microdata may not be sufficient to inform certain types of research questions. In particular, for business data the thresholds that trigger SDL suppression methods are met far more often than for individuals or households. In those cases, the research community needs controlled access to confidential microdata. Three key reasons why access to microdata may be beneficial are:

      (i) microdata permit policy makers to pose and analyze complex questions. In economics, for example, analysis of aggregate statistics does not give a sufficiently accurate view of the functioning of the economy to allow analysis of the components of productivity growth;

      (ii) access to microdata permits analysts to calculate marginal rather than just average effects. For example, microdata enable analysts to do multivariate regressions whereby the marginal impact of specific variables can be isolated;

      (iii) broadly speaking, widely available access to microdata enables replication of important research (United Nations 2007, p. 4).

      As outlined above, confidentiality concerns have either led to the removal of public-use microdata versions of linked files or prevented their creation, which increases the need for alternative access to the confidential microdata.

      2.4.1 Statistical Data Enclaves

      In the United States, a 2004 grant by the National Science Foundation laid the groundwork for subsequent expansion of the (then Census) Research Data Center network from 8 locations, open since the mid-1990s, to over 30 locations