Administrative records, in a general sense, are records that a government keeps for administrative purposes. They can pertain to almost all aspects of life, including taxes, wages, education, health, residence, voting, crime, and property and business ownership. Does an individual have a license for a dog, for fishing at public lakes, to drive a car or motorcycle, or to own a gun? Does an individual receive public assistance through a government program? Administrative records, essential for government operations, contain a wealth of information on large segments of the population, but they have limitations. The records contain information on only some variables and only for subsets of the overall population. Information is collected so that a government can execute its programs, typically not for other purposes. Additional variables that might be interesting for study purposes likely are not recorded. Methods of recording variables might not be those that would be used in a scientific study. Those included in an administrative data file are not a random sample from the population. Some administrative records are collected over the course of several months or years, rather than during a single short time interval.
The use of administrative records has been part of the survey process for many decades. Survey textbooks since at least the 1950s (Cochran 1977; Kish 1967; Hansen, Hurwitz, and Madow 1953; Särndal, Swensson, and Wretman 1992) present methods for using auxiliary variables. It is typically assumed that values of auxiliary variables are available without error for all members of the population, or at least that aggregate totals are known. The values might have come from a census, from a large survey at a previous time, or from the sample frame. Auxiliary variables are used for stratification, probability proportional to size sampling, difference estimation, and ratio estimation. In the classic literature, they are often treated as known, fixed values.
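As a concrete illustration of how an auxiliary variable enters estimation, the sketch below computes the classical ratio estimate of a population total: the ratio of the sample mean of the survey variable y to the sample mean of the auxiliary variable x, multiplied by the known population total of x. It assumes a simple random sample and an auxiliary total known without error, as the classic literature does. The function name, the data values, and the total are hypothetical and chosen only for illustration.

```python
from statistics import mean


def ratio_estimate_total(y_sample, x_sample, x_population_total):
    """Classical ratio estimate of a population total.

    y_sample: survey variable observed on the sampled units
    x_sample: auxiliary variable for the same sampled units
    x_population_total: known population total of the auxiliary variable
    """
    if not y_sample or len(y_sample) != len(x_sample):
        raise ValueError("samples must be non-empty and of equal length")
    x_bar = mean(x_sample)
    if x_bar == 0:
        raise ValueError("sample mean of the auxiliary variable is zero")
    # Estimated ratio of y to x, scaled up by the known total of x.
    return mean(y_sample) / x_bar * x_population_total


# Hypothetical data: y from the survey, x from an administrative source.
y = [12.0, 15.0, 9.0, 20.0]
x = [10.0, 14.0, 8.0, 18.0]
X_TOTAL = 5000.0  # assumed known, for example from a register

estimate = ratio_estimate_total(y, x, X_TOTAL)
print(f"Ratio estimate of the population total: {estimate:.1f}")
```

The estimate gains precision over a simple expansion estimate when y and x are strongly and roughly proportionally related, which is why the quality of the auxiliary source matters.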
Despite the limitations of administrative records, researchers, including the authors in this book, have been exploring how “adrecs” can be used to improve sample surveys in today’s world and build on the record of past successes. They have examined new possibilities for using administrative record information to address four goals (coverage, response, variables, and accuracy) of official surveys. Increasing timeliness and decreasing costs through use of administrative records also are of continuing interest.
The book is organized into four sections. The first section contains two chapters. Chapter 1, by Li-Chun Zhang, presents fundamental challenges and approaches to integrating survey and administrative data for statistical purposes. The chapter focuses on administrative data, also called register or registry data, as a source for proxy variables. The proxy variables obtained from administrative sources can, for example, enhance a survey by providing additional information, be used for quality assessment of responses, and provide substitutes for missing values. Chapter 2, by John Marion Abowd, Ian Schmutte, and Lars Vilhuber, addresses confidentiality protection and disclosure limitation in linked data. Linking data on population elements is an essential step for many uses of administrative records in conjunction with survey data. If individuals from a survey can be located uniquely in administrative records, then variables in those administrative records can be meaningfully associated with their originating units, thereby generating useful proxy variables. Data files from surveys, both those linked to administrative information and those not linked, are made available to researchers and policy analysts. In standard practice, values of personally identifying information, such as names, fine-level geographic information including addresses, birthdates, and identification numbers, are suppressed. A data file containing a rich set of variables for analysis, however, increases the chance that someone could identify a unique individual from the survey in the population based on the values of several variables. The concern is that such an identification violates legal promises of confidentiality, causes harm to individuals who view their survey responses and administrative information as sensitive, and endangers future survey operations. Chapter 2 describes three applications, traditional statistical disclosure limitation methods, and new developments. The chapter also discusses how researchers access data (access modalities) and the usefulness (analytic validity) of data made available after modification for enhanced disclosure limitation.
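The point that a richer set of variables raises the risk of identification can be made concrete with a small sketch that counts records that are unique in a file on a chosen combination of variables. This is only a rough screening device and not a method taken from Chapter 2: a record that is unique in a sample need not be unique in the population. The records, variable names, and function name below are hypothetical.

```python
from collections import Counter


def count_unique_records(records, key_variables):
    """Count records whose combination of key-variable values is unique in the file."""
    keys = [tuple(record[v] for v in key_variables) for record in records]
    frequency = Counter(keys)
    return sum(1 for key in keys if frequency[key] == 1)


# Hypothetical released file with direct identifiers already suppressed.
records = [
    {"age_group": "30-39", "sex": "F", "region": "North", "occupation": "nurse"},
    {"age_group": "30-39", "sex": "F", "region": "North", "occupation": "teacher"},
    {"age_group": "30-39", "sex": "M", "region": "North", "occupation": "nurse"},
    {"age_group": "40-49", "sex": "F", "region": "South", "occupation": "nurse"},
]

# Uniqueness on two variables versus on four variables.
unique_on_two = count_unique_records(records, ["age_group", "sex"])
unique_on_four = count_unique_records(
    records, ["age_group", "sex", "region", "occupation"]
)
print(unique_on_two, unique_on_four)
```

In this toy file, adding region and occupation to the key makes more records unique, which mirrors the tradeoff between analytic richness and disclosure risk described above.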
Section 2 groups together five chapters on data quality and record linkage. Chapter 3, by Piet Daas, Eric Schulte Nordholt, Martijn Tennekes, and Saskia Ossen, examines the quality of administrative data used in the Dutch virtual census. A challenge in assessing the quality of a data source is that doing so requires better information on some variables for at least a subset of the population. Coen Hendriks, in Chapter 4, reports on improving the quality of data going into Norwegian register-based statistics. In Chapter 5, William Winkler considers a wide range of topics, from initial cleaning of data files and record linkage to integrated modeling, editing, and imputation. The impact of cleaning data files through standardizing variables, parsing variables such as addresses into separable components, and checking for logical errors cannot be overstated. Various approaches are in use for linking records from two files on the same population. Dr. Winkler reviews several enhancements, including variations in string comparator metrics and memory indexing, that have been put into practice at the U.S. Census Bureau. Jerry Reiter writes about assessing uncertainty when using administrative records in Chapter 6. Along with survey estimates, one typically needs to provide estimates of standard error. How do the quality of administrative records and the performance of the linkage to the survey affect the accuracy of estimates? Multiple imputation (Rubin 1986, 1987) could be one area for further exploration. In Chapter 7, Joseph Sakshaug addresses the specific question of measuring and controlling non-consent bias when surveys and administrative data are linked. It is increasingly common for surveys that plan to link respondents to administrative data to ask for permission to do so. Some individuals refuse to give permission for linkage or cannot be linked for other reasons, such as refusing to provide information on key linkage variables. Those whose records are not linkable can differ in many ways from those whose records are. Bias due to non-consent to linkage and failed linkage is therefore a novel contributing factor to total survey error.
Section 3 contains four chapters on uses of administrative records in surveys and official statistics. Chapter 8, by Ingegerd Jansson, Martin Axelson, Anders Holmberg, Peter Werner, and Sara Westling, describes experiences in the first Swedish register-based census of the population. In a register-based census, the population is counted and characteristics are gathered directly from administrative records, which, in this case, are referred to as population registers. Chapter 9, by Vincent Tom Mule and Andrew Keller of the U.S. Census Bureau, presents research on administrative records applications for the U.S. 2020 Decennial Census of the population. In the U.S., there is no universal population register, and the census involves enumerating and gathering basic information on every person in the country. Administrative records have been used to improve the data gathering process in the past. This chapter describes expanded options for improved design, quality and accuracy assessment, and dealing with missing information.