QWI – Quarterly Workforce Indicators, a set of local statistics of employment and earnings, produced by the Census Bureau’s LEHD program (https://lehd.ces.census.gov/data/)
SIPP – Survey of Income and Program Participation, conducted by the U.S. Census Bureau on topics such as economic well-being, health insurance, and food security (https://www.census.gov/sipp/).
SSB – the SIPP Synthetic Beta File, also known as “SIPP/SSA/IRS Public Use File”
2.A.1 Other Abbreviations
ABS – Australian Bureau of Statistics, the Australian NSO (http://abs.gov.au/)
AEA – American Economic Association (https://www.aeaweb.org)
ASA – American Statistical Association (https://www.amstat.org)
BLS – Bureau of Labor Statistics, the NSO in the United States providing data on “labor market activity, working conditions, and price changes in the economy.” (https://bls.gov)
CASD – Centre d’accès sécurisé distant aux données, the French remote access system to most administrative data files (https://casd.eu)
Census Bureau – the largest statistical agency in the United States (https://census.gov)
CMS – Centers for Medicare and Medicaid Services, administers US government health programs such as Medicare, Medicaid, and others (https://cms.gov/)
EIA – Energy Information Administration, collecting and disseminating information on energy generation and consumption in the United States (https://eia.gov).
FICA – Federal Insurance Contributions Act, the law establishing the payroll taxes that fund Social Security (and Medicare) in the United States
FSRDC – Federal Statistical Research Data Centers were originally created as the U.S. Census Bureau Research Data Centers. They provide secure facilities in which authorized researchers have remote access to restricted-use government microdata, and are structured as partnerships between federal statistical agencies and research institutions (https://www.census.gov/fsrdc)
IAB – Institute for Employment Research, the research institute of the German Federal Employment Agency (http://iab.de/en/iab-aktuell.aspx)
IRS – Internal Revenue Service handles tax collection for the US government (https://irs.gov)
NCHS – National Center for Health Statistics, the US NSO charged with collecting and disseminating information on health and well-being (https://www.cdc.gov/nchs/)
NSO – National statistical offices. Most countries have a single national statistical agency, but some countries (USA, Germany) have multiple statistical agencies
OASDI – Old Age, Survivors and Disability Insurance program, the official name for Social Security in the United States
QCEW – Quarterly Census of Employment and Wages is a program run by the BLS, collecting establishment-level reports of employment and wages, and publishing quarterly estimates for about 95% of US jobs (https://www.bls.gov/cew/)
SER – Summary Earnings Records, annual earnings records maintained by the SSA
SSA – Social Security Administration, administers government-provided retirement, disability, and survivors benefits in the United States (https://ssa.gov)
SSN – Social Security Number, an identification number in the United States, originally used for management of benefits administered by the SSA, but since expanded to serve as a quasi-national identifier
UI – Unemployment Insurance, which in the United States is administered by each state (and the District of Columbia)
U.S.C. – United States Code, the official compilation of the general and permanent federal laws of the United States
2.A.2 Concepts
Analytical validity: It exists when, at a minimum, estimands can be estimated without bias and their confidence intervals (or the nominal level of significance for hypothesis tests) can be stated accurately (Rubin 1987). The estimands can be summaries of the univariate distributions of the variables, bivariate measures of association, or multivariate relationships among all variables.
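One way to make the confidence-interval half of this definition operational is by simulation: generate confidential data with a known mean, apply a release mechanism, and count how often the nominal 95% interval computed from the released data contains the true mean. The sketch below assumes normally distributed data and uses top-coding at a hypothetical threshold as an example of a release that breaks validity; `ci_coverage` is a hypothetical helper, not part of any SDL library.

```python
import math
import random

def ci_coverage(release, true_mean=10.0, sd=2.0, n=200, reps=1000, seed=1):
    """Estimate how often a nominal 95% CI for the mean, computed from the
    published data as if it were the confidential data, covers true_mean.

    `release(confidential, rng)` returns the published sample. Confidential
    data are simulated as Normal(true_mean, sd) (an illustrative assumption).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        confidential = [rng.gauss(true_mean, sd) for _ in range(n)]
        published = release(confidential, rng)
        k = len(published)
        m = sum(published) / k
        var = sum((v - m) ** 2 for v in published) / (k - 1)
        se = math.sqrt(var / k)
        if m - 1.96 * se <= true_mean <= m + 1.96 * se:
            hits += 1
    return hits / reps

# Publishing the raw data: coverage should be close to the nominal 0.95.
print(ci_coverage(lambda x, rng: x))
# Top-coding at 12 biases the mean downward, so coverage falls well below 0.95.
print(ci_coverage(lambda x, rng: [min(v, 12.0) for v in x]))
```

Under these assumptions the top-coded release yields a biased estimator whose naive intervals cover the truth far less often than stated, which is exactly what analytical validity rules out.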
Coarsening: A method for protecting data that involves mapping confidential values into broader categories, for example publishing counts by income bracket (a histogram) instead of exact incomes.
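A minimal sketch of coarsening, assuming numeric values and hypothetical bracket boundaries: each exact value is replaced by the label of the bin it falls into, and the resulting labels can be tallied into a histogram for publication. The function names and the bracket edges are illustrative only.

```python
from bisect import bisect_right
from collections import Counter

def coarsen(values, edges):
    """Map each value to the label of its bin. Bins are [edges[i], edges[i+1]),
    plus open-ended bins below edges[0] and at or above edges[-1]."""
    labels = []
    for v in values:
        i = bisect_right(edges, v)  # number of edges <= v
        if i == 0:
            labels.append(f"<{edges[0]}")
        elif i == len(edges):
            labels.append(f">={edges[-1]}")
        else:
            labels.append(f"{edges[i - 1]}-{edges[i]}")
    return labels

def histogram(values, edges):
    """Counts per bin: the coarsened data summarized as a histogram."""
    return Counter(coarsen(values, edges))

incomes = [12000, 30000, 48000, 75000, 250000]  # hypothetical confidential values
edges = [0, 25000, 50000, 100000]               # hypothetical bracket boundaries
print(coarsen(incomes, edges))
# ['0-25000', '25000-50000', '25000-50000', '50000-100000', '>=100000']
print(histogram(incomes, edges))
```

The open-ended top bin doubles as top-coding: very large values, which are often the most identifying, are never published exactly.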
Confidentiality: A “quality or condition accorded to information as an obligation not to transmit […] to unauthorized parties” (Fienberg 2005, as quoted in Duncan, Elliot, and Salazar-González 2011). Confidentiality addresses data already collected, whereas privacy (see below) addresses the right of an individual to consent to the collection of data.
Data swapping: Sensitive records (usually households) are identified according to a priori criteria and matched to "nearby" records. The values of some or all of the other variables, usually the geographic identifiers, are then swapped between the paired records, effectively relocating each record to its partner's location.
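The pairing-and-swap step can be sketched as follows. This is an illustration under stated assumptions, not any agency's procedure: "nearby" is approximated by a hypothetical `match_key` (here, household size), sensitivity by a hypothetical flag, and only the `geo` field is swapped.

```python
import random

def swap_geography(records, is_sensitive, match_key, rng):
    """Swap the 'geo' field of each sensitive record with a randomly chosen partner.

    A partner must share match_key(record) (a stand-in for a "nearby" record),
    sit in a different geography, and not have been swapped already.
    Records are modified in place and also returned.
    """
    swapped = set()
    for i, rec in enumerate(records):
        if i in swapped or not is_sensitive(rec):
            continue
        candidates = [
            j for j, other in enumerate(records)
            if j != i and j not in swapped
            and match_key(other) == match_key(rec)
            and other["geo"] != rec["geo"]
        ]
        if not candidates:
            continue  # no eligible partner: the record is left unswapped
        j = rng.choice(candidates)
        rec["geo"], records[j]["geo"] = records[j]["geo"], rec["geo"]
        swapped.update((i, j))
    return records

households = [  # hypothetical records
    {"id": 1, "geo": "A", "size": 7, "flag": True},
    {"id": 2, "geo": "B", "size": 7, "flag": False},
    {"id": 3, "geo": "C", "size": 2, "flag": False},
]
swap_geography(households, lambda r: r["flag"], lambda r: r["size"], random.Random(0))
print([(h["id"], h["geo"]) for h in households])  # [(1, 'B'), (2, 'A'), (3, 'C')]
```

Because partners agree on the matching variable, the joint counts of geography by household size are unchanged; only the link between a particular household and its location is broken.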
Differential privacy: A class of formal privacy mechanisms. For instance, ε-differential privacy places an upper bound, parameterized by ε, on the ability of a user to infer from the published output whether any specific data item, or response, was in the original, confidential data (Dwork and Roth 2014).
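A standard example of an ε-differentially private mechanism is the Laplace mechanism described by Dwork and Roth: add noise with scale equal to the query's sensitivity divided by ε. The sketch below assumes a simple counting query (sensitivity 1, since one person changes the count by at most 1) and uses only the standard library; `laplace_count` is a hypothetical name.

```python
import random

def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    Adding or removing one person changes a count by at most `sensitivity`,
    so adding Laplace(0, sensitivity / epsilon) noise gives epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. Exponential(rate = 1/scale) draws is Laplace(0, scale).
    return true_count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(2024)
print(laplace_count(100, epsilon=1.0, rng=rng))  # one noisy release of the count 100
print(laplace_count(100, epsilon=0.1, rng=rng))  # smaller epsilon: noise scale 10x larger
```

Smaller ε means a tighter bound on what can be inferred about any one respondent, paid for with noisier published counts.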
Dirichlet-multinomial distribution: A family of discrete multivariate probability distributions over vectors of nonnegative integer counts with a fixed total. A draw is generated in two stages: a probability vector p is first drawn from a Dirichlet distribution with parameter α, and the counts are then drawn from the better-known multinomial distribution with probability vector p.
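The two-stage construction translates directly into a sampler. This sketch uses only the Python standard library: the Dirichlet draw normalizes independent Gamma(α_i, 1) draws (a standard construction), and the multinomial counts are tallied from n categorical draws. The function names are hypothetical.

```python
import random
from collections import Counter

def dirichlet(alpha, rng):
    """Draw p ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def dirichlet_multinomial(n, alpha, rng):
    """Draw a vector of len(alpha) counts summing to n from the Dirichlet-multinomial."""
    p = dirichlet(alpha, rng)                                 # stage 1: probability vector
    cats = rng.choices(range(len(alpha)), weights=p, k=n)     # stage 2: n categorical draws
    tally = Counter(cats)
    return [tally[i] for i in range(len(alpha))]

rng = random.Random(42)
print(dirichlet_multinomial(50, [1.0, 2.0, 3.0], rng))  # three counts summing to 50
```

The expected count in cell i is n·α_i / Σα, as for a multinomial with p = α / Σα, but the extra randomness in p makes the counts more dispersed than a plain multinomial.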
Input noise infusion: Distorting the values of some or all of the inputs before any data for publication are built or released.
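A minimal sketch of input noise infusion, assuming multiplicative noise: each entity (for example, an establishment) receives one noise factor, drawn once and reused, and every published aggregate is then computed from the distorted inputs. The factor distribution (1 ± u with u uniform on [a, b]) and all names here are illustrative assumptions, not a specific agency's parameters.

```python
import random

def noise_factors(entity_ids, a, b, rng):
    """Draw one permanent multiplicative noise factor per entity.

    Each factor is 1 + u or 1 - u (equal probability) with u ~ Uniform(a, b),
    so every factor is at least `a` away from 1. Requires 0 < a < b < 1.
    """
    if not 0 < a < b < 1:
        raise ValueError("need 0 < a < b < 1")
    factors = {}
    for eid in entity_ids:
        u = rng.uniform(a, b)
        factors[eid] = 1 + u if rng.random() < 0.5 else 1 - u
    return factors

def noisy_total(records, factors):
    """Build a published total from distorted inputs: sum of value * factor[entity]."""
    return sum(value * factors[eid] for eid, value in records)

establishments = [("e1", 120), ("e2", 45), ("e3", 300)]  # hypothetical (entity, employment)
f = noise_factors([e for e, _ in establishments], 0.05, 0.15, random.Random(7))
print(noisy_total(establishments, f))  # distorted total; the confidential total is 465
```

Reusing the same factor for an entity across all tables keeps published statistics mutually consistent and prevents the distortion from being averaged away by requesting the same aggregate repeatedly.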
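Posterior predictive distribution (PPD): In Bayesian statistics, the distribution of unobserved (for example, future or replicated) values conditional on the observed values, obtained by averaging the data model over the posterior distribution of its parameters.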
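The averaging over the posterior can be illustrated with a conjugate model in which every step is explicit: first draw the parameter from its posterior, then draw new data given that parameter. The sketch assumes a binomial likelihood with a Beta(a, b) prior (a = b = 1 by default); `ppd_draw` is a hypothetical name. Drawing synthetic records this way is one route to the synthetic data mentioned elsewhere in this document.

```python
import random

def ppd_draw(y, n, m, rng, a=1.0, b=1.0):
    """One draw of the number of successes in m new trials from the posterior
    predictive distribution, given y successes in n observed trials.

    Assumed model: y ~ Binomial(n, theta), theta ~ Beta(a, b),
    so the posterior is Beta(a + y, b + n - y).
    """
    theta = rng.betavariate(a + y, b + n - y)           # 1) parameter from the posterior
    return sum(rng.random() < theta for _ in range(m))  # 2) new data given that parameter

rng = random.Random(11)
print([ppd_draw(y=30, n=100, m=20, rng=rng) for _ in range(5)])  # five synthetic counts
```

Because θ is redrawn each time, the predictive draws are more variable than binomial draws at a fixed estimate of θ; this extra spread reflects uncertainty about the parameter.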
Privacy: “An individual’s freedom from excessive intrusion in the quest for information and […] ability to choose [… what …] will be shared or withheld from others” (Duncan, Jabine, and de Wolf 1993, quoted in Duncan, Elliot, and Salazar-González 2011). See also confidentiality, above.
Sampling: As an SDL method, protects confidentiality by publishing only a fraction (a sample) of the records.
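A minimal sketch of sampling as a release mechanism, assuming a simple random sample without replacement. Attaching a design weight of N / n to each released record is an added assumption (the definition only requires that a fraction be published); it lets users estimate population totals from the sample. `release_sample` is a hypothetical name.

```python
import random

def release_sample(records, fraction, rng):
    """Publish a simple random sample (without replacement) of about fraction * N records.

    Each released record carries a design weight N / n so that weighted totals
    estimate full-population totals. Requires 0 < fraction <= 1.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    n = max(1, round(fraction * len(records)))
    weight = len(records) / n
    return [(rec, weight) for rec in rng.sample(records, n)]

population = list(range(1, 1001))  # hypothetical confidential records
released = release_sample(population, 0.1, random.Random(5))
print(len(released))                     # 100
print(sum(w for _, w in released))       # 1000.0
print(sum(r * w for r, w in released))   # estimate of the population total 500500
```

An intruder who finds a matching record in the release cannot be sure that the target is in the sample at all, which is the source of protection.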
Statistical confidentiality or SDL – Statistical disclosure limitation: Can be viewed as “a body of principles, concepts, and procedures that permit confidentiality to be afforded to data, while still permitting its use for statistical purposes” (Duncan, Elliot, and Salazar-González 2011, p. 2).
Suppression: The removal of cells from a published table when their publication would pose a high risk of disclosure.
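A minimal sketch of primary suppression with a hypothetical minimum-count threshold, together with a check showing why suppressing sensitive cells alone is often not enough: if a row total is published and only one cell in the row is hidden, that cell can be recovered by subtraction. The threshold rule and function names are illustrative assumptions.

```python
def suppress_row(row, threshold):
    """Primary suppression: replace nonzero counts below `threshold` with None."""
    return {cell: (None if 0 < count < threshold else count) for cell, count in row.items()}

def recoverable_cells(published_row, row_total):
    """If exactly one cell is suppressed and the row total is published,
    return {cell: value} recovered by subtraction; otherwise return {}."""
    hidden = [cell for cell, v in published_row.items() if v is None]
    if len(hidden) != 1:
        return {}
    known = sum(v for v in published_row.values() if v is not None)
    return {hidden[0]: row_total - known}

row = {"x": 12, "y": 2, "z": 30}  # hypothetical cell counts; row total 44
published = suppress_row(row, threshold=5)
print(published)                         # {'x': 12, 'y': None, 'z': 30}
print(recoverable_cells(published, 44))  # {'y': 2}
```

This is why primary suppression is usually paired with complementary (secondary) suppression of additional cells, so that no hidden value can be derived from the published margins.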
Acknowledgments
John M. Abowd is the Associate Director for Research and Methodology and Chief Scientist, U.S. Census Bureau, the Edmund Ezra Day Professor of Economics,