The growing access to official administrative records of public service use also includes, for example, patient health records and school performance records through such initiatives as the UK Administrative Data Liaison Service (ADLS).24 These consequential data sources are, in theory, complete rather than being based on samples and can be of great value given their detail and coverage. However, as with all datasets, there are likely to be issues of missing data of which the social science researcher needs to be aware, including individuals who have not been traced or recorded. There are also likely to be duplicate records. We discuss an example of this in Chapter 3. Access to specific variables can also be restricted and the coverage, of course, is limited to the variables collected as part of the administration process and the time point of the data collection. As a result, the range of research questions that can be addressed is limited.
Research access to this kind of administrative data is part of the UK Government’s drive towards Open Data, whereby government departments and agencies are being required to provide greater access to service use and performance data for the purposes of transparency and accountability (Open Public Services White Paper, 2011).25 For an overview see Halford et al. (2013), Wind-Cowie and Lekhi (2012) and Shakespeare (2013). As well as providing an alternative to orthodox intentional data, administrative data also expands the range of available information. The use of such data is likely to be new to many social scientists and its properties, coding frames and terms of use can be very different and may require the acquisition of new skills, alongside new knowledge or greater interdisciplinary working. Nevertheless, it is notable that in a recent survey of over 300 (self-selected) social science researchers, nearly two-thirds had used administrative data in their research (Elliot et al., 2013), although a similar proportion (61 per cent) reported encountering barriers when trying to access such data.
2.2.3 Innovations in Linking Data
Methodologically, there are increasing opportunities to address research questions by data linking using statistical matching and drawing on multiple data sources. Well-known examples of this include: the linking of hospital data to the Millennium Cohort Study26 (see Calderwood, 2007); the Work and Pensions Longitudinal Study (WPLS),27 which links benefit and programme information held by the Department for Work and Pensions (DWP) with employment, earnings, savings, tax credit and pension records from HMRC; and the Longitudinal Study of Young People in England (LSYPE),28 which links annual survey data to data from the School Census29 (as discussed below).
The methodology of data or record linking can be simply one of matching record numbers between multiple sources, but it can also be probability based. Probabilistic linkage matches records on the basis of similar characteristics rather than unique identifiers. Computational statistical techniques are used to optimize the record matching rules and to weight the different variables in the matching process. Data preparation is a key stage of this research design. Account needs to be taken of missing data and data entry errors, and quality assurance procedures need to be put in place. For further discussion see Herzog et al. (2007).
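The idea of probability-based linking can be illustrated with a minimal sketch in the style of the Fellegi-Sunter framework, in which each variable contributes a weight according to how strongly agreement on it distinguishes true matches from chance matches. This is an illustration only and not the procedure used by any of the studies cited here. The field names, the m- and u-probabilities, the similarity cut-off of 0.85, the threshold of 8.0 and the sample records are all hypothetical values chosen for the example; in practice such parameters would be estimated from the data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical parameters for illustration only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "postcode": (0.90, 0.001),
}


def agrees(field, a, b):
    """Decide whether two field values agree; surnames tolerate small typos."""
    if field == "surname":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= 0.85
    return a == b


def match_weight(rec_a, rec_b):
    """Sum log-likelihood-ratio weights over all fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing data contributes no evidence either way
        if agrees(field, a, b):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(survey, admin, threshold=8.0):
    """Pair each survey record with its best-scoring admin record, if above threshold."""
    links = []
    for s in survey:
        best = max(admin, key=lambda r: match_weight(s, r))
        weight = match_weight(s, best)
        if weight >= threshold:
            links.append((s["id"], best["id"], weight))
    return links


# Invented records: S1 has a surname spelling variant, S2 has a missing postcode.
survey = [
    {"id": "S1", "surname": "Macdonald", "birth_year": 1990, "postcode": "M13 9PL"},
    {"id": "S2", "surname": "Patel", "birth_year": 1985, "postcode": None},
]
admin = [
    {"id": "A1", "surname": "McDonald", "birth_year": 1990, "postcode": "M13 9PL"},
    {"id": "A2", "surname": "Patel", "birth_year": 1972, "postcode": "LS1 4AP"},
]

links = link(survey, admin)
for survey_id, admin_id, weight in links:
    print(survey_id, admin_id, round(weight, 2))
```

In this sketch S1 is linked to A1 despite the different spelling of the surname, because agreement on birth year and postcode adds to the weight. S2 is left unlinked: the surname agrees, but the birth year disagrees and the postcode is missing, so the total weight falls below the threshold. A real application would also need to handle duplicate records and to review pairs that fall close to the threshold.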
It is argued that data linkage can be cost saving and enable analyses to be conducted that would otherwise not be possible or would involve further primary data gathering. Best practices for linking data and the research and ethical issues raised are slowly being developed. A key aspect of this is the terms of use of the different data sources. Some surveys now ask for the respondent’s permission for the anonymous use of their responses for the purposes of linking with other datasets. Examples include the National Survey for Wales and the Scottish Longitudinal Study. The UK’s Economic and Social Research Council is presently reviewing the area of data access and linkage as part of its Administrative Data Task Force (see Boyle, 2012).30 The International Health Data Linkage Network is a useful information resource on linked data.31 For further discussion see Gill (2001), Herzog et al. (2007), Mason and Shihfen (2008) and Chapter 3 in this volume.
2.2.4 Freedom of Information Requests for Social Research
In the UK, legislation has also made public sector information increasingly available for transparency and accountability purposes and potentially for social science research. Under the Freedom of Information Act 2000 (FOI), requests for detailed records of what we term consequential data held by public bodies can be made. Unless there is good reason not to, the organization holding the data must provide the information within 20 working days.32 Accepted reasons for refusal include cost, whether the request is vexatious, and whether it would prejudice a criminal investigation. The legislation has been widely used to examine transparency in government. Thousands of requests have been made since the introduction of the act, including many in areas that social science research has a track record of examining, such as government decision-making and public spending. Access to this type of data has facilitated research breakthroughs in these areas including, notably: information on MPs’ expense claims, records of donations to political parties, extent of care home abuse allegations, detention of children in police cells, links between police forces and commercial companies, police workforce demographics and gambling spending levels. However, as reported in Lee (2005), the majority of such requests are not for what might be considered standard social science research purposes. Nevertheless, some examples in the UK context include: local authority data on business cases for new schools (Khadaroo, 2008), Ministry of Defence medical data (Seal, 2006), Department of Health data on drug addiction policy (Mold and Berridge, 2007) and police force crime data (Hutchings et al., 2006). It is notable that as of 2013 new regulations relating to open data rights require data released under FOI requests to be prepared in reusable formats, and that the regulations also allow for the data to be used commercially.33
2.2.5 Commercial Data Sources and Providers
In parallel to these developments, commercial data companies are increasingly providing highly detailed, individual-level information products combining different types of data, including intentional, consequential, trace and synthetic data. The information can include such details as: name, address, full postcode, age, gender, income, occupation, number of children, household income, house type, tenure, education, consumption, length of residence, car ownership, insurance packages, ownership of ICT products, holidays, smoking, leisure activities and social attitudes34 (Purdam et al., 2004).
Such data is compiled from different sources, including: surveys; warranty forms where citizens agree to the shared use of their details; public records; administrative records such as the Electoral Register and house sale information; and consumption records. Whilst some of this information may be considered personal, it has already been in the public domain in some form or permission for use has been given at origin (see