How satisfied are you with your life nowadays? How happy did you feel yesterday? How anxious did you feel yesterday? To what extent do you feel the things you do in your life are worthwhile?
The data is collected by professional fieldworkers and the survey takes over a year to complete. The sampling strategy enables inferences to be made to the UK population. At the same time, a university-based project in the UK is measuring happiness by texting a purposive, non-representative sample of volunteers who have signed up to be part of a mobile phone-based study.8 The participants are asked every few days how they feel on a scale and about who they are with, their location and what they are doing. They are also able to submit a photo should they wish to. From this almost real-time and repeated response data, happiness maps can be produced. Other techniques for measuring happiness include analysing data from Twitter posts for a sense of happiness, or analysing search engine records for evidence of future planning, which can be calibrated as a proxy for happiness (again, though, based on non-representative samples). See, for example, Preis et al. (2012), who used Google Trends data in a cross-national study of orientation towards the future and optimism. The self-published, consequential and trace data forms are very different from data gathered as part of random sample surveys. Nevertheless, each of these data sources and methods for measuring happiness has its own explanatory power and value.
The opportunity for social science and policy makers is that citizens are – deliberately or consequentially – creating their own digital archives. This means that data generation with self-published, consequential and trace data is not a distinct (or costly) stage in the research process but is integral to the activity being undertaken. As we discuss below, such data can be collated, visualized and analysed in near real time, and updated continually. Citizens have the tools to document their own lives almost effortlessly and in more detail than ever before through access to monitoring technology and potential access to data about their health, movements and communications. See, for example, the development of so-called life logging and the Quantified Self.9 Data generation can also take the form of crowdsourced data, where collective intelligence and effort in the form of observations, data preparation tasks, idea generation and individual-level data are deposited and uploaded by volunteers, usually via the internet.10 Such data can also be collated automatically using software that captures information, including text and images on websites, to build databases. This can include collecting contact information, such as email and postal addresses, to produce samples for more traditional research methods such as surveys.11
As social science researchers looking at the wealth of new types of data, we must be mindful of the famous aphorism: ‘the medium is the message’ (McLuhan, 1964). All data collection instruments, as we consider below, are subjective and performative media (although to differing extents). With the new data sources (and social media sources in particular), this issue is likely to be all the more salient. Social science researchers must be aware that the media and data are mutually embedded in a manner that affects the data we might use. The flip side of this challenge is that the gap between data and subject, between my-self and my-data or data collected about me, is closing both temporally (data about us is more closely contemporaneous to the activity/behaviour that has generated the data) and ontologically (more and more the data and the activity are one and the same thing).
So where does this take the social scientist? Is it just more choices about the data to collect, use and link? Or is it more fundamentally a step change in how people’s lives are being captured, documented and measured? What challenges are posed, for example, in terms of data access and data quality?
2.2.2 Evolving Traditional Data Types
To understand the changing data environment it is important to begin by reflecting on the expansion and enrichment of so-called orthodox intentional data and, in particular, social surveys. In the UK, longitudinal surveys, including the British Cohort Study,12 the English Longitudinal Study of Ageing13 and the Census Longitudinal Study,14 now constitute very rich sources of information for understanding change over decades of people’s lives (see also Chapters 4 and 5). International surveys, such as the World Values Survey,15 can provide insights into global opinion. Such surveys, and the analyses conducted on them, often include contextual data, such as area-level employment rates. The data can be analysed online through tools such as Nesstar.16 Nesstar allows users to access and analyse archived data, including government survey data such as the Labour Force Survey,17 the British Social Attitudes Survey18 and the British Crime Survey.19 The resource includes information on data origin, sampling and the coding of variables, as well as the original questionnaires. Similar data resources also now exist for qualitative and mixed methods data. Textual data from interviews, focus groups and observational studies can be accessed through, for example, the UK Data Service.20
Survey data can be highly detailed and, depending on the sample design, allow inferences to be made to a wider population. However, the survey process itself imposes limitations. Sample size is usually restricted by cost, and low sample numbers can limit comparisons between areas and groups. In addition, response rates can be low, which can introduce bias into the population estimates. The survey questions themselves can also have limitations in terms of constrained responses and self-reporting bias. The latter is where respondents give an answer they feel is expected, rather than what they really think. There is also a well-established gap between reported attitudes and what people actually do in practice, and indeed between what people say they do and what they actually do (see De Vaus, 2002; Blasius and Thiessen, 2012).
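The point about low sample numbers limiting comparisons between areas and groups can be illustrated with a rough calculation. The sketch below is not drawn from any of the surveys discussed here: it assumes simple random sampling and uses the standard normal approximation for the margin of error of an estimated proportion, and the function name and the three sample sizes are hypothetical values chosen only for illustration. Real surveys with clustered or stratified designs would need a design-effect adjustment that is not shown.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for an estimated proportion.

    Assumes simple random sampling and a normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion p must lie between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes: a whole survey, one region, one small subgroup.
illustrative_samples = [
    ("whole sample", 10_000),
    ("single region", 800),
    ("small area or subgroup", 50),
]

for label, n in illustrative_samples:
    # p = 0.5 gives the widest (most conservative) margin of error.
    moe = margin_of_error(0.5, n)
    print(f"{label:<24} n={n:>6}  +/- {100 * moe:.1f} percentage points")
```

Under these assumptions the margin of error is about 1 percentage point for the whole sample but close to 14 percentage points for a subgroup of 50 respondents, which is why differences between small areas or groups are often impossible to distinguish from sampling noise.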
Secure data that is not regarded as suitable for widespread, unregulated public use can be analysed in so-called ‘safe settings’ to ensure there are no risks to confidentiality (see, for example, the UK Secure Data Service,21 the HM Revenue and Customs (HMRC) Data Lab22 and the Ministry of Justice (MoJ) Data Lab23). The Secure Data Service allows access to individual-level data that is more detailed than that available under standard licensing (such as data for smaller geographic areas) and so provides potentially richer sources of evidence for social science research. The user analyses the data remotely rather than downloading it. The analytical outputs are then checked by the data provider. The conditions of use are based on licensing agreements with users as well as user accreditation, individual training and trust. HMRC’s Data Lab allows access to individual tax records but requires users to do so at HMRC’s premises under controlled conditions.