We find, for example, the same forms of authority and subordination, of competition, imitation, opposition, division of labor, in social groups which are the most different possible.
(Simmel, 1895, p. 55)
Let us assume, to take but one example, that we were to establish empirically that people on social media sometimes find themselves disillusioned by their own social media use, and that they feel as if they are just like cogs in a bigger machine beyond their individual control. Let us also assume that our analysis made us think that this may even be a form of oppression or exploitation, where social media conglomerates make a profit from what disillusioned and exploited users post online. We might simply invent a flashy new theoretical concept for this, say ‘digital brainwash’ or ‘social media disconnect’. But we could also make the effort of going back to already established social theories. In the present example, a good option might be Karl Marx’s 1844 theory of alienation (Marx, 1844, pp. 69–84). The social form of alienation, in that case, may transcend the contexts of nineteenth-century industrial capitalism and social life on the twenty-first-century internet. Once we see that, we also enable other insights, such as that our present-day society may still be quite similar in some respects to nineteenth-century industrial capitalism.
I do not mean to say that such theoretical connections are not already made by many scholars, nor do I mean that anyone who does not make them at every opportunity is lazy or wrong. I myself am a repeat offender. And, conversely, it may indeed sometimes be a good idea actually to invent new concepts – how else would theories develop? – and in most cases the old theory that is re-employed needs some sort of updating or modification. On the one hand, this book is an explicit effort to explore and show how to apply existing, trusty, and well-worn social theory systematically, through data science, to social media politics with this kind of ambition. On the other hand, the book is just as much an encouragement to combine and re-invent theories in eclectic ways. I will return, throughout the book, to the issue of theory as universal truth versus theory as emergent and constantly renegotiated.
A bit of anarchy
Rachel Schutt and Cathy O’Neil (2013, p. 9), both data scientists themselves, argue that their field has much to gain from collaborating with social scientists. This, they write, is because social scientists ‘do tend to be good question askers and have other good investigative qualities’. They write about the hyped and still emerging speciality of data science that ‘it’s not math people ruling the world’. Rather, they argue that when different ‘domain practices’ intersect with data science, each such practice is ‘learning differently’ (Schutt and O’Neil, 2013, p. 219). Taking my cue from Schutt and O’Neil, I ask in this book what type of such different learning – which methodological developments – can follow when sociology meets data science.
This is obviously a wide-open question with a multitude of potential answers. My suggestion, which draws to a great extent on my personal methodological and theoretical preferences as an interpretive sociologist, is therefore but one possibility. The main idea that I am putting forward is that the data-drivenness of interpretive sociology, as formulated into a hands-on framework by methodologists such as Barney Glaser and Anselm Strauss (1967), and particularly Glaser’s (1978) notion of ‘theoretical sensitivity’, can be dusted off and brought together with the data-drivenness of data science practices.
Many would say that the general views on science and methodology held in big data research and in grounded theory research are too divergent, to the point of being incompatible. I do not believe that to be the case. Still, experimenting with merging methods that are labelled ‘qualitative’ and ‘quantitative’ is not a good idea if you want everyone to agree with you. In both camps (because sadly, that is still what they are), it is equally easy to find people who are dogmatic. So, to find productive ways across, there is definitely a need to think unconventionally. The philosopher of science Paul Feyerabend had some good ideas about how science in general could do well with a dose of theoretical anarchism, and claimed that research methods must always be opposed and questioned:
The idea of a method that contains firm, unchanging, and absolutely binding principles for conducting the business of science meets considerable difficulty when confronted with the results of historical research. We find, then, that there is not a single rule, however plausible, and however firmly grounded in epistemology, that is not violated at some time or other. It becomes evident that such violations are not accidental events, they are not results of insufficient knowledge or of inattention which might have been avoided. On the contrary, we see that they are […] absolutely necessary for the growth of knowledge.
(Feyerabend, 1975, p. 7)
This book does not swear by the entire philosophy of Feyerabend, but it does align with his idea that it is good for science if we violate some of its rules every now and then. It might be a way to move forward. This is therefore neither a book about true data science nor about dogmatic sociology (whatever those might be). It asks the reader to keep an open mind about the boundary-crossing character of the analytical approach presented here.
As argued above, theory needs data. But this book is not about sociology telling data science how things should be done. It is just as much the other way around, and perhaps less a matter of telling than of mutual learning. Throughout the central parts of this book, we shall look at how knowledge about some particular data can be advanced through some particular social theory. I will also discuss how theory can advance the formulation of the methodology by which we approach the data. The overarching goal is the productive meeting of the two.
There are new types of data that demand new types of methods, while new types of research questions are also arising that call for new theoretical approaches. This demands that we advance our perspective on data theory and methods in parallel; in other words, that we develop a data theory approach. The term ‘data theory’ as such has been used to some extent already in statistics. William G. Jacoby, a researcher on public opinion and voting behaviour, has used it to refer to the process by which the researcher, being theoretically driven, chooses some aspects of the observable reality as the data to be analysed:
Data theory examines how real world observations are transformed into something to be analyzed – that is, data. Any empirical observation provides the observer with information. Typically, however, only certain aspects of this information will be useful for analytic purposes. The researcher takes a vitally important step in his or her analysis simply by culling out those pieces of information that are used from those that could be considered, but are not. The information that is used comprises the data, and it is clearly only a subset of observable reality. Hence, it is important to distinguish between observations (the information that we can see in the real world around us) and data (the information that we choose to analyze). The central concern of data theory is to specify how the latter are derived from the former.
(Jacoby, 1991, p. 4)
Furthermore, there was even a Department of Data Theory in the 1990s at the University of Leiden in the Netherlands, working to adapt classical statistical methods to suit ‘the particular characteristics of data obtained in the social and behavioral sciences’ as they ‘are often data that are non-numerical, with measurements recorded on scales that have an uncertain unit of measurement’ (Meulman, Hubert, and Heiser, 1998, p. 489). I, however, use the concept of data theory as a very broad label for the work that this book does in order to bring social theory and data science closer to one another.
Data piñata
While most data scientists are hired by industry, they also exist within a number of disciplines in academia where the focus is on computational methods applied to unconventional or messy data. Rachel Schutt and Cathy O’Neil