Conclusion
This chapter has introduced text mining and text analysis methodologies, provided an overview of the major approaches to text analysis, and discussed some of the risks associated with analyzing data from online sources. Despite these risks, social and computer scientists are developing new text mining and text analysis tools to address a broad spectrum of applied and theoretical research questions, in academia as well as in the private and public sectors.
In the chapters that follow, you will learn how to find data online (Chapters 2 and 6) and explore some of the ethical (Chapter 3) and philosophical and logical (Chapter 4) dimensions of text mining research. In Chapter 5, you will learn how to design your own social science research project. Parts II, IV, and V review specific text mining techniques for collecting and analyzing data, and Chapter 17 in Part VI provides guidance for writing and reporting your own research.
Key Terms (see Glossary)
Concordance
Content analysis
Conversation analysis
Critical discourse analysis (CDA)
Digital archives
Disambiguation
Discourse positions
Foucauldian analysis
General Inquirer project
Natural language processing (NLP)
Netnography
Sample bias
Sentiment analysis
Text analysis
Text mining
Virtual ethnography
Web crawling
Web scraping
Highlights
Text mining processes include methods for acquiring digital texts and analyzing them with NLP and advanced statistical methods.
Text mining is used in many academic and applied fields to analyze and predict public opinion and collective behavior.
Text analysis began with analysis of religious texts in the Middle Ages and was developed by social scientists starting in the early 20th century.
Text analysis in the social sciences involves analyzing transcribed interviews, newspapers, historical and legal documents, and online data.
Major approaches to text analysis include analysis of discourse positions, conversation analysis, CDA, content analysis, intertextual analysis, and analysis of texts as social information.
Advantages of Internet-based data and research methods include low cost, unobtrusiveness, and access to unprompted data from research participants.
Risks and limitations of Internet-based data and research methods include limited researcher control, possible sample bias, and the risk of researcher passivity in data collection.
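To make two of the ideas above more concrete, the following minimal Python sketch counts word frequencies (a basic step in content analysis) and builds a simple concordance, that is, a listing of each occurrence of a keyword with a few words of context on either side. The sample text, the function names `tokenize` and `concordance`, and the context window size are assumptions chosen for illustration; they are not drawn from any particular software package discussed in this book.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence.

    `window` is the number of tokens of context kept on each side
    (a hypothetical default chosen for illustration).
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, invented for this illustration.
sample = ("Text mining acquires digital texts. Text analysis interprets texts. "
          "Researchers combine text mining and text analysis.")

tokens = tokenize(sample)
frequencies = Counter(tokens)                 # word -> number of occurrences
kwic = concordance(tokens, "mining", window=2)

for left, keyword, right in kwic:
    print(f"{left:>15} [{keyword}] {right}")
```

A real project would typically use a more careful tokenizer and a much larger corpus; the sketch is only meant to show how little machinery a basic concordance requires.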
Review Questions
What are the differences between text mining and text analysis methodologies?
What are the main research processes involved in text mining?
How is analysis of discourse positions different from conversation analysis?
What kinds of software can be used for analysis of discourse positions and conversation analysis?
Discussion Questions
If you were interested in conducting a CDA of a contemporary discourse, what discourse would you study? Where would you find data for your analysis?
How do researchers choose between collecting data from offline sources, such as in-person interviews, and online sources, such as social media platforms?
What are the most critical problems with using data from online sources?
If you already have an idea for a research project, what are likely to be the most critical advantages and disadvantages of using online data for your project?
What are some ways text mining research can be used to benefit science and society?
Developing a Research Proposal
Select a social issue that interests you. How might you analyze how people talk about this issue? Are there differences between people from different communities and backgrounds in terms of how they think about this issue? Where (e.g., offline, online) do people talk about this issue, and how could you collect data from them?
Further Reading
Ayers, E. L. (1999). The pasts and futures of digital history. Retrieved June 17, 2015, from http://www.vcdh.virginia.edu/PastsFutures.html
Bauer, M. W., Bicquelet, A., & Suerdem, A. K. (Eds.). Textual analysis. SAGE benchmarks in social research methods (Vol. 1). Thousand Oaks, CA: Sage.
Krippendorff, K. (2013). Content analysis: An introduction to its methodology. Thousand Oaks, CA: Sage.
Kuckartz, U. (2014). Qualitative text analysis: A guide to methods, practice, and using software. Thousand Oaks, CA: Sage.
Roberts, C. W. (1997). Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum.
2 Acquiring Data
Learning Objectives
The goals of Chapter 2 are to help you to do the following:
1 Recognize the role data plays in text mining and the characteristics of ideal data sets for text mining applications.
2 Identify a variety of different data sources used to compile text mining data sets.
3 Assess the advantages and limitations of using social media to acquire data.
4 Analyze examples of social science research using data sets drawn from different sources.
Introduction
While social scientists have for decades made use of data from attitude surveys, today researchers are attempting to leverage the growing volume of naturally occurring unstructured data generated by people, such as text or images. Some of these unstructured data are referred to as “big data,” although that