Integrating social theory in text analysis: Constrained topic models in social research
DNr: NAISS 2023/22-1043
Project Type: NAISS Small Compute
Principal Investigator: Anastasia Menshikova <anastasia.menshikova@liu.se>
Affiliation: Linköpings universitet
Duration: 2023-10-05 – 2024-11-01
Classification: 50401

Abstract

Textual data is a crucial source for scientific inquiry and has produced insights in many fields. Recent methodological advances in computational text analysis have made it possible to analyze large-scale corpora containing millions of documents. To fully exploit these techniques and data sources, it is important to allow for data-driven discoveries while still integrating theory and prior knowledge into the analysis. In this project, we argue that constrained topic models are a suitable method for achieving this integration. Topic models estimate latent themes in a text corpus, and the constrained extension allows one to place informative priors on a subset of the parameters. However, there is no formalized framework for using constrained topic models as a tool for integrating social theory into data-driven analysis. To fill this gap, we propose a framework that starts from one or more theoretical constructs and describes the steps needed to arrive at valid measurements of the categories of interest to the researcher. More precisely, we propose an iterative research process in which theoretical and prior knowledge is updated based on the analyses. Furthermore, we propose tools for validating the model from both a sociological and a statistical point of view. To demonstrate the utility of our approach, we apply the framework to abortion discourse in the US Congress. Using this case, we show the fruitfulness of seeded topic models and of our framework for theoretically driven sociological inquiry.
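
The idea of placing informative priors on a subset of the parameters can be illustrated with a minimal sketch. It is not the project's implementation: the vocabulary, the seed words, the function names `build_seeded_prior` and `prior_mean`, and the values of `base` and `boost` are all hypothetical, chosen only to show how a weak symmetric Dirichlet prior over topic-word parameters might be made informative for selected (topic, word) pairs.

```python
# Illustrative only: vocabulary, seed words, and prior values are hypothetical.

def build_seeded_prior(vocab, seed_words, n_topics, base=0.01, boost=1.0):
    """Return an n_topics x len(vocab) list of Dirichlet concentration values.

    Every entry starts at the weak symmetric value `base`. Entries for
    (topic, seed word) pairs are raised by `boost`, so the prior becomes
    informative only for that subset of topic-word parameters.
    """
    index = {word: j for j, word in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(n_topics)]
    for topic, words in seed_words.items():
        for word in words:
            if word in index:  # seed words missing from the vocabulary are skipped
                prior[topic][index[word]] += boost
    return prior


def prior_mean(row):
    """Expected word distribution under a Dirichlet with concentration `row`."""
    total = sum(row)
    return [value / total for value in row]


# Hypothetical vocabulary and theory-derived seed words for two topics.
vocab = ["choice", "rights", "life", "unborn", "budget", "vote"]
seed_words = {0: ["choice", "rights"], 1: ["life", "unborn"]}

# Topic 2 receives no seed words and keeps the uninformative symmetric prior.
prior = build_seeded_prior(vocab, seed_words, n_topics=3)
```

In a full topic model, a matrix like `prior` would stand in for the symmetric topic-word prior, so that seeded topics are pulled toward the theoretically motivated vocabulary while unseeded topics remain free to capture data-driven themes.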