Miriam Hurtado Bodell, Måns Magnusson, Marc Keuschnigg
Sociologists increasingly discuss the need for more formal ways of measuring meaning in digital text archives. We draw attention to the seeded topic model, a semi-supervised and scalable extension of the standard topic model that allows social science domain knowledge to be infused into the computational learning of meaning structures. Seed words help crystallize topics around known concepts, issues, or ideas, while preserving the topic model's basic functionality of finding associations in text data based on word co-occurrences. The method allows researchers to identify discourses on predefined themes over time and to measure a theme's shared interpretation via its associations with other frequently co-occurring topics. To illustrate this theoretically informed method, we extract longitudinal measures of the Swedish understanding of immigration from a vast newspaper corpus containing millions of news articles published between 1945 and 2019.
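To make the role of seed words concrete, the following minimal sketch shows one common way of encoding them: as an asymmetric Dirichlet prior over topic-word distributions, in which each seeded topic receives extra pseudo-counts on its seed words while the remaining topics stay unseeded. This is an illustration under stated assumptions and not necessarily the formulation used in this paper; the function name, the `base` and `boost` values, and the toy vocabulary are all hypothetical.

```python
def seeded_topic_word_prior(vocab, seed_words, n_free_topics, base=0.01, boost=1.0):
    """Build a topics-by-vocabulary matrix of Dirichlet pseudo-counts.

    Hypothetical helper: every entry starts at `base`; each seeded topic
    gets an additional `boost` on its own seed words. Unseeded ("free")
    topics keep a flat prior and remain fully data-driven.
    """
    word_index = {word: i for i, word in enumerate(vocab)}
    n_topics = len(seed_words) + n_free_topics
    prior = [[base] * len(vocab) for _ in range(n_topics)]
    for topic_id, seeds in enumerate(seed_words):
        for word in seeds:
            # Seed words missing from the vocabulary are silently skipped.
            if word in word_index:
                prior[topic_id][word_index[word]] += boost
    return prior


# Toy example (assumed data): two seeded themes and one free topic.
vocab = ["immigration", "refugee", "economy", "tax", "football"]
seed_words = [["immigration", "refugee"], ["economy", "tax"]]
prior = seeded_topic_word_prior(vocab, seed_words, n_free_topics=1)
```

Such a matrix could then be passed as the topic-word prior to any topic model implementation that accepts one. The seeded topics are pulled toward their themes, while co-occurrence patterns in the corpus still determine which other words and topics become associated with them.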