Courses
Over the last three years, I have been teaching two courses at Leipzig University in alternation. The first one, Toolbox CSS, deals with collecting and using digital trace data and covers web scraping and basic text analysis. The second one, Text Mining for Social Scientists, focuses on how sociologists can use text data. Moreover, I have taught an introduction to R, which covers basic data wrangling and visualization, and an introduction to web scraping as part of the Workshops for Ukraine series. All courses follow tidy principles and come with a bookdown script. Some of the bookdowns even feature videos that walk you through the materials. Slides are available upon request.
Currently, I am preparing an updated and more extensive version of the Toolbox CSS course. The web scraping part will go beyond the current material and cover the acquisition of data from dynamic websites using Selenium. The Text Mining section will also cover recent developments in NLP (i.e., Large Language Models) and what they bring to the social sciences. Moreover, I will include sections on the analysis of spatial data and the simulation of human behavior using agent-based models.
Toolbox CSS
Description: Recently, a “computational turn” has taken hold of the social sciences. Digital data and novel methods originating from computer science offer important opportunities for sociology. The course starts by teaching practical skills for collecting digital trace data online (web scraping, API “harvesting”). As the lion’s share of this material is in textual format, students will subsequently learn to analyze large text archives in an automated way using machine learning techniques. All tools are programmed in R, so basic knowledge of R is required. Students have to write an empirical paper that answers a sociologically relevant research question using at least one of the methods learned, and they give extensive feedback on others’ projects in class.
The bookdown script can be found here.
Workshops for Ukraine: Web Scraping
Description: Digital trace data are an integral element of CSS (cool social scientific) research. This course will show you how to collect them at an intermediate level. This means that we will neither cover the fundamentals of selecting and downloading content from static web pages nor go as far as firing up RSelenium to scrape dynamic web pages. We will start with a brief review of CSS selectors and then move on to using rvest to simulate a browser session, fill in forms, and click buttons. The second half of the session covers APIs and how to make requests to them, illustrated with tangible example queries. Finally, exemplary workflows will be introduced to provide scaffolding for students’ future research projects.
The course materials can be found on the Workshops for Ukraine website.