Neoliberalism: a distant reading

There's been some recent discussion of computation and scholarship, and how their interaction relates to neoliberalism. I thought it might be useful to look carefully at one such computational tool (a topic model), and also learn a bit about neoliberalism at the same time. What can we find out about the word "neoliberal" with this tool? What becomes visible, and what is hidden? What role (if any) should this type of approach play in a scholarly methodology?

This page contains 100 "topics" (groups of words that tend to appear together) that were automatically extracted from a corpus of about 10,000 JSTOR articles containing the term "neoliberal". The articles were retrieved through the Data for Research API, and due to copyright are represented only as unordered word counts. This corpus is about half of the total articles matching the search term, biased towards the earlier period. (JSTOR only allows 1,000 articles to be downloaded at a time, and I was not able to request a date range narrower than two years.) "Neoliberal" first appears in the 80s, becomes very popular in the 90s, and has maintained roughly the same level since.
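For the curious, here is roughly how unordered word counts can be fed to a tool that expects text. The sketch below assumes each article arrives as a two-column CSV of words and counts (the details of the DfR download format may differ); it simply repeats each word as many times as it was counted, which is harmless because the model ignores word order anyway.

    ## Sketch: rebuild a "pseudo-document" from each word-count file.
    ## The wordcounts/ directory and CSV layout are assumptions; adjust
    ## to however your DfR download is organized.
    count.files <- list.files("wordcounts", pattern = "\\.CSV$",
                              full.names = TRUE)

    ## Repeat each word count-many times, then join into one long string.
    read.pseudo.doc <- function(filename) {
      counts <- read.csv(filename, stringsAsFactors = FALSE)
      paste(rep(counts[[1]], counts[[2]]), collapse = " ")
    }

    texts <- sapply(count.files, read.pseudo.doc)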

I trained the model on my MacBook with the mallet package in the free statistical environment R. It took about an hour to collect the data and train the model. This process isn't exactly easy, either conceptually or procedurally. (I have a PhD in machine learning, and I coded large parts of Mallet.) But it's also not particularly hard, and many people who are primarily scholars have done similar work. One of my goals is to make this method, and related methods, more transparent, reliable, and accessible.
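In the interest of that transparency, here is a sketch of the training step, continuing from the code above. The 100 topics match the model shown on this page; the stoplist file name and the iteration counts are illustrative, not a record of my exact settings.

    library(mallet)

    ## Import the pseudo-documents built above. "en.txt" is a stoplist
    ## file with one English stopword per line; ids are the filenames.
    instances <- mallet.import(count.files, texts, "en.txt",
                               token.regexp = "\\p{L}+")

    ## A 100-topic model, matching the page above.
    topic.model <- MalletLDA(num.topics = 100)
    topic.model$loadDocuments(instances)

    ## Re-estimate hyperparameters every 20 iterations, after a
    ## burn-in of 50, then run the Gibbs sampler.
    topic.model$setAlphaOptimization(20, 50)
    topic.model$train(500)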

Each topic is shown with a list of words and a timeseries plot. The word list gives the 50 most highly weighted words in the topic. The timeseries shows, for each year, the average proportion of the documents published that year that the algorithm assigns to the topic. Decades are marked with vertical lines at 1990, 2000, and 2010. The y-axis shows a tick mark at each percentage point, with a minimum of 1%; more ticks therefore indicate a larger maximum. Topics are sorted by their date centroid: declining topics first, rising topics last. Note that these plots represent proportion, not overall count: the earliest years make up only a small fraction of the full corpus.
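All of these summaries can be derived from two matrices the trained model produces: topic weights for each document and word weights for each topic. A sketch, continuing from the training code above and assuming a vector years giving each document's publication year in load order:

    ## Topic weights per document and word weights per topic.
    doc.topics  <- mallet.doc.topics(topic.model,
                                     smoothed = TRUE, normalized = TRUE)
    topic.words <- mallet.topic.words(topic.model,
                                      smoothed = TRUE, normalized = TRUE)

    ## The 50 most highly weighted words in topic 1.
    mallet.top.words(topic.model, topic.words[1, ], 50)

    ## Average proportion of each topic among the documents of each year.
    year.means <- apply(doc.topics, 2, function(p) tapply(p, years, mean))

    ## Date centroid: the proportion-weighted mean year, used for sorting.
    yrs <- as.numeric(rownames(year.means))
    centroids <- apply(year.means, 2, function(m) sum(yrs * m) / sum(m))
    topic.order <- order(centroids)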

Several patterns emerge. The earliest use of the term is mainly in Spanish-language journals; this is observable only because the term I searched for ("neoliberal") happens to be spelled the same in English and Spanish. In the 90s the term appears more in the English-language literature, especially in relation to politics, economics, and game theory. In more recent years, there has been a strong and increasing connection to healthcare, race, the environment, gender, art, and human rights. A topic relating to politics, culture, and power is particularly prominent. Regional studies appear throughout the corpus, often focused on the developing and post-communist world.

This model also demonstrates some of the limitations of the corpus and of the algorithm. The prevalence of Spanish, along with smaller amounts of French, Italian, German, and Portuguese, highlights some oddities of multilingual work. I removed high-frequency "stopwords" for English, but not for the other languages, so each of those languages tends to collapse into a single topic. Because there is so much Spanish, it nevertheless splits into several distinct topics. Tellingly, there is a topic consisting of Spanish words with accents: JSTOR is likely inconsistent in how it handles accented characters. Another topic ("pol tica econ") appears to collect fragments of words whose accented vowels have been dropped, and there is an English version of this word-fragment topic as well, probably caused by end-of-line hyphenation. Unusually large documents are also handled poorly: there appears to be a series of large monographs in the late 1990s, each of which absorbs an entire topic.
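A natural fix for the stopword problem would be to strip high-frequency words in every language present, not just English. A sketch, using the stopwords package (an assumption; any source of per-language stoplists would do) to build a combined stoplist that mallet.import can use:

    library(stopwords)

    ## Combine stoplists for all the languages observed in the corpus.
    ## Accented entries will only match if the stoplist and the corpus
    ## agree on encoding, which is exactly the inconsistency noted above.
    langs <- c("en", "es", "fr", "it", "de", "pt")
    all.stops <- unique(unlist(lapply(langs, stopwords)))
    writeLines(all.stops, "multilingual-stoplist.txt")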

Finally, although most topics appear to correspond to recognizable discourses, I'm deliberately not showing the documents associated with each topic, to bring into focus the incompleteness of this view. To go beyond a simple overview of the corpus, it would be necessary to connect back to the sources and actually read some of the articles. Any other mode of use would be incomplete.