Scholars' Lab Blog //Topic Modeling Twitter

What is topic modeling?

Topic modeling is a type of statistical modeling that uses language processing algorithms to sort through a large corpus of writing and discover the broad topics under discussion by grouping together words that are frequently used in tandem.  Scholars working on distant reading have used this method in the past. Distant reading is a way of studying literature that aggregates and analyzes massive amounts of text with the goal of uncovering the fundamentals of literature on a vast, universal level (the opposite of close reading, which focuses on singular and often exceptional canonized texts).

How did we implement it?

Using the program MALLET (MAchine Learning for LanguagE Toolkit), I imported the text files of tweets and ran the topic model.  The program produced two documents: a list of “topics,” each consisting of keywords, and metadata about these topics.  This metadata included the percentage of tweets that focused on each “topic” and a list of the relevant tweets.
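To make the second document more concrete, here is a minimal Python sketch of how a per-topic percentage could be derived from MALLET's document-topic output. It is an illustration, not the exact procedure used for this post. It assumes the file was written with MALLET's `--output-doc-topics` option in the layout where each row holds a document index, a document name, and then one proportion per topic, separated by tabs (older MALLET versions use a different layout). The file names and numbers in `sample` are invented, and "focused on" is interpreted here as "this topic has the largest proportion in the tweet."

```python
from collections import Counter


def parse_doc_topics(lines):
    """Parse doc-topics rows into (doc_name, [proportions]) pairs.

    Assumed row layout: doc index, doc name, then one proportion per
    topic, tab-separated. Lines starting with '#' are treated as a header.
    """
    docs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        docs.append((fields[1], [float(x) for x in fields[2:]]))
    return docs


def dominant_topic_shares(docs):
    """Return {topic_id: percent of documents whose largest proportion is that topic}."""
    counts = Counter(
        max(range(len(props)), key=props.__getitem__) for _, props in docs
    )
    return {topic: 100.0 * n / len(docs) for topic, n in sorted(counts.items())}


# Hypothetical rows standing in for a real doc-topics file.
sample = [
    "#doc\tname\ttopic proportions",
    "0\ttweet_0001.txt\t0.80\t0.15\t0.05",
    "1\ttweet_0002.txt\t0.10\t0.70\t0.20",
    "2\ttweet_0003.txt\t0.55\t0.25\t0.20",
    "3\ttweet_0004.txt\t0.05\t0.15\t0.80",
]

docs = parse_doc_topics(sample)
shares = dominant_topic_shares(docs)
for topic, percent in shares.items():
    print(f"topic {topic}: {percent:.1f}% of tweets")
```

With a real file, `sample` would be replaced by the lines read from disk, for example `open("doc_topics.txt", encoding="utf-8")`, where the file name is again only a placeholder.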

What did we learn?

Topic modeling is often assumed to be a primary text analysis tool for digital humanists. Its actual utility, however, is more ambiguous.  Where topic modeling may be an integral tool for studying an enormous body of legal documents, for example, does it have the same utility for a corpus of tweets?

Key “topics” revealed by the Kim Kardashian tweet corpus included clusters similar to the following:

“kardashian kollection sears dress spring”

“excited line tomorrow launch coming sisters”

“love guys wow million omg twitter followers congrats”

“only fragrance buy free special people”

“kendalljenner kyliejenner kylie sister kendall fun proud sis”

“back nyc home missed time sleep”

“shoes shoedazzle obsessed love”

“keeping kardashians tonight season watch watching episode coast tune”

“book kardashian today selfish beauty”

“mom bruce dad cute kris khloe proud”

“fashion love show week paris watching givenchy icon”

“city selfie years sex double yeezus saint magic pablo”

None of these “topics” is truly revealing.  We know Kim tweets about her family, about fashion, and about her personal life (Kanye West appears across multiple “topics”).  We know that Twitter is a key place for the Kardashians to promote their show, Keeping Up with the Kardashians, and this is blatantly clear in the multiple word clusters encouraging people to “tune in.”

The majority of the remaining topics were simply variations on those listed above.  One notable exception was “people armenian power join stand april truth bro real genocide armo pray proud.” We already knew that Kim tweeted about her Armenian heritage and that the Kardashian family has been vocal about the Armenian genocide.  What the topic modeling allowed me to do was find every tweet in which Kim mentioned anything relating to her Armenian heritage.  It further allowed me to place these tweets within broader categories.  Half of what I will term the Armenian tweets were also related to charity and activism tweets, using the same language and subject matter as anti-bullying campaigns.  The other half of the Armenian tweets were of a social nature: cooking Armenian food, going out to eat, and discussing notable Armenians.
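The kind of filtering described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the steps actually taken for this post: it assumes each tweet has already been paired with a list of topic proportions, that a tweet "belongs" to a topic when its proportion meets a cutoff (the 0.3 threshold is an arbitrary choice), and that the topic numbers, labels, and tweet names are all hypothetical.

```python
def tweets_for_topic(doc_topics, topic_id, threshold=0.3):
    """Return names of documents whose weight on topic_id meets the threshold."""
    return [name for name, props in doc_topics if props[topic_id] >= threshold]


def overlap_share(doc_topics, topic_a, topic_b, threshold=0.3):
    """Fraction of topic_a documents that also meet the threshold on topic_b."""
    in_a = tweets_for_topic(doc_topics, topic_a, threshold)
    if not in_a:
        return 0.0
    in_b = set(tweets_for_topic(doc_topics, topic_b, threshold))
    return sum(1 for name in in_a if name in in_b) / len(in_a)


# Hypothetical topic ids: 0 = heritage topic, 1 = charity/activism topic, 2 = other.
HERITAGE, ACTIVISM = 0, 1

# Invented (name, proportions) pairs standing in for real model output.
doc_topics = [
    ("tweet_a.txt", [0.50, 0.40, 0.10]),
    ("tweet_b.txt", [0.60, 0.10, 0.30]),
    ("tweet_c.txt", [0.10, 0.20, 0.70]),
    ("tweet_d.txt", [0.45, 0.45, 0.10]),
    ("tweet_e.txt", [0.70, 0.10, 0.20]),
]

heritage_tweets = tweets_for_topic(doc_topics, HERITAGE)
shared = overlap_share(doc_topics, HERITAGE, ACTIVISM)
print(heritage_tweets)
print(f"{shared:.0%} of heritage-topic tweets also load on the activism topic")
```

A threshold-based cut like this only narrows the pile. Deciding whether a given tweet is really about activism or about a dinner out still takes reading it.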

Everyone: Please call Speaker Pelosi TODAY at 202-225-0100 and URGE her to schedule a vote on H.Res.252, the Armenian Genocide Resolution

— Kim Kardashian West (@KimKardashian) December 9, 2010


While topic modeling did not offer much in terms of sorting through the corpus of Kardashian tweets at our disposal, it did allow me to collect and analyze one specific thematic topic.  After this preliminary step, however, the onus is on us to continue studying these tweets and how they might inform our study of the Kardashians' public use of their Armenian heritage.

Cite this post: Alicia Caticha. “Topic Modeling Twitter”. Published March 15, 2017.