Text Mining and Digital Humanities

By Cho Jiang on September 28, 2018

I’m proud to join the 2018-19 Praxis cohort and pleased to begin blogging here.

Nowadays, tremendous amounts of textual data are available, and their volume and variety have exceeded the capacity of manual analysis. Computationally driven text analysis, a helpful tool for examining elements such as word frequencies, co-occurrence, and ‘topics’ in large corpora, has been used in a variety of disciplines. It helps connect humanist scholars with linguists, social scientists, computer scientists, and statisticians.
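To make the idea of word frequencies and co-occurrence a little more concrete, here is a minimal sketch in Python using only the standard library. The three-sentence corpus and the function names (`tokenize`, `word_frequencies`, `cooccurrences`) are my own illustrative assumptions, and the tokenizer is deliberately naive; a real project would more likely rely on a dedicated NLP library.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string stands in for one document.
documents = [
    "Text mining helps scholars read large corpora.",
    "Scholars use text mining to count words in large corpora.",
    "Digital humanities scholars study text.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count how often each word appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count, for each pair of distinct words, the documents containing both."""
    pairs = Counter()
    for doc in docs:
        # Sorting the vocabulary gives every pair a stable (a, b) order.
        vocabulary = sorted(set(tokenize(doc)))
        pairs.update(combinations(vocabulary, 2))
    return pairs


freqs = word_frequencies(documents)
pairs = cooccurrences(documents)

print(freqs["text"])               # occurrences of "text" in the corpus
print(pairs[("mining", "text")])   # documents containing both words
```

Here co-occurrence is counted at the document level, which is only one possible choice; counting within a sliding window of neighboring words is another common option.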

There are many resources, code libraries, services, and APIs out there to help with text analysis, and coding skills are definitely a plus for learning text mining methodologies. Thanks to the opportunities that Praxis has created, I have started learning programming (Python) and the fundamentals of natural language processing, mainly through code sessions in the Scholars’ Lab complemented by online resources like Lynda.com and DataCamp. This training in Python (or other programming languages) expands my imagination about what is possible and helps me perform text analysis, but as a beginner I find writing Python programs (or programming in general) a very creative (and challenging) activity. When I get stuck while coding, people in the Scholars’ Lab are very helpful and supportive. I am grateful to be working with such a great team and to receive kindness, patience, and support along my PhD path.

Digital Humanities (DH) is a growing and promising area of research, and text mining in DH is a popular topic these days. Digital history and digital literary studies are already well developed. Advances in technologies such as machine learning and deep reinforcement learning algorithms are expected to allow DH scholars to perform mass digitization of textual resources and automated text analysis. To end on a more personal note, I look forward to learning more practical skills and exploring the capabilities of text mining in DH research.