Your Portal to the Digital Humanities at the University of Virginia

Resource Page: Text Technologies


What is text analysis?

Text analysis, sometimes referred to as text mining or text analytics, is an umbrella term for research methods that use computers to surface information from large amounts of unstructured text. Novels, newspapers, journals, blog posts, emails, tweets, and metadata are all examples of unstructured text. Text analysis methods aid researchers in sifting through large text corpora to identify patterns and make connections.
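
As a rough illustration of surfacing a pattern from unstructured text, the sketch below counts word frequencies across a tiny corpus using only the Python standard library. The corpus contents and the `tokenize` and `word_counts` helpers are hypothetical, invented for this example; a real project would load far more text and would likely use one of the tools listed on this page.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; a real project would load files from disk.
corpus = {
    "letter_1": "The ship sailed at dawn. The sea was calm.",
    "letter_2": "The sea grew rough, and the ship turned back.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(documents):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return counts


counts = word_counts(corpus)
# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Very common words such as "the" tend to dominate raw counts, which is one reason many projects filter out a stop-word list before interpreting the results.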

What is distant reading and how is it different from close reading?

Distant reading refers to the use of computers to ingest and analyze large text corpora, more text than a single person could otherwise read. Whereas close reading concentrates on the details of a particular text, distant reading supports the observation and analysis of patterns and concepts that recur over time across large text corpora. In literary studies and comparative literature, the terms distant reading and text analysis are closely aligned in meaning.
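
One simple way to picture a pattern that recurs over time is to track how often a term appears in documents grouped by year. The following is a minimal sketch under stated assumptions: the dated documents and the `relative_frequency_by_year` helper are hypothetical, made up for illustration, and the tokenization is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical dated documents as (year, text) pairs.
documents = [
    (1850, "Steam power drives the railway and the mill."),
    (1850, "The railway reached the coast."),
    (1900, "Electric light replaced gas; the railway carried on."),
]


def relative_frequency_by_year(docs, term):
    """Return {year: occurrences of term / total tokens for that year}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


trend = relative_frequency_by_year(documents, "railway")
for year, share in trend.items():
    print(year, round(share, 3))
```

Dividing by the total number of tokens per year keeps years with more surviving text from looking more significant simply because they are larger.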


Tools for Text Analytics Projects

  • MALLET  (machine learning applications to text)

  • Beautiful Soup  (Python library designed for screen scraping)

  • TAPoR  (XML, HTML, and plain text analysis)

  • Juxta  (comparison/collation tool)

  • WordSeer  (comparison tool)

  • LancsBox  (software package for the analysis of language data and corpora)


Current Journals 


Articles for Reference