Comparison & Similarity: Using Machine Learning to Study a Large Collection of Russian Diaries 
When
-
Where

NCH 236

Event type
Speaker Event
University affiliation
UVA
DH Research Data Type
Text
Large text corpora

This talk explores how machine learning, specifically transformer-based large language models (LLMs), can be used to analyze an extensive collection of Russian historical diaries (1800–2018). By creating numerical representations of the texts, LLMs enable computational methods such as semantic text similarity and clustering. These methods can reveal significant topics within the diaries, such as prices and weather, offering new possibilities for digital scholarship. Drawing on John Unsworth’s concept of scholarly primitives, especially comparison, the talk will evaluate both the potential of semantic text similarity methods and the challenges they pose for research in the digital humanities.
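
To make the idea of comparing texts through numerical representations concrete, here is a minimal sketch of cosine similarity, one common way to score semantic text similarity. It is not taken from the talk: the three-dimensional vectors and the entry names are invented for illustration, and in practice the vectors would come from a transformer model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for this example. Real embeddings
# would be produced by a language model, not written by hand.
embeddings = {
    "entry_prices_1": [0.9, 0.1, 0.0],
    "entry_prices_2": [0.8, 0.2, 0.1],
    "entry_weather_1": [0.1, 0.9, 0.2],
}


def most_similar(name, vectors):
    """Return the key of the entry whose vector is closest to `name`."""
    query = vectors[name]
    scores = {
        other: cosine_similarity(query, vec)
        for other, vec in vectors.items()
        if other != name
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(most_similar("entry_prices_1", embeddings))
```

With these toy vectors, the two entries labeled as being about prices score closer to each other than either does to the weather entry. Clustering applies the same comparison across the whole collection rather than one entry at a time.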