Comparison & Similarity: Using Machine Learning to Study a Large Collection of Russian Diaries
When
-
Where
NCH 236
Event type
Speaker Event
University affiliation
UVA
DH Research Data Type
Text
Large text corpora
This talk explores how machine learning, specifically transformer-based large language models (LLMs), can be used to analyze an extensive collection of Russian historical diaries (1800-2018). By creating numerical representations of the texts, LLMs enable computational methods such as semantic text similarity and clustering. These methods can reveal significant topics and subjects within the diaries, such as prices and weather, offering new possibilities for digital scholarship. Drawing on John Unsworth’s concept of scholarly primitives, especially comparison, the talk will evaluate both the potential and the challenges of semantic text similarity methods for research in the digital humanities.
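As a rough illustration of how numerical representations make comparison possible, the sketch below ranks diary entries by cosine similarity. It is not the speaker's method: the entry identifiers and the three-dimensional vectors are made-up placeholders, and a real transformer model would produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for model output.
# The identifiers and values are invented for illustration only.
embeddings = {
    "entry_prices_a": [0.9, 0.1, 0.0],
    "entry_prices_b": [0.8, 0.2, 0.1],
    "entry_weather": [0.1, 0.9, 0.2],
}


def most_similar(query_id, embeddings):
    """Rank all other entries by similarity to the query entry, highest first."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("entry_prices_a", embeddings)
for entry_id, score in ranking:
    print(f"{entry_id}: {score:.3f}")
```

In this toy setup the second entry about prices ranks above the entry about weather, which is the kind of comparison that clustering methods then apply across a whole corpus.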