Comparison & Similarity: Using Machine Learning to Study a Large Collection of Russian Diaries

When

Mon, Feb 3, 2025, 4 - 5:30 pm

Where

NCH 236

Event type

Speaker Event

University affiliation

UVA

DH Research Data Type

Text analytics

Large text corpora

This talk explores how machine learning, specifically transformer-based large language models (LLMs), can be used to analyze an extensive collection of Russian historical diaries (1800-2018). By creating numerical representations of the texts, LLMs enable computational methods such as semantic text similarity and clustering. These methods can reveal significant topics within the diaries, such as prices and weather, offering new possibilities for digital scholarship. Drawing on John Unsworth’s concept of scholarly primitives, especially comparison, the talk will evaluate the potential of semantic text similarity methods for research in the digital humanities, along with the challenges they pose.
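
As a rough illustration of the semantic text similarity idea described in the abstract, and not the speaker's actual method, the sketch below compares texts by the cosine similarity of their numerical representations. The three-dimensional vectors, the entry names, and the helper `most_similar` are all hypothetical stand-ins; in practice the vectors would come from a transformer model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for model-produced embeddings
# of three diary entries (names and values are invented for illustration).
embeddings = {
    "bread_prices": [0.9, 0.1, 0.0],
    "market_costs": [0.8, 0.2, 0.1],
    "snowstorm": [0.0, 0.1, 0.9],
}


def most_similar(query, vectors):
    """Return the name of the entry whose vector is closest to the query's."""
    candidates = (name for name in vectors if name != query)
    return max(
        candidates,
        key=lambda name: cosine_similarity(vectors[query], vectors[name]),
    )


if __name__ == "__main__":
    print(most_similar("bread_prices", embeddings))
```

With these toy values, the entry about bread prices lands closest to the one about market costs and far from the one about a snowstorm, which is the kind of comparison that similarity search and clustering build on.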