IHGC'S DIGITAL HUMANITIES INITIATIVE – "Low Code Approaches to Topic Modeling for Humanities Data: Learning to Use BERTopic"
When
-
Where

Shannon 308

Event type
Workshop Event
University affiliation
UVA

Join the IHGC's Digital Humanities Initiative for a workshop led by Tim Tangherlini!

Working from a collection of stories about supernatural creatures living in a mythical world, the workshop will explore the uses of an LLM-based approach to topic modeling. Earlier approaches to topic modeling, such as LSA or LDA, came with well-known problems, such as overfitting and incomprehensible topic labels, that made them poorly suited to many applications in the Humanities. This approach uses a BERT model, which takes context into account and can therefore handle polysemy, together with a new strategy for labeling clusters, and is far more amenable to problems in the Humanities. Are there certain characteristics of trolls that are latent in the story space? Can we understand how those characteristics change over the course of a novel or series of novels?

This workshop will show attendees how to get BERTopic up and running in their own workspace, and will address different strategies for data preparation, topic modeling, topic labeling, and visualization. No prior coding experience is necessary, although attendees may wish to familiarize themselves with the terminal on their computers.
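
For attendees who would like a preview, the workflow described above might look roughly like the following in Python. This is a minimal sketch, not the workshop's actual material: the helper names `split_into_passages` and `fit_topics`, the sample text, and the parameter values are all hypothetical, and the modeling step assumes the `bertopic` package has been installed (for example with `pip install bertopic`).

```python
import re


def split_into_passages(text, min_words=5):
    """Data preparation (hypothetical helper): split a story into
    paragraph-sized passages and drop very short ones."""
    passages = [p.strip() for p in re.split(r"\n\s*\n", text)]
    return [p for p in passages if len(p.split()) >= min_words]


def fit_topics(passages, min_topic_size=10):
    """Topic modeling (hypothetical helper): fit a BERTopic model.

    Assumes `bertopic` is installed; it is imported here so that the
    data-preparation helper above works without it.
    """
    from bertopic import BERTopic

    topic_model = BERTopic(min_topic_size=min_topic_size)
    topics, probs = topic_model.fit_transform(passages)
    return topic_model, topics


# Made-up sample text, used only to illustrate the preparation step.
story = (
    "The troll lived under the old stone bridge.\n\n"
    "Short.\n\n"
    "Every night the troll demanded a toll from travelers crossing the river."
)
passages = split_into_passages(story)
print(len(passages))
```

With a real collection of several hundred passages or more, `fit_topics(passages)` would return a fitted model; `topic_model.get_topic_info()` then lists the topics with their labels, and `topic_model.visualize_topics()` draws an interactive map of them. A sample as small as the one above is too short to model and serves only to show the preparation step.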

This event is open to Digital Humanities scholars at UVA.