The Library's Center for Digital Scholarship (CDS) offers workshops and consultations on digital tools, how to work with them and how to apply them to your own research. These workshops cover many phases of the research process, from data acquisition, through data preparation and analysis, to dissemination. The workshops are applicable to many disciplines, but they may be especially helpful to students in the humanities and softer social sciences who are looking for an introduction to digital methods.
A selection of CDS workshops is offered each semester; workshops are accessible from the library home page along with other library events. Workshops are also offered in classes, and can be requested by an instructor. Courses that incorporate digital tools and assignments work best when the instructor and CDS staff meet as early as possible to incorporate the workshops and skills into the course syllabus.
The list below is not exhaustive; new workshops are developed as needed. It includes links to workshop materials if they are available.
Web scraping is the practice of gathering data through any means other than a program interacting with an API (or a human using a web browser). It’s a valuable skill for creating original datasets of text or other information that currently exists online. For example, you may need to gather social media content or collect text from press releases.
The workshop covers:
For those with programming skills, or somewhere in between, the Brown Library has created (and continues to develop) a web scraping toolkit that shares workflows for using Python, R, and other open source resources to scrape a variety of content types.
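As a rough illustration of what scraping text from press releases can look like in Python, here is a minimal sketch that uses only the standard library's `html.parser`. It is not taken from the Brown Library toolkit. The `HeadlineParser` class, the `scrape` function, and the `SAMPLE_HTML` page are all hypothetical, and the sketch assumes the page has already been downloaded and that headlines sit in `<h2>` elements.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> element and the href of every link."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            # Start a new, empty headline and begin collecting its text.
            self.in_h2 = True
            self.headlines.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        # Text inside nested tags (such as <em>) is still part of the headline.
        if self.in_h2:
            self.headlines[-1] += data


# Hypothetical page content standing in for a downloaded press-release listing.
SAMPLE_HTML = """
<html><body>
  <h2>Library opens new reading room</h2>
  <p>Read the <a href="/news/reading-room">full release</a>.</p>
  <h2>Archive digitization <em>complete</em></h2>
  <p>Read the <a href="/news/archive">full release</a>.</p>
</body></html>
"""


def scrape(html):
    """Return (headlines, links) extracted from an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines], parser.links


headlines, links = scrape(SAMPLE_HTML)
for headline, link in zip(headlines, links):
    print(headline, "->", link)
```

Real pages vary widely in structure, so the tags to look for would need to be adapted to each site, and a site's terms of use should be checked before collecting its content.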
Another way to generate digital data is to digitize your sources and then use optical character recognition (OCR) software to turn them into digital documents that can be transformed or analyzed.
Although there is no regularly scheduled workshop on OCR techniques, you can contact CDS to arrange a consultation.
Regular expressions are a concise way of doing complex pattern matching in textual or numeric data. They are wonderfully useful for cleaning data, preparing data for text mining, and regularizing values and formats. They are available in most text editors as well as in programming languages, and they are a data and text processing building block used across many technologies and workflows. A useful tool to have in your back pocket!
The regular expressions workshop is taught each semester. If you are unable to attend a workshop, there are many regular expressions tutorials available on the internet.
Although regular expression syntax can differ slightly across editors, applications, and programming languages, the fundamental concepts and capabilities remain the same, so you can use almost any tutorial or reference that suits your learning style.
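To make "regularizing values and formats" concrete, here is a small sketch using Python's built-in `re` module. The sample values and the function names `normalize_date` and `collapse_whitespace` are hypothetical, and the sketch assumes that slash- or hyphen-separated dates are written month first.

```python
import re

# Hypothetical messy values of the kind a scraped or transcribed dataset might contain.
raw_dates = ["3/7/1921", "03-07-1921", "1921.03.07", "  12/25/1899 "]

# Month/day/year with "/" or "-" separators (assumed month-first order).
MDY = re.compile(r"^(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{4})$")
# Year.month.day with "." separators.
YMD = re.compile(r"^(?P<year>\d{4})\.(?P<month>\d{1,2})\.(?P<day>\d{1,2})$")


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no pattern matches."""
    value = value.strip()
    for pattern in (MDY, YMD):
        match = pattern.match(value)
        if match:
            return "{}-{:02d}-{:02d}".format(
                match["year"], int(match["month"]), int(match["day"])
            )
    return None


def collapse_whitespace(text):
    """Replace any run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


cleaned = [normalize_date(d) for d in raw_dates]
print(cleaned)
print(collapse_whitespace("  Press   release\n\ttext "))
```

The same patterns should work with little or no change in most editors and languages that support named groups, though the syntax for naming a group can differ.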
Topic modeling has a number of promising applications for the humanities. Unlike Key Word In Context (KWIC) lists and collocate lists, which require human supervision to parse one word from another, topic modeling is an unsupervised way to infer information about individual words based on how words co-occur or repeatedly appear in a corpus of texts.
We can think of topic models like microscopes that allow researchers to study a large corpus. The number of topics a researcher chooses for the topic model allows the researcher to change the magnification of the analysis. If the researcher wants to look at larger structures, a lower magnification, with fewer topics, will work better. If the researcher wants to look at smaller structures, a higher magnification, with more topics, will work better.
This workshop explores how to use MALLET, from downloading the software, to inputting an original dataset, to creating a variety of topic models. This workshop uses the command line, but no prior experience is necessary.
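For orientation, the sketch below builds the two MALLET commands typically used for a topic model: `import-dir` to convert a folder of text files, then `train-topics` with a chosen `--num-topics`. It only assembles and prints the commands and does not run MALLET. The function name `mallet_commands`, the `my_corpus` folder, the `bin/mallet` location, and the output file names are all assumptions for illustration, not the workshop's own materials.

```python
import shlex


def mallet_commands(corpus_dir, num_topics, mallet_bin="bin/mallet", prefix="model"):
    """Build (but do not run) the import and training commands for one topic model."""
    corpus_file = f"{prefix}.mallet"
    import_cmd = [
        mallet_bin, "import-dir",
        "--input", corpus_dir,
        "--output", corpus_file,
        "--keep-sequence",     # store each document as a word sequence
        "--remove-stopwords",  # drop very common English words
    ]
    train_cmd = [
        mallet_bin, "train-topics",
        "--input", corpus_file,
        "--num-topics", str(num_topics),
        "--output-topic-keys", f"{prefix}_{num_topics}_keys.txt",
        "--output-doc-topics", f"{prefix}_{num_topics}_composition.txt",
    ]
    return import_cmd, train_cmd


def as_shell(cmd):
    """Render an argument list as a single copy-and-paste command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Fewer topics for a wide view of the corpus, more topics for a closer one.
for n in (10, 50):
    import_cmd, train_cmd = mallet_commands("my_corpus", n)
    print(as_shell(import_cmd))
    print(as_shell(train_cmd))
```

Writing the outputs to file names that include the topic count makes it easier to compare models built at different "magnifications" side by side.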
Topic modeling and text mining in general are useful methodologies for exploring large texts or collections of texts. The Text Mining guide introduces Voyant, a web-based text reading and analysis environment; AntConc, a freeware cross-platform concordance program; and topic modeling.
You can learn more about digital scholarship and digital humanities, and engage with them, in the following ways:
More workshops and tutorials in digital scholarship are also available. You can learn how to use Geographic Information Systems (GIS) and how to manage your data.
This guide was designed to help you:
Icons by Noun Project:
Web scraping by Guilherme Simoes licensed under CC BY 3.0 represents scraping data from a web page
document scan by Berkah Icon licensed under CC BY 3.0 represents scanning a document
pattern by Adhy Putra Tama licensed under CC BY 3.0 represents a pattern
topic models by Christina Barysheva licensed under CC BY 3.0 represents groupings of topic models
web coding by VectorsLab licensed under CC BY 3.0 represents a web page with code and text in it