LibGuides: Digital Tools and Methods: Overview

Tutorials

The Library's Center for Digital Scholarship (CDS) offers workshops and consultations on digital tools, how to work with them and how to apply them to your own research. These workshops cover many phases of the research process, from data acquisition, through data preparation and analysis, to dissemination. The workshops are applicable to many disciplines, but they may be especially helpful to students in the humanities and softer social sciences who are looking for an introduction to digital methods.

A selection of CDS workshops is offered each semester; workshops are accessible from the library home page along with other library events. Workshops are also offered in classes, and can be requested by an instructor. Courses that incorporate digital tools and assignments work best when the instructor and CDS staff meet as early as possible to incorporate the workshops and skills into the course syllabus.

The list below is not exhaustive; new workshops are developed as needed. It includes links to workshop materials if they are available.

Acquiring Data: Web Scraping

Web scraping is the practice of gathering data through any means other than a program interacting with an API (or a human using a web browser). It is a valuable skill for creating original datasets from text or information that currently exists online. For example, you may need to gather social media content or collect text from press releases.

The workshop covers:

What do you need to know before you start web scraping?
When should you web scrape, and when should you use an application programming interface (API) or an existing dataset instead?
How do you web scrape?

For those with some programming experience, the Brown Library has created (and continues to develop) a web scraping toolkit that shares workflows for using Python, R, and other open source resources to scrape a variety of content types.
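As a rough illustration of what scraping involves, here is a minimal Python sketch that pulls headlines out of a page using only the standard library. It is not taken from the toolkit: the HTML snippet, the `release-title` class name, and the `scrape_titles` helper are all made up for this example. The page is inlined as a string so the sketch runs offline; in practice you would first download the page and check the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical press-release listing, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <h2 class="release-title">Library Opens New Reading Room</h2>
  <p>Published 2021-03-04</p>
  <h2 class="release-title">Archive Digitization Project Completed</h2>
  <p>Published 2021-05-17</p>
</body></html>
"""


class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="release-title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "release-title") in attrs:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching <h2>.
        if self.in_title:
            self.titles[-1] += data


def scrape_titles(html):
    """Return the cleaned headline strings found in the given HTML."""
    parser = TitleScraper()
    parser.feed(html)
    return [title.strip() for title in parser.titles]


titles = scrape_titles(SAMPLE_HTML)
print(titles)
```

Real pages are usually messier than this, which is why dedicated parsing libraries are popular, but the basic idea is the same: find the markup pattern that surrounds the data you want and extract the text inside it.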

Acquiring Data: OCR

Another way to generate digital data is to digitize your sources and then use optical character recognition software to turn them into digital documents that can be transformed or analyzed.

An asynchronous introduction to OCR.

An introduction to using Amazon, Microsoft or Google cloud services to perform OCR.

Although there is no regularly scheduled workshop on OCR techniques, it is possible to contact CDS and arrange for a consultation.

Manipulating Data: Regular Expressions: Search and Replace with Advanced Pattern Matching

Regular expressions are a concise way of doing complex pattern matching in textual or numeric data. They are wonderfully useful for cleaning data, preparing data for text mining, and regularizing values and formats. They are available in most text editors as well as in programming languages. Regular expressions are a data and text processing building block used across many technologies and workflows: a useful tool to have in your back pocket!
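To give a flavor of the cleaning and regularizing tasks mentioned above, here is a small Python sketch. The sample text is invented, and the date rule assumes US month-first dates; treat it as an illustration rather than workshop material.

```python
import re

# Hypothetical messy text, e.g. from OCR or a copied web page.
raw = "The  meet-\ning was held on 3/7/1921 and   again on 12/25/1921."

# 1. Rejoin words hyphenated across a line break: "meet-\ning" becomes "meeting".
text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

# 2. Collapse any run of whitespace into a single space.
text = re.sub(r"\s+", " ", text)


# 3. Regularize M/D/YYYY dates to YYYY-MM-DD (assumes month comes first).
def to_iso(match):
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


text = re.sub(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b", to_iso, text)

print(text)
# The meeting was held on 1921-03-07 and again on 1921-12-25.
```

The same three patterns would work, with small syntax changes, in the search-and-replace dialog of many text editors.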

The regular expressions workshop is taught each semester. If you are unable to attend, there are many regular expressions tutorials available on the internet:

Cleaning OCR'd Text with Regular Expressions (Laura Turner for the Programming Historian)
RegexOne
Regex tutorial — A quick cheatsheet by examples
Regular Expressions Reference: this site has tutorials as well. The reference section is comprehensive and includes different "flavors" of regular expressions.

Although regular expression syntax can differ slightly across editors, applications and programming languages, the fundamental concepts and capabilities remain the same, so you can use almost any tutorial or reference that suits your learning style.

Analyzing Data: Introduction to Topic Modeling

Topic modeling has a number of promising applications for the humanities. Unlike Key Word In Context (KWIC) lists and collocate lists, which require human supervision to parse one word from another, topic modeling is an unsupervised way to infer information about individual words based on how words co-occur or repeatedly appear in a corpus of texts.
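To make "how words co-occur" concrete, the following Python sketch counts how often pairs of words appear in the same document. This is not a topic model; it only illustrates the kind of co-occurrence signal that a topic model draws on. The four-document toy corpus is invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each string stands in for one document.
docs = [
    "whale ship sea captain",
    "ship sea storm",
    "garden rose love",
    "love rose letter",
]

# Count each unordered pair of distinct words once per document.
pair_counts = Counter()
for doc in docs:
    words = sorted(set(doc.split()))
    pair_counts.update(combinations(words, 2))

# Pairs that share more than one document hint at a common theme.
repeated = sorted(pair for pair, n in pair_counts.items() if n >= 2)
print(repeated)
# [('love', 'rose'), ('sea', 'ship')]
```

Even in this tiny corpus, the repeated pairs fall into a seafaring group and a romance group. A topic model finds groupings like these statistically across thousands of documents, without anyone labeling the words in advance.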

We can think of topic models like microscopes that allow researchers to study a large corpus. The number of topics a researcher chooses for the topic model allows the researcher to change the magnification of the analysis. If the researcher wants to look at larger structures, a lower magnification, with fewer topics, will work better. If the researcher wants to look at smaller structures, a higher magnification, with more topics, will work better.

This workshop explores how to use MALLET, from downloading the software to inputting an original dataset to creating a variety of topic models. This workshop uses the command line, but no prior experience is necessary.
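For readers who prefer to script their MALLET runs, here is a hedged Python sketch that assembles the two usual commands: one to import a folder of text files and one to train a model with a chosen number of topics. The `bin/mallet` path, the file names, and the helper functions are assumptions made for this example; the flags follow MALLET's command-line usage as commonly documented, so confirm them with `bin/mallet train-topics --help` on your own installation.

```python
import subprocess

# Assumed location of the MALLET launcher, relative to the MALLET folder.
MALLET = "bin/mallet"


def import_command(input_dir, corpus_file):
    """Build the command that converts a folder of .txt files to MALLET format."""
    return [
        MALLET, "import-dir",
        "--input", input_dir,
        "--output", corpus_file,
        "--keep-sequence",
        "--remove-stopwords",
    ]


def train_command(corpus_file, num_topics):
    """Build the command that trains a topic model with num_topics topics."""
    return [
        MALLET, "train-topics",
        "--input", corpus_file,
        "--num-topics", str(num_topics),
        "--output-topic-keys", f"keys_{num_topics}.txt",
        "--output-doc-topics", f"composition_{num_topics}.txt",
    ]


def run(command):
    """Run a command and stop with an error if MALLET reports a failure."""
    subprocess.run(command, check=True)


# Fewer topics gives a coarser view of the corpus; more topics gives a finer one.
coarse = train_command("corpus.mallet", 10)
fine = train_command("corpus.mallet", 50)
print(" ".join(coarse))
print(" ".join(fine))
```

Nothing is executed until you call `run(...)` from inside the MALLET directory, so the sketch is safe to read and adapt before MALLET is installed. Writing each model's output to files named after the topic count makes it easier to compare different "magnifications" side by side.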

Topic modeling and text mining in general are useful methodologies for exploring large texts or collections of texts. The Text Mining guide introduces Voyant, a web-based text reading and analysis environment; AntConc, a freeware cross-platform concordance program; and topic modeling.

Disseminating Data: Platforms and Web Basics

In this workshop we introduce Scalar and Omeka, two scholarly platforms for dissemination of research and archival data, and compare them to WordPress. We also provide a basic understanding of how websites work, including technologies such as CSS and JavaScript, so that you can select a platform and know whether and how you can customize it. We present Brown’s Digital Scholarship at Brown domain service. If there is time, we also discuss the concept of static sites and why they may be of interest in digital scholarship projects.

Disseminating Data: Platforms and Web Basics workshop slides.

Learn More…

You can learn more about digital scholarship and digital humanities, and engage with them, in the following ways:

The Public Digital Projects for Courses guide helps students and faculty create public digital projects, as part of a class, as a research project, or as a way of presenting their research or community-engaged scholarship.
Informal lunchtime salons on DH topics (posted on Today@Brown and on the library home page)
Invited speakers
Individual consultations
CDS staff have worked closely with instructors who would like to include digital methods or a digital project in their course—discussing topics in digital scholarship, teaching how to use software or methods, and holding labs.
CDS teaches a summer institute in digital scholarship/digital humanities for graduate students.
If you have any questions, get in touch with the Center for Digital Scholarship (cds_info@brown.edu).

More workshops and tutorials in digital scholarship are also available. You can learn how to use Geographic Information Systems (GIS) and how to manage your data.

Data Management covers Naming and Organizing Files; Storage, Backup, and Versioning Data; Documenting Methods and Describing Data; and Creating and Managing a Digital Research Notebook.
Geographic Information Systems provides an introduction to GIS software and data, and Brown GIS and Data Tutorials covers QGIS (open source GIS Software); ArcGIS Pro; Data Processing with Excel; and using US Census data.
Open Data is a great place to start if you are looking for data to work with, although it skews more heavily to tabular data in the social sciences.

Learning Objectives

This guide was designed to help you:

Become familiar with library workshop offerings on digital tools, digital humanities and digital scholarship.
Start to explore digital tools and methods and identify ways to use them in your own work.
Access synchronous and asynchronous teaching materials on these topics.
Select methodologies that are appropriate for your own work, and understand how to deploy them.

Credits

Icons by Noun Project:

Web scraping by Guilherme Simoes licensed under CC BY 3.0 represents scraping data from a web page

document scan by Berkah Icon licensed under CC BY 3.0 represents scanning a document

pattern by Adhy Putra Tama licensed under CC BY 3.0 represents a pattern

topic models by Christina Barysheva licensed under CC BY 3.0 represents groupings of topic models

web coding by VectorsLab licensed under CC BY 3.0 represents a web page with code and text in it