Text Mining Resources

Word frequency analysis, topic modeling, and more!

Gathering and Locating Text

Though it may be tempting to use them, most database licenses held by the Library do not allow text and/or data mining.

To help you identify alternative sources of news and other textual data, this page maintains a growing list of free sources that researchers and students can gather data from using APIs or web scraping.

The library databases that do allow for text and data mining are:

  • ProQuest TDM Studio
  • Constellate

Web Scraping Toolkit

Do you need to gather an original corpus from the web for your research? Do you have little to no programming expertise? This toolkit is designed for you. Please reach out to us if you need assistance with any of the workflows.

What is web scraping?

Web scraping is an automated process that creates an original dataset by identifying components of a website and copying pieces of information, using a tool (software or a programming language), into another file or organized structure for use in a variety of contexts. Web scraping is used when an API is not available, or when the API does not provide the information you need or does not provide it in a format you can work with.
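To make the idea concrete, here is a minimal sketch in Python that uses only the standard library. It assumes a hypothetical page layout in which each headline is a link inside an `<h2>` tag; the sample HTML, the `HeadlineParser` class, and the helper function names are all illustrative rather than taken from any particular site or tool. The sketch identifies the components of interest, copies them out, and writes them into an organized structure (CSV).

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup, hard-coded so the example runs offline.
# A real scraper would download the page first, for example with
# urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <div class="article"><h2><a href="/news/1">First headline</a></h2></div>
  <div class="article"><h2><a href="/news/2">Second headline</a></h2></div>
  <p><a href="/about">About us</a></p>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect (title, link) pairs from <a> tags nested inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            # Only remember links that appear inside a headline.
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Keep non-empty text found inside a headline link.
        if self.in_h2 and self.current_href and data.strip():
            self.rows.append((data.strip(), self.current_href))


def scrape_headlines(html):
    """Return a list of (title, link) tuples found in the HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """Serialize the scraped rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "link"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape_headlines(SAMPLE_HTML)
print(rows_to_csv(rows))
```

The "About us" link is skipped because it does not sit inside an `<h2>`. Choosing which components of the page to target is the core decision in any scraping workflow, and before scraping a real site you should check its terms of use.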

Government and Court Records

ProQuest TDM Studio

ProQuest TDM Studio is a platform that allows you to text and data mine (in other words, gather and analyze large amounts of text) content from news, scholarly and other kinds of publications that Brown subscribes to via ProQuest.

You may find ProQuest TDM Studio useful if you'd like to:

  • Identify trends in a publication over time
  • Use data visualizations to represent texts
  • Gather a large "corpus" or collection of texts for text analysis, machine learning, etc.
  • Use a web-based interface to run Python and R code using these texts
  • Query, transform, and export text data to your computer
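As a rough illustration of the first item, identifying trends over time, the following Python sketch counts how often a term appears per publication year. It does not use any TDM Studio-specific feature: the tiny corpus, the `tokenize` helper, and the `term_counts_by_year` function are hypothetical stand-ins for whatever texts and code you would actually work with.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (publication year, text) pairs.
DOCUMENTS = [
    (2019, "Climate policy debate continues as climate talks stall."),
    (2020, "Pandemic dominates the news; climate coverage declines."),
    (2020, "Local pandemic response expands."),
    (2021, "Climate summit returns, and climate pledges grow. Climate!"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts_by_year(documents, term):
    """Return a dict mapping each year to the term's total frequency."""
    counts = defaultdict(int)
    for year, text in documents:
        counts[year] += Counter(tokenize(text))[term]
    return dict(sorted(counts.items()))


trend = term_counts_by_year(DOCUMENTS, "climate")
for year, count in trend.items():
    print(year, count)
```

With a real corpus you would usually normalize these raw counts, for instance by the total number of words published each year, so that years with more articles do not automatically appear to show more interest in a term.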

A screenshot of the ProQuest TDM Studio front page: a blue pixelated background with links to Visualizations and Workbench.