Text Mining Resources

Word frequency analysis, topic modeling, and more!

Gathering and Locating Text

Although it may be tempting to mine the Library's databases, most of the database licenses available through the Library do not permit text and/or data mining.

To help you identify alternative sources of news and other textual data, this page contains a growing list of free data sources that researchers and students can gather using APIs or web scraping.

The library databases that do allow for text and data mining are:

Web Scraping Toolkit

Do you need to gather an original corpus from the web for your research? Do you have little to no programming expertise? This toolkit is designed for you. Please reach out to us if you need help with any of the workflows.

What is web scraping?

Web scraping is an automated process for creating an original dataset. You identify the components of a website that hold the information you want, then use a tool (software or a programming language) to copy those pieces of information into another file or organized structure for use in a variety of contexts. Web scraping is used when an API is not available, or when the API does not provide the information you need or does not provide it in a format you can work with.
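For readers curious about what a programming-language workflow can look like, here is a minimal sketch in Python. It assumes a made-up page layout: the sample HTML and the `article`, `headline`, and `date` class names are hypothetical, as are the `ArticleParser` and `scrape_to_csv` names. It uses only the standard library's `html.parser` and `csv` modules to identify components of a page and copy their text into CSV rows. Real websites are structured differently, and you should check a site's terms of use before scraping it.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content. A real scraper would first download the page
# (for example with urllib.request.urlopen) after checking the site's terms.
SAMPLE_HTML = """
<html><body>
  <div class="article">
    <h2 class="headline">City council approves budget</h2>
    <span class="date">2023-05-01</span>
  </div>
  <div class="article">
    <h2 class="headline">Library extends weekend hours</h2>
    <span class="date">2023-05-02</span>
  </div>
</body></html>
"""


class ArticleParser(HTMLParser):
    """Collects text from elements whose class is exactly 'headline' or 'date'."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per hypothetical 'article' block
        self._field = None  # the field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "article":
            # A new article starts a new row in the dataset.
            self.records.append({"headline": "", "date": ""})
        elif css_class in ("headline", "date") and self.records:
            self._field = css_class

    def handle_endtag(self, tag):
        # Any closing tag ends the field we were capturing.
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] += data.strip()


def scrape_to_csv(html):
    """Parse the HTML and return the extracted records as CSV text."""
    parser = ArticleParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["headline", "date"])
    writer.writeheader()
    writer.writerows(parser.records)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML))
```

The CSV text could be saved to a file and opened in a spreadsheet program. Matching on exact class values keeps the sketch short; many scraping tools offer more flexible selectors for pages where elements carry several classes.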

Government and Court Records