Database: ProQuest TDM (Text Data Mining)

ProQuest TDM provides access to a wide variety of ProQuest texts (publications, newspapers, zines, papers) in a platform that supports text analysis. This guide introduces the platform and its text holdings, as well as two approaches for analyzing text.

Selecting Publications

  1. Log in to your TDM Studio account
  2. Click the blue "Create New Dataset" button on the right
  3. You have the option to create a dataset by selecting either:
    1. Publication Titles (choose individual publications, can select multiple)
    2. ProQuest Databases (choose all the publications indexed in a database, e.g. everything in Ethnic News Watch, or Latin American Newsstream)
  4. Use the search box to find publications
  5. When searching for publication titles, keep in mind that the default is 20 results per page. If you don't see what you're looking for on the first page, click to the next page of results
  6. You can select multiple publications by checking the box to the left of the title
  7. Click a publication title to see more details about it, including its date coverage
  8. As you are selecting your publications, keep in mind:
    1. You may see multiple results for the same publication title. A publication may have separate online and print editions, or historical and current editions with different date ranges. Check the Source Type and Date Range columns, and click the publication title for more information.
    2. If you are selecting multiple editions of the same publication (for example, its current and historical versions), build your dataset starting with the most recent edition and work backward chronologically. For example, if a publication has a historical edition covering 2000–2005 and a current edition covering 2003–present, it may be better to generate your dataset with the current edition first, and then add the historical edition, limiting it to 2000–2003 during the content refinement step so the two editions overlap as little as possible.
    3. While there is no limit to the number of publications you select, a dataset can contain a maximum of 2 million records. Keep this in mind when refining your search in the next step.
  9. When you are finished selecting your publications or databases, click "Next: Refine Content" at the bottom of the page
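The advice above about overlapping editions amounts to a small planning rule: take the most recent edition first, then cut each older edition off where the newer coverage begins. The Python sketch below illustrates that rule under stated assumptions: TDM Studio does not provide this helper, the `Edition` class and `plan_searches` function are hypothetical, and the years are copied by hand from the Date Range column. The upper bound it returns is exclusive ("up to, but not including").

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Edition:
    """Hypothetical record of one edition's coverage, copied by hand from the Date Range column."""
    label: str
    start_year: int
    end_year: Optional[int] = None  # None means the edition is current


def plan_searches(editions: List[Edition]) -> List[Tuple[str, int, Optional[int]]]:
    """Return (label, start_year, stop_before) tuples, most recent edition first.

    stop_before is exclusive: search the edition from start_year up to, but not
    including, that year. None means no upper limit (a current edition).
    """
    # Most recent first: current editions (no end year) sort ahead of ended ones.
    ordered = sorted(
        editions,
        key=lambda e: (float("inf") if e.end_year is None else e.end_year, e.start_year),
        reverse=True,
    )
    plan = []
    covered_from = None  # earliest start year among editions already planned
    for ed in ordered:
        stop = None if ed.end_year is None else ed.end_year + 1
        if covered_from is not None:
            # Cut the older edition off where the newer coverage begins.
            stop = covered_from if stop is None else min(stop, covered_from)
        if stop is not None and stop <= ed.start_year:
            continue  # fully covered by more recent editions; nothing left to add
        plan.append((ed.label, ed.start_year, stop))
        covered_from = ed.start_year if covered_from is None else min(covered_from, ed.start_year)
    return plan


# The example from this guide: historical 2000-2005, current 2003-present.
for label, start, stop in plan_searches([Edition("historical", 2000, 2005), Edition("current", 2003)]):
    print(label, start, "to present" if stop is None else f"up to (not including) {stop}")
```

For the guide's example, this plans the current edition from 2003 onward first, then the historical edition from 2000 up to 2003, which you would then enter as the date limit in the refinement step.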

Refining the Search

  1. You can narrow your dataset within the selected publications or databases using a keyword search. For detailed information on Boolean operators, wildcards, and other search techniques, click "Search tips" directly under the search bar on the right, or follow this link: 
  2. You can also filter your dataset by date range, source type and document type.
  3. As you apply filters, a sample of matching documents is displayed, giving you a preview of the data.
  4. When you are finished refining your search, click the "Next: Review Dataset" button at the bottom right.
  5. The next page shows the document count of your dataset and the publications it includes. If you want to keep modifying it, click "Refine Content" under the checkbox on the progress bar at the top.
  6. Give your dataset a name and description. It is a good idea to give the dataset a unique name and a clear description of the sources, dates, and search terms used, especially if you'll be creating multiple datasets throughout the research process.
  7. Click "Create Dataset" at the bottom.
  8. The dataset will begin processing, which may take some time: TDM Studio processes roughly 100,000–200,000 documents an hour.
  9. You will be returned to your dashboard, where the dataset you just defined will be listed as "In-Progress".
  10. When a dataset is completed, it will transfer automatically to the data folder in your Jupyter Notebook environment.
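To get a rough sense of how long a dataset will take, the processing rate (100,000–200,000 documents an hour) and the 2 million record cap given in this guide can be turned into a quick estimate. This is a back-of-the-envelope Python sketch, not a TDM Studio feature; `estimate_hours` is a hypothetical helper, the document count is whatever the Review Dataset page reports, and real processing times will vary.

```python
from typing import Tuple

# Figures taken from this guide; actual processing speed will vary.
MAX_RECORDS = 2_000_000             # maximum records per dataset
DOCS_PER_HOUR = (100_000, 200_000)  # stated processing range (slow, fast)


def estimate_hours(doc_count: int) -> Tuple[float, float]:
    """Return (fastest, slowest) processing time in hours for doc_count documents."""
    if doc_count > MAX_RECORDS:
        raise ValueError(
            f"{doc_count:,} documents exceeds the {MAX_RECORDS:,}-record limit; refine the search further"
        )
    slow_rate, fast_rate = DOCS_PER_HOUR
    return doc_count / fast_rate, doc_count / slow_rate


# Hypothetical document count read off the Review Dataset page.
fastest, slowest = estimate_hours(600_000)
print(f"Expect roughly {fastest:.1f} to {slowest:.1f} hours")
```

At the 2 million record cap, the same arithmetic works out to roughly 10 to 20 hours of processing.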
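Once a dataset has transferred, a quick check in a notebook cell can confirm that its files arrived. This Python sketch assumes the dataset appears as a subfolder of `data` relative to the notebook, and `my_dataset` is a placeholder name. This guide does not describe the folder layout or file format TDM Studio uses, so the hypothetical `summarize_folder` function simply counts files by extension rather than parsing them.

```python
from collections import Counter
from pathlib import Path


def summarize_folder(folder) -> dict:
    """Count the files under folder (including subfolders), grouped by lowercase extension."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"No folder at {root.resolve()}")
    counts = Counter(
        p.suffix.lower() or "(no extension)" for p in root.rglob("*") if p.is_file()
    )
    return dict(counts)


# Assumption: the dataset lands in data/<dataset name>; "my_dataset" is a placeholder.
dataset_dir = Path("data") / "my_dataset"
if dataset_dir.is_dir():
    print(summarize_folder(dataset_dir))
else:
    print(f"{dataset_dir} not found yet; the dataset may still be processing")
```

Comparing the total file count with the document count shown on the Review Dataset page is one simple way to spot an incomplete transfer.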