Voyant is a web-based text reading and analysis environment. It has sample corpora and you can upload your own collection in a variety of formats, including plain text, HTML, XML, PDF, RTF, and MS Word.
For the purposes of this guide, we'll use the University of North Carolina library's North American Slave Narrative Collection as our sample corpus.
1. Go to Voyant Tools (https://voyant-tools.org/).
2. Upload your corpus.
Navigate to the location where you’ve saved the North American Slave Narrative Collection. For many of you, this will be your desktop. Go to na-slave-narratives > data > texts, then select all texts. On most operating systems, you can select all texts by pressing Ctrl+A (or ⌘+A on a Mac). On Windows 8 or 10, you can also click “Edit” in the menu bar at the top of the window, then click “Select All” in the drop-down menu. Click “Open” in the lower right-hand corner.
3. Analyze the visualizations.
You should see three visualizations on your screen.
When you click on a word in the Cirrus word cloud, the graph on the right-hand side updates to show that specific word. You'll be able to see each individual text file in the "Corpus (Documents)" section and find the relative frequency of the word you've chosen in each of the documents. A keyword-in-context view on the bottom left-hand side shows the words that come before and after your query. The reader view in the middle of the page shows the full text of a given document.
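To make "relative frequency" concrete, here is a minimal Python sketch of the idea: the number of times a word occurs in a document, divided by the total number of words in that document. The tokenizer, the function names, and the two sample documents are all assumptions made for illustration, and Voyant may tokenize or scale its figures differently.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def relative_frequency(text, term):
    """Return the share of tokens in `text` that are equal to `term`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)

# Hypothetical stand-ins for two files in a corpus.
documents = {
    "doc_a.txt": "The river was wide. The river was cold.",
    "doc_b.txt": "We walked north along the road until night.",
}

for name, text in documents.items():
    print(name, round(relative_frequency(text, "river"), 3))
```

Comparing the proportion rather than the raw count is what makes long and short documents comparable.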
In AntConc, you’ll see 7 tabs across the top:
Concordance: This will show you what’s known as a Keyword in Context view (abbreviated KWIC, more on this in a minute), using the search bar below it.
Concordance Plot: This will show you a very simple visualization of your KWIC search, where each instance will be represented as a little black line from beginning to end of each file containing the search term.
File View: This will show you a full file view for larger context of a result.
Clusters: This view shows you words which very frequently appear together.
Collocates: Clusters show us words which _definitely_ appear together in a corpus; collocates show words which are statistically likely to appear together.
Word list: All the words in your corpus.
Keyword List: This will show comparisons between two corpora.
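As a rough illustration of what the Concordance tab produces, the sketch below builds a small Keyword in Context (KWIC) listing in Python. It is not how AntConc itself is implemented; the function name `kwic`, the three-word window, and the sample sentence are assumptions made for the example.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows

# Hypothetical sample text.
sample = "She picked an apple from the tree and gave the apple to her brother."

for left, match, right in kwic(sample, "apple"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20} | {match} | {right}")
```

Lining the keyword up in a single column is what makes recurring patterns on either side easy to spot.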
Let’s get started.
Load files. To load one file for viewing, click “Open File.” To load a corpus of files, click “Open Dir.” For our purposes, click “Open Dir.”
Navigate to our corpus on your desktop. After clicking the proper folder, you should see the files loading into AntConc.
Search. In the search box, type the word “apple” to see how many times “apple” appears in the corpus and what words exist around it. Click “Start” when you’re ready to see this.
If you want to search for both the singular and plural versions of a word, such as “woman” and “women,” AntConc has “Wildcard settings” that allow for this.
Try typing wom?n into the search box.
Try typing m?n into the search box.
Why are there so many more instances of men than women? Take a look at the “Concordance Plot” tab to see where results appear in the target texts. Hover over one of the instances, and a hand will appear. Click the result to open the “File View,” which shows the word or phrase in context. Click on the “Clusters/N-Grams” tab and search for “wom?n” again. You’ll see the words that appear together with “women” or “woman” in the text.
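The wildcard searches above can be approximated with a regular expression. The sketch below assumes that “?” stands for exactly one character and that only whole words should match; the function names and the sample sentence are made up for illustration, and AntConc's own matching may differ depending on your wildcard settings.

```python
import re
from collections import Counter

def wildcard_to_regex(pattern):
    """Translate a search term in which '?' stands for exactly one word character."""
    parts = [r"\w" if ch == "?" else re.escape(ch) for ch in pattern]
    # \b on both sides keeps "m?n" from matching inside "women".
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def count_matches(text, pattern):
    """Count each distinct word form matched by the wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return Counter(m.group(0).lower() for m in regex.finditer(text))

# Hypothetical sample text.
sample = "The men spoke first. One woman answered, and then the women and a man left."

print(count_matches(sample, "wom?n"))
print(count_matches(sample, "m?n"))
```

Counting the matched forms separately makes it easy to check how much of a total comes from the singular and how much from the plural.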
1. Click on the tab for "Tool Preferences"
2. In the window that opens, click on the left Category sidebar on "Word List"
3. Check the button for "Use a stoplist below"
5. Click "Apply"
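To see what a stoplist does to a word list, here is a small Python sketch that counts words with and without one. The stoplist contents, the function name `word_list`, and the sample sentence are assumptions for illustration only; in practice you would use a fuller stoplist suited to your corpus.

```python
import re
from collections import Counter

def word_list(text, stoplist=()):
    """Count word frequencies, skipping any word that appears in the stoplist."""
    stop = {word.lower() for word in stoplist}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop)

# Hypothetical sample text and a very short, made-up stoplist.
sample = "The apple and the pear fell from the tree, and the apple rolled away."
stoplist = ["the", "and", "from", "a"]

print(word_list(sample).most_common(3))            # dominated by function words
print(word_list(sample, stoplist).most_common(3))  # content words rise to the top
```

Without the stoplist, very common function words such as “the” crowd the top of the list; with it, the words that characterize the text become visible.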
Topic modeling is a good way to explore the topics within a large corpus. If, for example, you want to explore how words relate to each other, a topic model groups words that tend to appear together in the same documents. MALLET is a widely used and well-respected tool for topic modeling. For an excellent step-by-step guide on how to install MALLET, please see this Programming Historian tutorial. Please schedule an appointment with me if you'd like help installing and using MALLET.
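Once MALLET is installed, a typical run has two steps: import a folder of text files, then train a topic model on the imported data. The Python sketch below only assembles and prints the two commands so you can review them before running anything. The output file names, the choice of 20 topics, and the assumption that you run the commands from inside the MALLET folder (so that `bin/mallet` resolves) are illustrative; check the MALLET documentation for the options that fit your installation.

```python
import shlex

def mallet_commands(text_dir, model_file="corpus.mallet", num_topics=20):
    """Build the import and training commands for a basic MALLET run."""
    import_cmd = [
        "bin/mallet", "import-dir",
        "--input", text_dir,
        "--output", model_file,
        "--keep-sequence",       # keep word order, which topic modeling needs
        "--remove-stopwords",    # drop MALLET's default English stopwords
    ]
    train_cmd = [
        "bin/mallet", "train-topics",
        "--input", model_file,
        "--num-topics", str(num_topics),
        "--output-topic-keys", "topic_keys.txt",   # assumed output file name
        "--output-doc-topics", "doc_topics.txt",   # assumed output file name
    ]
    return import_cmd, train_cmd

# Print the commands instead of running them, so nothing executes by accident.
for command in mallet_commands("na-slave-narratives/data/texts"):
    print(shlex.join(command))
```

The topic keys file lists the top words for each topic, which is usually the first place to look when interpreting results.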
Boyer, Ryan Culp. “A Human-in-the-Loop Methodology For Applying Topic Models to Identify Systems Thinking and to Perform Systems Analysis.” Master's thesis. University of Virginia, 2016. https://libra2.lib.virginia.edu/downloads/r207tp34z?filename=Boyer_Thesis_Dec2016.pdf.
Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. “Reading Tea Leaves: How Humans Interpret Topic Models.” In Advances in Neural Information Processing Systems 22, edited by Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, 288–296. Curran Associates, Inc., 2009. http://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf.
Chuang, Jason, Sonal Gupta, Christopher D. Manning, and Jeffrey Heer. “Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment.” International Conference on Machine Learning (ICML), 2013. http://vis.stanford.edu/papers/topic-model-diagnostics.
Evans, Michael S. “A Computational Approach to Qualitative Analysis in Large Textual Datasets.” PLOS ONE 9, no. 2 (February 3, 2014): e87908. https://doi.org/10.1371/journal.pone.0087908.
Posner, Miriam. “Very Basic Strategies for Interpreting Results from the Topic Modeling Tool.” Miriam Posner’s Blog (blog), October 29, 2012. http://miriamposner.com/blog/very-basic-strategies-for-interpreting-results-from-the-topic-modeling-tool/.
Veas, Eduardo, and Cecilia di Sciascio. “Interactive Topic Analysis with Visual Analytics and Recommender Systems.” Association for the Advancement of Artificial Intelligence, 2015. https://www.researchgate.net/publication/279285547_Interactive_Topic_Analysis_with_Visual_Analytics_and_Recommender_Systems.