Cognitive, Linguistic & Psychological Sciences

Linguistic Corpora at Brown

Brown is a member of the Linguistic Data Consortium (LDC), an open consortium of universities, libraries, corporations and government research laboratories that create and distribute a wide array of language resources.

How do I find LDC corpora available at Brown?

In the library catalog, perform an Author search for Brown University Linguistic Corpora Collection (see link below).

How do I access the LDC corpora?

You must fill out the Language Corpora Access Request Form (see link below) to be granted access to corpora from the LDC or other institutions.

A link to this form is also present in the library catalog record for each corpus.

What happens once I fill out the form?

After the Library (and your advisor, if applicable) has reviewed your request, you will receive an email notifying you that access to the corpora has been granted. This email will include instructions on how to access the corpora.

Some corpora require you to sign a special user license agreement. If a corpus that you have requested requires such a license, you will receive an email with the license. Once you have returned the signed license, you will receive an email notifying you that access has been granted.

Once granted, your access may be limited, depending on your status, project, or other factors.

Corpora Basics

What are linguistic corpora?

Linguistic corpora are collections of data, either written texts or transcriptions of recorded speech, selected according to external criteria to represent, as far as possible, a language or language variety. They provide a source of data for linguistic research. Many linguistic corpora are available electronically as machine-readable texts. Using and manipulating these data requires some knowledge of text-file processing and programming.
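To give a sense of the kind of programming involved, here is a minimal sketch in Python that counts word frequencies in a piece of machine-readable text. The function name word_frequencies, the sample sentence, and the simple tokenization rule (runs of letters and apostrophes) are assumptions made for illustration only; corpora distributed by the LDC come in many formats and often need their own parsing and tokenization.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in a string of plain text.

    The tokenization rule here is a simplifying assumption: a token is
    any run of ASCII letters and apostrophes. Real corpora usually need
    more careful, format-specific handling.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text standing in for a corpus file's contents.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# Print the three most frequent tokens with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

For a corpus stored as a plain-text file, the same function could be applied to the file's contents after reading it with Python's built-in open().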

General Resources for Finding & Using Corpora

Selected English-Language Corpora

Selected Non-English Corpora