
Data Curation

This website outlines the basics of data curation, gives pointers on best practices, and provides a portal into the services available through the Brown University Library.

Defining Data Curation

Data curation speaks to an increasing need, shared across disciplines, to identify and manage the datasets created by researchers and provide for their eventual reuse.  

In the last two decades academic research has become increasingly computational and data driven.  This shift has produced new disciplines such as computational biology and digital humanities.  It has also changed the role of academic libraries and librarians as partners in scholarly research and publication.   

Data curation merges computational methods into the traditional functions of a university library: selecting, preserving, cataloging, and sharing the products of scholarly and artistic pursuits.  

Funding requirements imposed by federal agencies, and the implications of the recent Executive Order on Open Access, focus on data management plans for research data in the life sciences (NIH) and physical sciences (NSF), and in the humanities (NEH-ODH).  

All research data, however, whether it is generated by a large scientific research lab, created by a digital humanities project, or collected by the library, benefits from good data curation practices, integrated throughout the research data lifecycle.

Research Data Lifecycle

Rather than collecting a final, published version of scholarly research, data curation intervenes in the research process itself to allow current and future researchers and educators to use and reuse the data for their own scholarship and teaching.  This lifecycle can be simplified into four basic stages:

plan: a research question and a process for gathering the data necessary to answer it; plans vary in specificity, scale, and duration, but any plan that will create data sets benefits from data curation practices

create: the raw data is generated (or gathered from preexisting sources)

analyze: the data set is parsed, filtered, and run through the analyses needed to answer the research question; in digital art projects this may be supplemented (or replaced) by a performance or installation

share: the data is made available for use and reuse

Each research discipline has its own path through these stages, but here are some potential steps.

plan: grant application; pilot project; class exercise

create: manually entered; gathered from instrumentation; combine existing datasets

analyze: statistical analysis; artistic performance/installation

share: publish data sets; create websites; link to data sets in other publications

While the process is often presented as a straight, chronological line, these steps interweave, creating a research data lifecycle closer to the one pictured below.



Data curation has its own set of processes that integrate into this larger cycle: they provide structure and provenance for the data as it is created, continuity for data sets as they pass through various levels of analysis, and, finally, frameworks through which the data can be shared in intelligible and reusable forms. These practices include ensuring and maintaining data quality; organizing files and related material consistently; documenting, on an ongoing basis, the process by which the data were derived, along with other metadata; assessing which data should be retained (and for how long); and planning for storage and reuse, including the assignment of identifiers and stable links.
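One common way to put "ensuring and maintaining data quality" and provenance into practice is a fixity manifest: a checksum recorded for every file, recomputed later to detect silent changes or losses. The Python sketch below is a minimal illustration, assuming the data set lives in a single directory tree; the function names `build_manifest` and `verify_manifest` are hypothetical and not part of any Library service.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hypothetical helper: map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Hypothetical helper: list files that are missing, changed, or unlisted."""
    current = build_manifest(root)
    # Missing files (checksum None) and changed files both fail this comparison.
    problems = [name for name, checksum in manifest.items()
                if current.get(name) != checksum]
    # Files added since the manifest was made are also flagged.
    problems += [name for name in current if name not in manifest]
    return sorted(problems)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "raw").mkdir()
        (root / "raw" / "survey.csv").write_text("id,score\n1,42\n")
        manifest = build_manifest(root)
        print(verify_manifest(root, manifest))  # → []
        (root / "raw" / "survey.csv").write_text("id,score\n1,43\n")
        print(verify_manifest(root, manifest))  # → ['raw/survey.csv']
```

The manifest should be stored outside the directory it describes; otherwise it would itself be reported as an unlisted file. SHA-256 is used here only as a widely available choice of checksum.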


Data curation steps between create and analyze:

establish workflow

select data formats

create data structures

implement workflow

revise as necessary
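Steps such as establishing a workflow and creating data structures can be illustrated with a small script that lays out the same folders for every project and starts a documentation file at the outset. This is a hypothetical sketch: the folder names in `LAYOUT` and the README fields are illustrative assumptions, not a Brown University Library standard.

```python
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical folder layout: rename to match your discipline's conventions.
LAYOUT = ("raw", "processed", "analysis", "docs")

# Hypothetical README fields, to be filled in as the project proceeds.
README_TEMPLATE = """title: {title}
creator: {creator}
date created: {created}
description:
methods and provenance:
file naming conventions:
"""


def scaffold_project(root: Path, title: str, creator: str) -> list[Path]:
    """Create the folder layout and a README stub, skipping anything that exists."""
    created: list[Path] = []
    for name in LAYOUT:
        folder = root / name
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)
    readme = root / "docs" / "README.txt"
    if not readme.exists():
        readme.write_text(
            README_TEMPLATE.format(
                title=title, creator=creator, created=date.today().isoformat()
            )
        )
        created.append(readme)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "pilot-survey"
        for path in scaffold_project(root, "Pilot survey", "A. Researcher"):
            print(path.relative_to(root))
        # A second run creates nothing, so the script is safe to re-run.
        print(scaffold_project(root, "Pilot survey", "A. Researcher"))  # → []
```

Keeping raw data in its own folder, separate from processed outputs, makes it easier to rerun an analysis from the original source. Because the function never overwrites existing files, it can be run again as the workflow is revised.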