Data Management

Documenting Methods and Describing Data

This page is designed to help you:

  • Document protocols and methods used to collect and analyze data to enable replication of your methods
  • Provide contextual information for data to aid discovery, access, citation, and reuse of your data
  • Use international standards for uniform description of data

Below you'll find a series of example scenarios and best practices for addressing each scenario.

Describing Methods

Scenario

A year after you complete your summer project and publish your results, you get an email from a student working on a new phase of the project with a question about your study design. You are having difficulty remembering because it has been a year since you last thought about the project.

Best Practice

  • Keep a notebook. Regardless of discipline (humanities, sciences, social sciences), a best practice is to keep a paper or digital research notebook. It is common for researchers to receive an inquiry from an interested colleague years after their projects were completed and their papers published. Keeping detailed notes helps you record decisions made about design and methods, as well as information about individual samples and experiments.
    • Digital notebook: Electronic notebooks have many advantages, including the ability to easily create copies and search for keywords within the text. You can also scan paper notebooks to have a digital backup PDF.
    • Paper notebooks in the lab: In the sciences, paper research notebooks stay in the lab.
  • Ask for a copy. All lab notebooks, paper and digital, belong to the research institution. Therefore, it is important to ask the Principal Investigator (PI) for a copy of your paper or digital lab notebook at the end of a project so that the original can stay with the lab. This also helps to preserve continuity in case a new student takes over a project.

Keeping a Notebook

What to record in your electronic or paper notebook depends on the type of research project you are working on.

For example, a field notebook for an archeologist will look different from a lab notebook for a chemist working at the bench. It is good practice to check with an adviser, Principal Investigator (PI), or colleague for recommendations on the minimal information necessary to document your methods and the data you collect and analyze for your project. A good rule of thumb is to be as detailed as possible so that someone else can understand what you did and replicate your procedures.

In Practice

Date each entry. Because date formats differ around the world, a best practice is to follow the format endorsed by the International Organization for Standardization (ISO 8601) and write the date in year-month-day order (YYYY-MM-DD, or YYYYMMDD without separators) to avoid confusion.
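
As a rough illustration of the year-month-day convention, the short Python sketch below formats a date both with and without separators. The function name `iso_date` and the sample date are made up for this example.

```python
from datetime import date


def iso_date(d: date, basic: bool = False) -> str:
    """Return a year-month-day date string.

    basic=False gives YYYY-MM-DD; basic=True gives YYYYMMDD.
    """
    return d.strftime("%Y%m%d") if basic else d.isoformat()


# Hypothetical notebook entry date.
entry_date = date(2020, 7, 4)
print(iso_date(entry_date))              # 2020-07-04
print(iso_date(entry_date, basic=True))  # 20200704
```

A side benefit of this ordering is that file and entry names sort chronologically when sorted alphabetically.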

Record protocols. Record all research questions and procedures (the who, what, when, where, why, and how) that you are following for the design and implementation of an experiment or data collection and steps taken to analyze data.

Assign each experiment an identifier number. Having an ID helps connect digital file names with entries in a notebook. (In a paper notebook, an ID also helps when an experiment pauses for a few days and continues on a later page.)
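
One possible way to tie file names to notebook entries is to put the experiment ID at the start of every file name. The sketch below assumes a naming pattern of ID, date, and short description joined by underscores; the pattern, the function names, and the ID "EXP042" are illustrative assumptions, not a required convention.

```python
from datetime import date
from pathlib import Path


def data_filename(experiment_id: str, run_date: date,
                  description: str, ext: str = "csv") -> str:
    """Build a file name such as EXP042_20200704_freezing-rate-trial.csv."""
    slug = "-".join(description.lower().split())
    return f"{experiment_id}_{run_date:%Y%m%d}_{slug}.{ext}"


def experiment_id_of(path: str) -> str:
    """Recover the experiment ID from a file name built by data_filename."""
    return Path(path).name.split("_", 1)[0]


name = data_filename("EXP042", date(2020, 7, 4), "Freezing rate trial")
print(name)                    # EXP042_20200704_freezing-rate-trial.csv
print(experiment_id_of(name))  # EXP042
```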

Record observations and notes. Record thoughts, ideas, and descriptive information about a specimen or samples, location, time of day, the settings of an instrument when data were analyzed, etc.

Include images. (With paper notebooks you can print images and tape them in.)

Document versions of software used and any links to associated data and analysis scripts.
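
If your analysis runs in Python, a small script can capture the software versions for you. The sketch below uses only the standard library; the function name `environment_record` and the package names passed to it are placeholders to replace with whatever your analysis actually imports.

```python
import json
import platform
from importlib import metadata


def environment_record(packages):
    """Collect the Python version, platform, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Placeholder package list; substitute the packages your scripts use.
record = environment_record(["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

Saving this output next to your data, or pasting it into the notebook entry, records which versions produced a given result.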

Using Standards

Scenario

You would like to use digital data from a colleague at another university who has collected data in a separate study. When it comes time to combine your respective datasets, you notice that you used different terminology and data structure. It will take some time to reconcile the differences.

Best Practice

  • Use international metadata standards. Researchers in some disciplines have agreed on and published guides for describing data uniformly, so that everyone in the research community provides the same types of information in a standard way when sharing and publishing data. For example, Darwin Core is an international standard used by biologists for structuring biodiversity information, such as the taxonomy and occurrence records of organisms.
  • Use controlled vocabularies. Some research communities have developed a controlled vocabulary, i.e., a database of terms that allows researchers to use standard language to describe shared objects of study. For example, the National Cancer Institute (NCI) has a thesaurus of terms for describing different cancers and their biology, physiology, biochemistry, and medical treatment. These standards and controlled vocabularies help researchers communicate effectively and make data interoperable.
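
To make the scenario concrete, the sketch below renames the columns of two differently structured datasets to a shared set of Darwin Core terms (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) so they can be combined. The local column names, the sample rows, and the `standardize` helper are invented for illustration.

```python
# Hypothetical local column names mapped to Darwin Core terms.
LAB_A_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}
LAB_B_TO_DWC = {
    "taxon_name": "scientificName",
    "collected_on": "eventDate",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
}


def standardize(rows, mapping):
    """Rename each row's keys to standard terms; fail loudly on unmapped columns."""
    standardized = []
    for row in rows:
        unmapped = set(row) - set(mapping)
        if unmapped:
            raise KeyError(f"No standard term for columns: {sorted(unmapped)}")
        standardized.append({mapping[key]: value for key, value in row.items()})
    return standardized


# Invented sample rows from each lab.
lab_a = [{"species": "Quercus alba", "date": "2020-07-04", "lat": 42.37, "lon": -71.11}]
lab_b = [{"taxon_name": "Acer rubrum", "collected_on": "2020-07-05",
          "latitude": 42.38, "longitude": -71.12}]

combined = standardize(lab_a, LAB_A_TO_DWC) + standardize(lab_b, LAB_B_TO_DWC)
print(combined)
```

Once both datasets use the same terms, every combined row has the same keys, which is what makes a later merge or analysis straightforward.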

Using Minimal Information and Reporting Standards

Scenario

You are submitting a manuscript to a journal that requires you to deposit the data underlying your reported results in a specific online database, or "repository." The repository has required fields for details about your samples and the settings of the instrument at the time the samples were analyzed that you did not record. Now you have to spend several hours rerunning the experiments to collect the missing metadata.

Best Practice

Use reporting standards. Various research communities have agreed on the minimal information necessary to describe an experiment or method of data collection in their field. These “minimal information standards” help ensure that enough context is provided for other researchers to evaluate the data and potentially reuse or repurpose it in another project.

Some lab instruments generate metadata describing their settings and analysis as separate machine-readable files, so it is good practice to make sure these files are being saved to a known folder in case you need to reference them.
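
One way to avoid the scenario above is to check each sample record against the repository's required fields at the time of collection. The field names in the sketch below are hypothetical; substitute the list published by the repository or reporting standard you plan to use.

```python
# Hypothetical required fields; replace with your repository's actual list.
REQUIRED_FIELDS = ["sample_id", "instrument_model", "acquisition_date", "operator"]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Invented sample record with one blank and one absent field.
sample = {
    "sample_id": "S-001",
    "instrument_model": "",
    "acquisition_date": "2020-07-04",
}
print(missing_fields(sample))  # ['instrument_model', 'operator']
```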

Creating Protocols, Codebooks, and READMEs

Scenario

A colleague has published their study in an area that you also actively research. You ask the colleague to send you a copy of their data so you can learn more about the study and its findings, as well as a copy of the research software they developed so you can use it on a future study. Your colleague sends you the final data in a spreadsheet, but you have no idea what the variable abbreviations in the column headings mean, what standards of measurement were used, or what steps were taken to come up with the results. Similarly, the software copy is just the scripts without any documentation on how to install and run.

Best Practice

  • Document and share protocols. Protocols are detailed methods for experimental/study design and implementation. In some fields they may contain any instruments used for data collection, such as a survey or structured interview.
  • Create and share codebooks. Codebooks provide detailed context for any analyses and resulting data, both qualitative and quantitative. They are essential for guiding a potential user through the decisions made, tools used, and steps of analysis. They list each variable's name and descriptive label, the codes or values it can take and what each one means, and how missing values are represented.
  • Create and share README files. README files provide documentation for analysis scripts. They explain who wrote the code, the version number, the license (which details who holds the rights and the permitted terms of use), comments about the code, installation instructions that describe all dependencies (i.e., any other programs needed to build the environment required to install the software), and any commands necessary to run the program.
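
A codebook can be as simple as a small table shipped alongside the data. The sketch below writes one as CSV text and checks that every spreadsheet column is documented. The variable names, labels, codes, and missing-value markers are invented for illustration.

```python
import csv
import io

# Invented codebook entries; one per column in the shared spreadsheet.
CODEBOOK = [
    {"variable": "temp_c", "label": "Water temperature",
     "units": "degrees Celsius", "values": "numeric", "missing": "-999"},
    {"variable": "site", "label": "Sampling site",
     "units": "", "values": "1 = north shore; 2 = south shore", "missing": "NA"},
]


def codebook_csv(entries):
    """Serialize codebook entries to CSV text with a fixed column order."""
    fields = ["variable", "label", "units", "values", "missing"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(header, entries):
    """Return data columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in header if column not in documented]


print(codebook_csv(CODEBOOK))
print(undocumented_columns(["temp_c", "site", "sal"], CODEBOOK))  # ['sal']
```

Running the column check before sharing a dataset catches abbreviations that would otherwise be unexplained to the recipient.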

Sharing & Citing Data

Scenario

You read an article describing a study published in a journal, and you want to know more about the authors’ methods and data. Unfortunately, the authors did not cite their data or state where they had made the files available online. You email one of the authors, but weeks go by without a response.

Best Practice

Deposit and cite data in a repository. Historically, researchers have communicated by publishing their findings in a journal. Yet, data and code are essential parts of understanding and evaluating a study. Over the last decade, researchers have shifted to depositing their data, code, and documentation in online repositories for others to access and citing the location of these files in their articles. Other researchers can access the data in the repository to use the data or code, recreate some or all of the study, and cite the original study in their own published findings. 

Example

Nathan, Erica, 2020, "Data Tables: Effects of Dissolved Gases on the Freezing Dynamics of Ocean Worlds", https://doi.org/10.7910/DVN/YREBIV, Harvard Dataverse, V2

This example data citation contains the creator's name, the year of creation, the dataset title, a digital object identifier (DOI) that resolves to the dataset's location online, the name of the repository where it was deposited, and the version number.
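
If you keep these elements as structured fields, a citation in the same layout as the example can be assembled automatically. The sketch below is one way to do that; the `DataCitation` class is a made-up helper, and the field order simply mirrors the example above rather than any formal citation style.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    creator: str
    year: int
    title: str
    doi: str          # the DOI itself, without the https://doi.org/ prefix
    repository: str
    version: str

    def format(self) -> str:
        """Assemble the fields in the same order as the example citation."""
        return (f'{self.creator}, {self.year}, "{self.title}", '
                f"https://doi.org/{self.doi}, {self.repository}, {self.version}")


citation = DataCitation(
    creator="Nathan, Erica",
    year=2020,
    title=("Data Tables: Effects of Dissolved Gases on the "
           "Freezing Dynamics of Ocean Worlds"),
    doi="10.7910/DVN/YREBIV",
    repository="Harvard Dataverse",
    version="V2",
)
print(citation.format())
```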

Licensing Data

Scenario

You find some files of data deposited in a repository by colleagues, and you decide that you would like to use some of their data in an upcoming project. You email them for permission but hear nothing back. You do not want to use their data without their permission. Eventually you give up and have to leave out data that would have potentially improved the project.

Best Practice

Assign a license. Planning for potential reuse of your work by others requires you and your co-creators to provide information along with your files so that others know who holds the rights to the data and whom to contact for permission to reuse the data and code. The best practice is to assign or provide a license that explains any permissions you automatically grant to others so they know the acceptable terms for reusing your data and code.

Learning Outcomes

This page was designed to help you:

  •  Document protocols and methods used to collect and analyze data to enable replication of your methods
  •  Provide contextual information for data to aid discovery, access, citation, and reuse of your data
  •  Use international standards for uniform description of data