
Data Management

Naming and Organizing Files

This page is designed to help you:

  • Create meaningful and human-readable file names
  • Sort files by their creation date, number in a sequence, or version
  • Create file names with standard characters to prevent computational errors

Below you'll find common scenarios, each followed by best practices for addressing it.

Creating File Names

Scenario

You are using a new laboratory instrument to analyze a sample. Its software outputs data files whose names are very long strings of random-looking alphanumeric characters. You have saved the files in a folder, but they are difficult to browse, and some will not open on your laptop, so you have to use another computer in the lab.

Best Practice

This random-appearing string is more meaningful to the machine than to the people conducting the experiment. Long file names can also cause errors when a file is opened with different software, a different software version, or a different operating system; for example, by default many Windows programs cannot open a file whose full path exceeds 260 characters. The best practice is to:

  • Rename the files so they are meaningful to you and your team. For example, include your initials, the date, a project abbreviation or ID, the experiment method, an experiment ID, or a sequence number.
  • Create “human-readable” file names, that is, names that are easy to read and understand without having to widen the window. A good rule of thumb is to limit file names to around 35-40 characters.

Example

20190102_ac_smithlab_utra_exp01_gel_003

This file name contains information meaningful to the creator and team. In order, it includes: the date ("20190102" for January 2, 2019), the creator's initials ("ac"), the project ID ("smithlab_utra" for Smith Lab Undergraduate Teaching and Research Award), the experiment ID ("exp01"), the experiment methodology ("gel" for gel electrophoresis), and the number in the sequence of files generated during the experiment ("003").
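If you rename many files, a short script can assemble the elements in the same order every time. The sketch below is a minimal Python example, assuming the element order used in the example above; the function name build_file_name and the 40-character limit (taken from the rule of thumb on this page) are illustrative choices, not a standard.

```python
from datetime import date

MAX_LENGTH = 40  # assumption: rule of thumb from this page (around 35-40 characters)


def build_file_name(created: date, initials: str, project: str,
                    experiment: str, method: str, sequence: int) -> str:
    """Join the naming elements with underscores in a fixed order (hypothetical helper)."""
    parts = [
        created.strftime("%Y%m%d"),  # date as YYYYMMDD
        initials,
        project,
        experiment,
        method,
        f"{sequence:03d}",  # zero-padded position in the sequence
    ]
    name = "_".join(parts).lower()
    if len(name) > MAX_LENGTH:
        raise ValueError(f"{name!r} has {len(name)} characters; aim for {MAX_LENGTH} or fewer")
    return name


print(build_file_name(date(2019, 1, 2), "ac", "smithlab_utra", "exp01", "gel", 3))
# 20190102_ac_smithlab_utra_exp01_gel_003
```

The length check raises an error rather than silently truncating, so a name that grows too long prompts you to choose shorter abbreviations.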

Avoiding Non-Standard Characters

Scenario

Just as a very long file name can cause problems, so can non-standard symbols and spaces. Some symbols have a special meaning to the software or operating system that the user may not know about; for example, / and \ separate folder names in a file path. Similarly, some software or operating systems may fail to open a file, or misread its name, when the name contains spaces; command-line tools, for instance, treat a space as the boundary between two arguments.

Best Practice

  • Use ASCII (American Standard Code for Information Interchange) alphanumeric characters, that is, the unaccented letters A-Z and a-z and the digits 0-9.
  • Avoid spaces between the elements of a file name; instead, connect the elements with standard symbols such as underscores (_) or hyphens (-).

Examples

  •  20190501_exp123_analysis_version01.pdf (underscore is used)
  •  20190501-exp123-analysis-version01.pdf (hyphen is used)
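Cleaning up existing names can also be automated. The following is a minimal Python sketch, assuming you want to keep only ASCII letters, digits, periods, underscores, and hyphens; the function sanitize_file_name is hypothetical. It converts accented letters to their closest ASCII form where Unicode normalization allows it, drops characters with no ASCII equivalent, and replaces spaces and other symbols with a single separator.

```python
import re
import unicodedata


def sanitize_file_name(name: str, separator: str = "_") -> str:
    """Keep only ASCII letters, digits, '.', '_' and '-' in a file name (hypothetical helper)."""
    # Split accented letters into base letter + accent (NFKD), then drop
    # anything that is not ASCII, such as the detached accent marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of spaces or other disallowed symbols with one separator.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", separator, ascii_name)
    # Remove separators left over at the start or end of the name.
    return cleaned.strip("_-")


print(sanitize_file_name("20190501 exp123 análisis version01.pdf"))
# 20190501_exp123_analisis_version01.pdf
print(sanitize_file_name("20190501 exp123 análisis version01.pdf", separator="-"))
# 20190501-exp123-analisis-version01.pdf
```

Rename copies first and check the results by eye, since dropping characters can occasionally make two different names identical.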

Sorting Files

Scenario

You and your team members name files inconsistently, so you cannot sort the files in a directory in a logical order, such as chronologically by date, by sequence number (ascending or descending), or by version number.

Best Practice

Make files sort logically by adding the creation date, a sequence number, or a version number to each file name.

Examples

Add Dates

Date formats differ around the world (e.g., 04-05-05 can mean April 5, 2005 or 4 May 2005, depending on the country). To avoid confusion, follow ISO 8601, the standard published by the International Organization for Standardization (ISO), which orders dates as year-month-day (YYYYMMDD, or YYYY-MM-DD with hyphens). April 5, 2005 would be written as 20050405. Because the year comes first, file names that begin with dates in this format also sort chronologically.
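To see why year-month-day order matters for sorting, the Python sketch below builds file names from the same three dates in ISO 8601 basic format and in a month-day-year format, then sorts both lists as plain text. The dates and the _results.csv suffix are made up for illustration.

```python
from datetime import date

dates = [date(2005, 4, 5), date(2004, 12, 31), date(2005, 1, 15)]

# ISO 8601 basic format: year, month, day -> YYYYMMDD
iso_names = [d.strftime("%Y%m%d") + "_results.csv" for d in dates]
# Month-day-year format for comparison -> MM-DD-YY
mdy_names = [d.strftime("%m-%d-%y") + "_results.csv" for d in dates]

print(sorted(iso_names))
# ['20041231_results.csv', '20050115_results.csv', '20050405_results.csv']  (chronological)
print(sorted(mdy_names))
# ['01-15-05_results.csv', '04-05-05_results.csv', '12-31-04_results.csv']  (2004 sorts last)
```

Python's date.isoformat() returns the extended form with hyphens (2005-04-05), which sorts the same way.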

Add Sequence Number
  • 20200801_project123_exp123_run01.csv

When adding a sequence number, start with ‘01’ rather than ‘1’. Many programs sort file names character by character, so without the leading zero, run10 sorts before run2. Pad every number to the same number of digits: 01-99 for tens of files, 001-999 for hundreds, 0001-9999 for thousands, and so on. Some researchers also include a README file in the directory that describes the files it contains. Prefixing its name with “00” (e.g., 00_README) makes it appear at the top of the list.
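The effect of zero-padding can be checked with a quick Python sketch. It sorts the same eleven hypothetical run files with and without padding, using a hypothetical helper, sequence_label, that pads each number to the width of the largest number in the set. Some file browsers use a "natural" sort that orders unpadded numbers correctly, but many scripts and command-line tools compare names character by character, as Python's sorted() does here.

```python
def sequence_label(n: int, total: int) -> str:
    """Zero-pad n to as many digits as the largest number in the set (hypothetical helper)."""
    width = len(str(total))  # 99 -> 2 digits, 999 -> 3, 9999 -> 4
    return f"{n:0{width}d}"


unpadded = sorted(f"run{n}.csv" for n in range(1, 12))
padded = sorted(f"run{sequence_label(n, 11)}.csv" for n in range(1, 12))

print(unpadded[:4])  # ['run1.csv', 'run10.csv', 'run11.csv', 'run2.csv']
print(padded[:4])    # ['run01.csv', 'run02.csv', 'run03.csv', 'run04.csv']

# A README prefixed with "00" sorts ahead of date-prefixed data files.
listing = [
    "20200801_project123_exp123_run02.csv",
    "00_README.txt",
    "20200801_project123_exp123_run01.csv",
]
print(sorted(listing)[0])  # 00_README.txt
```

If the number of files may grow later, choose the width up front (for example, 3 digits if you might exceed 99 files) rather than renaming files afterward.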

Add a Version Number
  • 20200801_manuscript_version_26.pdf

If you expect to have multiple drafts of a file, add a version number. You can then sort the files and identify the most recent version (and avoid having several files with "final" in the name). As with sequence numbers, zero-padding the version number (e.g., version_09) keeps the versions in order when sorted.
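Unpadded version numbers such as version_26 do not sort correctly as text: version_10 comes before version_9. One workaround, sketched below in Python under the assumption that every name ends in _version_<number>.<extension>, is to parse the number and compare it as an integer. The file list and the version_number helper are hypothetical.

```python
import re

# Hypothetical file listing following the naming pattern above.
files = [
    "20200801_manuscript_version_9.pdf",
    "20200801_manuscript_version_26.pdf",
    "20200801_manuscript_version_10.pdf",
]

# Assumption: the version number sits between "_version_" and the file extension.
VERSION = re.compile(r"_version_(\d+)\.\w+$")


def version_number(name: str) -> int:
    """Extract the version number from a file name as an integer."""
    match = VERSION.search(name)
    if match is None:
        raise ValueError(f"no version number in {name!r}")
    return int(match.group(1))


print(sorted(files))  # text sort: version_10, version_26, version_9
print(max(files, key=version_number))  # 20200801_manuscript_version_26.pdf
```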

Selecting File Formats

Scenario

Technology changes quickly. Software can go through many versions over its lifetime, with small updates (v1.1) that fix bugs or large updates (v2) that add new features. If you have not updated your software in a while, you may have trouble opening or working with files created in a newer version; likewise, current software may not open files created in a much older version. You may also have created a file with software that you or your school licensed, and the license has since expired, or the program is not installed on your own computer, so you cannot open the file.

Best Practice

There are a few best practices for selecting file formats to use for long-term storage and sharing of data.

  • Save a copy of the file in its original format. If the file is later exported to a format that reduces its size or quality to make it easier to store or share, this original copy prevents that data from being lost.
  • Export a copy of the data to a common file format. The best option is a non-proprietary format, often called an "open format," because it can be opened by several programs, including free and open-source software. Common open file formats include:
    • Plain text files (txt) - for documents
    • Delimited text files, such as comma-separated values files (csv) - for spreadsheets
    • MPEG-4 - an open container for multimedia
  • Document proprietary software formats that cannot easily be exported to an open file format.
    • Record as much information as possible in a plain text file named README in the same directory. This can help another person locate a copy of the software.
    • Include information such as the instrument manufacturer, instrument model names, serial number, copyright year, and the exact version number of the software used to generate the files.
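A README for proprietary formats can be generated from a small set of fields. The Python sketch below is one way to do this; the field names, the placeholder values, the 00_README.txt file name, and the helpers format_readme and write_readme are all illustrative assumptions, so substitute the details of your own instrument and software.

```python
from pathlib import Path

# Placeholder values (hypothetical): replace with your instrument's details.
SOFTWARE_DETAILS = {
    "Instrument manufacturer": "EXAMPLE_MANUFACTURER",
    "Instrument model": "EXAMPLE_MODEL",
    "Serial number": "EXAMPLE_SERIAL",
    "Software name": "EXAMPLE_SOFTWARE",
    "Software version": "EXAMPLE_VERSION",
    "Copyright year": "EXAMPLE_YEAR",
}


def format_readme(details: dict[str, str]) -> str:
    """Return README text with one 'Field: value' line per detail."""
    lines = ["Files in this directory were generated with:", ""]
    lines += [f"{field}: {value}" for field, value in details.items()]
    return "\n".join(lines) + "\n"


def write_readme(directory: Path, details: dict[str, str]) -> Path:
    """Write the README as plain UTF-8 text in the given directory and return its path."""
    path = directory / "00_README.txt"
    path.write_text(format_readme(details), encoding="utf-8")
    return path


print(format_readme(SOFTWARE_DETAILS))
```

Calling write_readme(Path("."), SOFTWARE_DETAILS) would place the file in the current directory; a plain text file is used so the README itself stays readable without any special software.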


Learning Outcomes

This page was designed to help you:

  • Create meaningful and human-readable file names
  • Sort files by their creation date, number in a sequence, or version
  • Create file names with standard characters to prevent computational errors 
