This guide cannot capture the full breadth and depth of available data; it is intended as a starting point. These are searchable collections of databases indexed by topic, and they will be the most up-to-date sources to check.
The Dataset Catalog is a catalog of biomedical datasets from various repositories for users to search, discover, retrieve, and connect with datasets to accelerate scientific research. This beta version aims to collect user feedback to inform future product development.
List of NIH-supported data repositories and resources that aggregate information about biomedical data. Each entry has a brief description of the repository and links to data submission and access policies.
The 2024 Nucleic Acids Research database issue contains 180 papers from across biology and neighbouring disciplines. There are 90 papers reporting on new databases and 83 updates from resources previously published in the Issue. Updates from databases most recently published elsewhere account for a further seven.
Partnership of people, institutions and government agencies that supports the conservation of birds and their habitats by improving access to and use of data and tools. Data are available on bird monitoring, banding and citizen-based bird surveillance.
Single integrated species checklist and taxonomic hierarchy. The Catalogue holds essential information on the names, relationships and distributions of over 1.6 million species.
Provides free access to biological, physical and socioeconomic geospatial data and maps, along with tools to create custom visualizations, drawings and analyses.
Authoritative taxonomic information on plants, animals, fungi and microbes of North America and the world. Full database or specific taxonomic group data available for download.
International repository for ecological and environmental data. Data originate from field stations, laboratories, research sites and individual researchers around the world.
The Long Term Ecological Research (LTER) Network is a collaborative of researchers and graduate students who focus on long-term ecological processes at 26 LTER sites around the United States, Antarctica, and islands in the Caribbean and Pacific. The LTER Data Portal contains ecological data packages contributed by past and present LTER sites.
Provides collaborative tools for researchers to upload images and morphological data, and use that information to produce, edit, illustrate and annotate phylogenetic matrices. Also a repository for data associated with peer-reviewed publications.
ExPASy, the SIB Swiss Institute of Bioinformatics Resource Portal, provides access to databases and software tools in different areas of the life sciences, including proteomics, genomics and phylogeny.
Collection of sequences from multiple sources, including GenBank, RefSeq, and Protein Data Bank (PDB). Searching Nucleotide will yield results from each of its component databases, which can also be searched separately. [NCBI database]
Universal Protein Resource (UniProt), a collaboration between the European Bioinformatics Institute, the SIB Swiss Institute of Bioinformatics and Protein Information Resource, provides high-quality, freely accessible protein sequence and functional information.
Archive and distribution center for results of studies that investigate the interaction of genotype and phenotype, including GWAS and molecular diagnostic assays. [NCBI database]
BMRB collects, annotates, archives, and disseminates spectral and quantitative data derived from NMR spectroscopic investigations of biological macromolecules and metabolites.
"WormBase is an international consortium of biologists and computer scientists providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes."
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR include the complete genome sequence along with gene structure, gene product information, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data are updated every week from the latest published research literature and community data submissions. TAIR also provides extensive linkouts from its data pages to other Arabidopsis resources.
Plant Comparative Genomics portal of the Department of Energy's Joint Genome Institute. Families of related genes representing the modern descendants of ancestral genes are constructed at key phylogenetic nodes. These families allow easy access to clade-specific orthology/paralogy relationships as well as insights into clade-specific novelties and expansions. As of release v11, Phytozome provides access to sixty-five sequenced and annotated green plant genomes.
Provides a broad network of plant metabolic pathway databases that contain curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism in plants.
IPNI provides nomenclatural data (spelling, author, types and first place/date of publication) for the scientific names of vascular plants from family to infraspecific ranks.