Bioinformatics databases

There are thousands of online bioinformatics databases available on the Internet. A good way to find a database you are interested in is to look through Database Commons at the Beijing Institute of Genomics, Chinese Academy of Sciences.

Guide for databases

Database Commons - A catalog of more than 5000 biological databases collected from the literature.
NAR Database Issue - The journal Nucleic Acids Research publishes a Database Issue on 1 January each year.
JBDC - The online open-access Journal of Biological Databases and Curation.

Sequence databases

UniProt - The main website of the international protein sequence database, which consists of the protein knowledgebase (UniProtKB), the sequence clusters (UniRef), and the sequence archive (UniParc).
RefSeq - The Reference Sequence collection constructed by NCBI to provide a comprehensive, integrated, non-redundant set of DNA and RNA sequences and their protein products.
GenBank - The web portal to the NIH genetic sequence database maintained by NCBI, part of the International Nucleotide Sequence Database Collaboration (INSDC). Literature citations, release notes, and an example record can be found on this page.
ENA - The web portal to the nucleotide sequence database maintained by EBI, also part of the International Nucleotide Sequence Database Collaboration (INSDC). Documentation such as release notes, database statistics, a user guide, the feature table definition, a sample entry, and FAQs is provided.
DDBJ - The web portal to the nucleotide sequence database in Japan (DNA Data Bank of Japan).
GSA - A data repository for omics data maintained by the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS), serving as a primary archive of genome sequencing data.
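
Records downloaded from sequence databases such as those listed above are commonly available in FASTA format. The following is a minimal sketch, not any database's official tooling, of reading FASTA text into a dictionary keyed by record identifier. The function name parse_fasta and the toy records are hypothetical and made up for illustration; they are not real database entries.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier (first token after '>') to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


# Toy records invented for illustration only (assumption, not real entries).
EXAMPLE = """>demo1 hypothetical protein
MKTAYIAK
QRQISFVK
>demo2 hypothetical nucleotide fragment
ACGTACGT
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq))
```

For large downloads, passing an open file object instead of a list of lines keeps memory use proportional to the parsed records rather than to the raw file plus the records.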

Protein structure

PDBj - The protein structure database in Japan.
RCSB - The main repository of macromolecular structures maintained by the Research Collaboratory for Structural Bioinformatics.
PDBe - The entry point for the EBI macromolecular structure database.
PDBSum - The PDB summary database maintained by EBI.
MMDB - The Molecular Modeling Database of macromolecular structures maintained by NCBI.
SCOP - The database of Structural Classification of Proteins developed and maintained by Cambridge University.
CATH - The database of Class, Architecture, Topology and Homologous superfamily developed and maintained by University College London.
MolviZs - Atlas of Macromolecules, including protein, DNA, and RNA.

Databases of protein domain, function and expression

InterPro - Classification of protein families maintained at the EBI.
CDD - A database of conserved protein domains created and maintained by the NCBI structure group.
Expression Atlas - An open EBI resource for finding information about gene and protein expression.
HPA - The website of the Human Protein Atlas, which shows the expression and localization of proteins in a large variety of normal human tissues, cancer cells, and cell lines using immunohistochemistry images. It is developed and maintained by the Proteome Resource Center, Sweden.

Family databases

Pfam - The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Rfam - The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures, and covariance models (CMs).
Dfam - The Dfam database is a collection of repetitive DNA element sequence alignments, hidden Markov models (HMMs), and match lists for complete eukaryotic genomes.
TreeFam - A database composed of phylogenetic trees inferred from animal genomes. It provides orthology/paralogy predictions as well as the evolutionary history of genes.

Interaction and Pathway databases

STRING - A web server for protein-protein interactions at EMBL.
STITCH - A web server for chemical-protein interactions at EMBL.
REACTOME - An open-source, open access, manually curated and peer-reviewed pathway database.
Plant REACTOME - The Plant REACTOME at Gramene.
KEGG - Kyoto Encyclopedia of Genes and Genomes.

Genome databases and genome browsers

ENSEMBL - The web server of the European eukaryotic genome resource developed by EBI and the Sanger Institute.
UCSC Genome Information - The genome browser website containing the reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz (UCSC).
Phytozome - A tool for green plant comparative genomics, maintained by the Center for Integrative Genomics, Joint Genome Institute.
Gramene - A curated open-source data resource for plant genome analysis.
NCBI Genome Data Viewer - A genome browser supporting the exploration and analysis of eukaryotic RefSeq genome assemblies.
NCBI Genome - The entry portal to various NCBI genomic biology tools and resources.
VISTA - A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
GOLD - Genomes Online Database, a comprehensive information resource for complete and ongoing genome sequencing projects with flowcharts and tables of statistical data.

Databases of Model Organisms

MGI - The international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.
XenBase - The biology and genomics resource for the African clawed frog Xenopus laevis and for Xenopus tropicalis.
ZFIN - The Zebrafish Information Network, the database of genetic and genomic data for zebrafish.
FlyBase - A comprehensive database of Drosophila genes and genomes maintained by Indiana University.
WormBase - The biology and genome resource for Caenorhabditis elegans.
SGD - The Saccharomyces Genome Database.

Animal Databases

AnimalTFDB - Animal Transcription Factor Database at Huazhong University of Science and Technology.

Plant Databases

PlantTFDB - The database of plant transcription factors built by the Center for Bioinformatics, Peking University.
TAIR - The Arabidopsis Information Resource maintained by Stanford University. It includes the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community.
AraPort - Araport is a one-stop shop for Arabidopsis thaliana genomics. Araport offers gene and protein reports with orthology, expression, interactions, and the latest annotation, plus analysis tools, community apps, and web services. Araport is free and open-source. Registered members can save their analyses, publish science apps, and post announcements.
Oryzabase - A comprehensive rice science database maintained by the National Institute of Genetics, Japan. It contains genetic resource stock information, a gene dictionary, chromosome maps, mutant images, and fundamental knowledge of rice science.
Wheat Genome - The international bread wheat genome database maintained by the French National Institute of Agriculture, Food and Environment.
WheatIS - The Wheat Information System maintained by the French National Institute of Agriculture, Food and Environment.
Wheat Omics - Wheat omics data and tools created by several Chinese research groups.
MaizeDB - The community database for biological information about the crop plant Zea mays ssp. mays, covering genetic, genomic, sequence, gene product, functional characterization, and literature reference data.
SGN - A collection of data resources for Solanaceae species, including tomato, potato, pepper, eggplant, petunia, and Nicotiana.
ICuGI - The web portal of the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, and other cucurbits.
GDR - The genome database for Rosaceae, including apple, pear, peach, apricot, strawberry, rose, etc.

Fungal Genome Databases

MycoCosm - JGI Fungal Genomics Resource.
DFVF - A database of virulence factors in fungal pathogens.

Bacterial Genome Databases

PATRIC - The Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community's work on bacterial infectious diseases by integrating vital pathogen information with rich data and analysis tools.

Virus Genome Databases

Viral Genomes - The main page of the NCBI viral genome information resource.
NCBI Flu - NCBI Influenza Virus Resource with influenza genomic data and analysis tools.
Plant Viruses - This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.