Sequence databases
DNA and protein sequence databases are the fundamental resources for bioinformatics
research, development and application.
DNA sequence databases
- GenBank - The web
portal to the NIH genetic sequence database maintained by NCBI, which is
part of the International Nucleotide Sequence Database Collaboration. Literature
citations, release notes and an example record can be found on this page.
- EMBL - The web portal to the EMBL
nucleotide sequence database maintained by EBI, which is also part of the International
Nucleotide Sequence Database Collaboration. Documentation is provided, including release
notes, database statistics, a user guide, the feature table definition, a sample
entry and FAQs.
- RefSeq - The Reference
Sequence collection constructed by NCBI to provide a comprehensive, integrated,
non-redundant set of DNA and RNA sequences and their protein products. It provides
a stable reference for genome annotation, gene identification and characterization,
mutation and polymorphism analysis, expression studies and comparative
analyses.
- UniGene - An organized view of the transcriptome created by NCBI. Each UniGene
entry is a set of transcript sequences that appear to come from the same
transcription locus, together with information on protein similarities,
gene expression, cDNA clone reagents, and genomic location.
- dbSNP - The database
of single nucleotide polymorphisms maintained by NCBI.
- EMBLCDS - A database of
the nucleotide sequences of the coding sequences (CDS) annotated in EMBL entries.
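
Because RefSeq covers DNA, RNA and protein records in one collection, it is often useful to tell from an accession alone which kind of record it names. The following is a minimal sketch, assuming the commonly documented RefSeq accession prefixes; the table is deliberately incomplete, and the function name `refseq_molecule_type` is a hypothetical helper, not part of any NCBI tool.

```python
# Illustrative (incomplete) mapping of RefSeq accession prefixes to molecule types.
# Assumption: the prefixes follow the conventions documented by NCBI for RefSeq.
REFSEQ_PREFIXES = {
    "NC_": "genomic (complete molecule)",
    "NG_": "genomic (region)",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "mRNA (predicted)",
    "XR_": "non-coding RNA (predicted)",
    "XP_": "protein (predicted)",
}


def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq-style accession, or None.

    Accessions without a known RefSeq prefix (for example, plain
    GenBank/EMBL accessions) yield None.
    """
    # RefSeq prefixes are two letters followed by an underscore.
    return REFSEQ_PREFIXES.get(accession[:3].upper())
```

For example, `refseq_molecule_type("NM_000546.6")` looks up the `NM_` prefix and reports an mRNA record, while an accession with no underscore prefix falls through to `None`.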
Protein sequence databases
- Swiss-Prot - The entry site
for the well-annotated protein sequence knowledgebase maintained
by SIB in Geneva, Switzerland. A list of references, a comprehensive user
manual, and database statistics with tables and flowcharts are provided.
- UniProt - The main web site
for the international protein sequence database, which consists of the protein
knowledgebase (UniProtKB), the sequence clusters (UniRef) and the sequence
archive (UniParc).
- HPI - The
Human Proteomics Initiative, an EBI project to annotate all known human
sequences according to the quality standards of UniProtKB/Swiss-Prot.
For each known protein it provides a wealth of information, including
a description of its function, domain structure, subcellular location,
post-translational modifications, variants, and similarities to other
proteins.
- IPI - The web site
of the International Protein Index database, which provides a top-level
guide to the main databases that describe the proteomes of higher eukaryotic
organisms.
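
The three UniProt components listed above (UniProtKB, UniRef and UniParc) use differently shaped identifiers, so an identifier can usually be routed to the right component by pattern alone. The sketch below assumes the identifier conventions described in current UniProt documentation (6- or 10-character UniProtKB accessions, `UniRef100_`/`UniRef90_`/`UniRef50_` cluster prefixes, and `UPI` followed by ten hexadecimal characters for UniParc). The function name `classify_uniprot_id` is hypothetical, and a match only means the string has the right shape, not that the entry exists.

```python
import re

# Assumed identifier shapes, based on UniProt's published conventions.
UNIPROTKB_AC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
UNIREF_ID = re.compile(r"^UniRef(?:100|90|50)_\S+$")
UNIPARC_ID = re.compile(r"^UPI[0-9A-F]{10}$")


def classify_uniprot_id(identifier):
    """Guess which UniProt component an identifier belongs to.

    Returns "UniRef", "UniParc", "UniProtKB", or None when the
    identifier matches none of the assumed patterns.
    """
    # Check the prefixed forms first; they are the least ambiguous.
    if UNIREF_ID.match(identifier):
        return "UniRef"
    if UNIPARC_ID.match(identifier):
        return "UniParc"
    if UNIPROTKB_AC.match(identifier):
        return "UniProtKB"
    return None
```

A string such as `UniRef90_P04637` is classified as a UniRef cluster, while the bare accession `P04637` is classified as a UniProtKB entry.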
References
- EMBL/GenBank Division [PDF]
- EMBL/GenBank Format [PDF]
- EMBL/GenBank Feature Table [PDF]
- Swiss-Prot protein names [PDF]
16 July 2008, J Luo, CBI, PKU, Beijing, China