Bioinformatics World    
         
            
Human Genome Databases
NOTE: This page focuses mainly on genome browsers; databases and tools that provide an "integrated view" of human genome data are described on the "Data Integration" page. Currently, there are four major public human genome browsers: UCSC Genome Browser (UCSC), Ensembl (EBI+Sanger), Map Viewer (NCBI), and VEGA (Sanger).
CELERA Genomics
(Rockville, MD, USA)
CELERA Genomics is one of three business segments of Applera Corporation. NOTE: The Celera Discovery System was a commercial online platform that was discontinued in June 2005.
   
PERSONAL NOTE: The quality of the human genome data available via free public genome servers like UCSC Genome Browser or Ensembl is now very high, so there is no need to use commercial platforms anymore.
Ensembl Human Genome Database
(EBI + Sanger)

including:

MapView 

BioMart


1. Ensembl is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation on eukaryotic genomes. The website includes chromosomal feature density plots, mouse trace homologies, CpG islands and tRNAs. Further features include curated gene structures from EMBL/GenBank/DDBJ submissions, links to UniGene, and more.
The database can be searched either by keyword or by sequence (BLAST).

2. You can also browse a chromosome with the MapView software. This lets you search for domains, protein families, features, diseases, and much more.
Chromosomal regions can be quickly zoomed from 50 bp to 1 Mb in detail. You can "walk" along the chromosome using the left and right buttons. By clicking the "features" button you can decide which features to display or hide (like ESTs, UniGene, Repeat, ...). By clicking the "export" button you can download any user-defined sequence region.
NOTE: Clicking the button "Jump to UCSC" displays the same chromosomal region in the UCSC Human Genome Working Draft! This is especially useful if you want to get a detailed summary of available ESTs for that sequence (in Ensembl just a few representative ESTs are shown).

3. BioMart is a powerful data retrieval tool that generates lists of biological objects (e.g. genes, SNPs) from data held in the Ensembl database.
Please refer to the corresponding section for details.

Typical Ensembl accessions: refer to section Ensembl IDs.
GDB - Human Genome Database
(RTI International, North Carolina, USA)
The Genome Database (GDB) is the official central repository for genomic mapping data resulting from the Human Genome Initiative. 
Although GDB has historically focussed on gene mapping, its focus will be broadened as the Genome Project moves from mapping to sequence to functional analysis. Extensions are under development in the representation of sequence-level genome content, including sequence variations, along with richer descriptions of function and phenotype.
GDB can be searched using identifiers, accession numbers, and keywords.
GeneCards
(Weizmann Institute)
GeneCards is a database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others.
Please refer to the GeneCards main section for details.
GeneLynx
(Karolinska)
GeneLynx is a portal to a collection of hyperlinks for each human gene.
Please refer to the GeneLynx main section for details.
HGNC - HUGO Gene Nomenclature Committee
(University College London)

including:
Genew - Human Gene Nomenclature Database

HUGO
(organisation)
The HGNC - HUGO Gene Nomenclature Committee approves a gene name and symbol (short-form abbreviation) for each known human gene. All approved symbols are stored in the HGNC database, which is also known under the name Genew. Each symbol is unique and each gene is given only one approved gene symbol. Where possible, symbols maintain parallel construction across the members of a gene family and can also be used in other species, especially the mouse. The Searchgenes page lets you search the HGNC database via simple or advanced text searches.


HUGO - the Human Genome Organisation is the international organisation of scientists involved in human genetics. Established in 1989 by a collection of the world's leading human geneticists, the primary ethos of HUGO is to promote and sustain international collaboration in the field of human genetics. This website is designed to encourage you to find out more about the work of HUGO, its working committees and principles. 
H-InvDB  
(JBIRC + DDBJ) 
H-Invitational Database (H-InvDB) is a human gene database opened to the public in April 2004, which is hosted by the Japan Biological Information Research Center (JBIRC) and by the DNA Databank of Japan (DDBJ), with contributions from more than 40 institutes worldwide, such as the German DKFZ. Please note that H-InvDB is based essentially on cDNA sequences; it is not a genome sequence repository, although it provides a graphical genome viewer (i.e. chromosome regions of matching cDNAs).
Please refer to the H-InvDB section on the Data Integration page for a detailed description.
Human Genome Resources
(NCBI)

including:

BLAST the Human Genome

Map Viewer

Entrez Gene

AceView



1. Human Genome Resources is a good site to start with, which contains links to all NCBI-based Web-resources related to the human genome. 

2. BLAST the Human Genome provides an interface to BLAST an accession number or FASTA-formatted sequence against the genomic contig sequences, as well as the mRNAs and proteins derived from the contigs. You will get the sequences of the genomic clones, as well as the map positions and links to the Map Viewer, placing the clone into the zoomable genomic context (down to the sequence level!).
NOTE: The human genome BLAST page searches datasets different from the non-redundant (nr) database used by Standard Nucleotide BLAST, which contains annotated genomic as well as mRNA entries.
NOTE: When a retrieved genomic contig lists only the accession numbers of the clones used to build it, but NOT the sequence itself, you can run a Standard Nucleotide BLAST against the htgs (high-throughput genomic sequences) database to retrieve the individual genomic clones.

3. The NCBI Map Viewer presents a graphical view of the available human genome sequence data as well as cytogenetic, genetic, physical, and radiation hybrid maps. It displays the location of genes, STSs, FISH-mapped clones, and variation on the contig sequence. You can find genes or markers of interest by submitting a query against the whole genome, or one chromosome at a time. Results are indicated both graphically, as tick marks on the ideogram, and in a tabular format. The results table includes links to a chromosome graphical view where the gene or marker can be seen in the context of additional data. Alternatively, you can browse a chromosome.

4. Entrez Gene includes report pages for all genes defined by the genome annotation process. It provides a single query interface to curated sequence and descriptive information about genetic loci. It presents information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, MIM numbers, UniGene clusters, homology, map locations, and related web sites. It is a highly recommended starting point if you are looking for information on a gene or protein of interest!
Please refer to this main section for details.

5. AceView offers an integrated view of the human (and other species) genes, as reconstructed by alignment of all publicly available mRNAs and ESTs on the genome sequence. Please refer to this main section for details.
MITOMAP
(NIH)
MITOMAP is a database covering information regarding the human mitochondrial genome. 
UCSC Genome Bioinformatics
(University of California, Santa Cruz) 

including:

UCSC Genome Browser Gateway

UCSC Table Browser

BLAT

UCSC Gene Sorter

UCSC Proteome Browser

UCSC VisiGene

UCSC In Silico PCR

ENCODE at UCSC
The UCSC Genome Bioinformatics site contains the reference sequence and working draft assemblies for a large collection of genomes. UCSC provides possibly the best graphical genome browser. It offers graphical and sequence output, including the positions of ESTs, SNPs, repeat elements, mouse synteny information, and much more. Other tools at UCSC include the UCSC Gene Sorter, which shows expression, homology and other information on groups of genes that can be related in many ways. BLAT quickly maps your sequence to the genome. The UCSC Table Browser provides convenient access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns.

NOTE: Please refer to the UCSC Genome Browser User Guide for a detailed, yet concise description of the different databases and tools of the UCSC portal. Here, I would rather offer some general remarks and personal hints.

IMPORTANT: It is usually best to work with the most recent assembly ("freeze") even though a full set of tracks might not yet be ready. Be aware that the coordinates of a given feature on an unfinished chromosome may change from one assembly to the next as gaps are filled, artifactual duplications are reduced, and strand orientations are corrected. The Genome Browser offers multiple tools that can correctly convert coordinates between different assembly releases. For more information on conversion tools, see the section Converting data between assemblies of the User Guide.
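
To make the coordinate shift described above more concrete, here is a minimal sketch of how a position might be mapped from an older assembly to a newer one. It assumes a small table of aligned blocks; the block values and the function name are invented for illustration and are not the format used by the UCSC conversion tools, which rely on precomputed whole-genome alignments.

```python
from typing import List, Optional, Tuple

# Each block states that `size` bases starting at `old_start` in the old
# assembly correspond to `size` bases starting at `new_start` in the new one.
# HYPOTHETICAL values: a 500-base gap was filled in the new assembly after
# old position 1000, so everything downstream is shifted by 500.
Block = Tuple[int, int, int]  # (old_start, new_start, size)

EXAMPLE_BLOCKS: List[Block] = [
    (0, 0, 1000),
    (1000, 1500, 2000),
]


def convert_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position from the old assembly to the new assembly.

    Returns None when the position lies outside every aligned block,
    i.e. when it cannot be converted.
    """
    for old_start, new_start, size in blocks:
        if old_start <= pos < old_start + size:
            return new_start + (pos - old_start)
    return None


if __name__ == "__main__":
    for p in (250, 1200, 5000):
        print(p, "->", convert_position(p, EXAMPLE_BLOCKS))
```

A position that cannot be converted is reported as None rather than guessed, which is why coordinates saved from an old assembly should always be checked after conversion.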

NOTE: As the different tools and databases cover a range of topics, their detailed descriptions may be located elsewhere on "Bioinformatics World", but there is at least a short note included in this section here, which therefore may serve as a "starting point" for all UCSC-based resources.
 
1. UCSC Genome Browser Gateway:
You may query the UCSC Genome Browser by gene names, keywords, accession numbers, cytogenetic positions, author names (submitters), words like "zinc finger" or "huntington" and several other types of input. Genes are displayed in their genomic context within a graphical window the content of which is highly customizable.
NOTE: To facilitate your return to regions of interest within the Genome Browser, save the coordinate range or bookmark the page of displays that you plan to revisit or wish to share with others.
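
If you keep saved coordinate ranges in your own notes or scripts, a small helper like the following sketch may be handy. It assumes the common "chromosome:start-end" notation with optional thousands separators; the function names and the example coordinates are illustrative only.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Assumed notation: "chr7:127,471,196-127,495,720" (commas are optional).
_POSITION = re.compile(r"^\s*(\w+):([\d,]+)-([\d,]+)\s*$")


def parse_position(text: str) -> Region:
    """Parse a saved coordinate range into chromosome, start and end."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"not a coordinate range: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(chrom, start, end)


def format_position(region: Region) -> str:
    """Write a region back in the compact form without separators."""
    return f"{region.chrom}:{region.start}-{region.end}"


if __name__ == "__main__":
    region = parse_position("chr7:127,471,196-127,495,720")
    print(region, format_position(region))
```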
- Navigation buttons: there are several buttons to move left/right along the chromosome, to zoom in/out, and there is a field which allows the direct input of a genomic position.
- Track topics: there are certain groups of tracks which thematically belong together. Examples are "Mapping and Sequencing", "Gene Prediction", "Expression and Regulation", and "Variation and Repeats".
- Track Display options:
Use either the "configure" button to select groups of tracks to be shown or hidden, or use the drop-down controls below the image window and press "refresh" to alter the tracks displayed. Note that the display options range between "hide" and "full", and that tracks with lots of items will automatically be displayed in more compact modes. Some tracks have additional filter and configuration capabilities, e.g. EST tracks, mRNA tracks, NCI60, etc. These options let the user modify the color or restrict the data displayed within an annotation track. To access filter and configuration options for a specific annotation track, open the track's description page by clicking the label for the track's control menu under the Track Controls section or the mini-button to the left of the displayed track.
- Custom Annotation Tracks: The Genome Browser provides dozens of aligned annotation tracks that have been computed at UCSC or have been provided by outside collaborators. In addition to these standard tracks, it is also possible for users to upload their own annotations for temporary display in the browser. These custom annotation tracks are viewable only on the machine from which they were uploaded and are only kept for eight hours after the last time they were accessed. Please refer to the section Creating custom annotation tracks of the User Guide. Note that there is also a description of the allowed file formats included. Annotation data can be in standard GFF format or in a format designed specifically for the Human Genome Project or UCSC Genome Browser, including GTF, PSL, BED, or WIG.
- Sequence Download: If you want to download the DNA sequence that is displayed in the graphics window, follow the link "DNA" on top of the page. Formatting options range from simply displaying exons in upper case to elaborately marking up a sequence according to multiple track data. The DNA sequence covered by various tracks can be highlighted by case, underlining, bold or italic fonts, and color.
- Image Download: The Genome Browser provides a mechanism (PDF/PS link) for saving a copy of the currently displayed annotation tracks image to a file that can be printed or edited. Images saved in PostScript format can be printed at high resolution. This is useful for generating figures intended for publication. Images can also be saved in PDF format.
- Genome Data Download: Most of the underlying tables containing the genomic sequence and annotation data displayed in the Genome Browser can be downloaded. Please refer to the section Downloading Genome Data in the User Guide.
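
As a sketch of what an uploaded custom annotation track (see the "Custom Annotation Tracks" item above) might look like, the following builds a small BED-style track as plain text. The track name, description, feature names and coordinates are invented for illustration; the exact rules for track lines and fields are given in the User Guide section on custom annotation tracks.

```python
from typing import Iterable, Tuple

# chrom, start (0-based), end (exclusive), feature name
Feature = Tuple[str, int, int, str]


def make_bed_track(name: str, description: str,
                   features: Iterable[Feature]) -> str:
    """Return the text of a minimal BED custom track.

    The first line is a track line; every following line holds one
    feature as tab-separated chrom, start, end and name.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # HYPOTHETICAL regions, not real annotations.
    example = [
        ("chr7", 1000, 2000, "regionA"),
        ("chr7", 5000, 5600, "regionB"),
    ]
    print(make_bed_track("myRegions", "Example regions", example), end="")
```

The resulting text can be saved to a file and uploaded, or pasted into the custom track form.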


2. UCSC Table Browser:
The UCSC Table Browser provides a powerful and flexible graphical interface for querying and manipulating the Genome Browser annotation tables. The Table Browser lets you retrieve the DNA sequence data or annotation data underlying Genome Browser tracks for the entire genome, a specified coordinate range, or a set of accessions. Because the Table Browser uses the same database as the Genome Browser, the two views are always consistent. The Table Browser, a portal to the underlying open-source MySQL relational database driving the Genome Browser, displays genomic data as columns of text rather than as graphical tracks. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser provides a convenient alternative to downloading and manipulating the entire genome and its massive data tracks.

For detailed information on using the Table Browser features, refer to the Table Browser User Guide. Hint: If you are new to the Table Browser, read the Getting started section to learn about browser basics and try some simple queries.

Quick guide:
2.1. Basic structure:
- Non-positional tables contain data not tied to genomic location, for example a table that correlates a Known Gene ID with a RefSeq accession ID.
- Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments, and other annotations.

2.2. Simple position-based query:
- Pick a genome assembly, annotation track, and a table.
- Pick a genomic region. By default, the Table Browser region is set to genome, which will display all the data records in the selected table. To restrict the data to a specific position range, type the position into the position box.
Note:
The data in non-positional tables are not tied to genomic coordinates; therefore, the region option is unavailable when a non-positional table is selected. A basic query on a non-positional table will show all the data in the table.
- Output:
Click the Get Output button to display the results of the query. By default, the Table Browser outputs the data from all fields in the selected table as tab-separated text on the screen. See the Output Formats section for information on configuring the query output.

Example: It is possible to retrieve a table containing all RefSeq genes of the complete human genome (region=genome). Alternatively, by selecting the output format "sequence", a FASTA sequence file of all RefSeqs is created.
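
Once the tab-separated output of a positional table has been saved, it can be processed with a few lines of code. The following sketch assumes that the first line is a header starting with "#" and that the columns are named chrom, txStart and txEnd, as in a typical gene table; the sample rows are made up and the function names are illustrative.

```python
import csv
from typing import Dict, List

# MADE-UP sample mimicking tab-separated table output with a "#" header line.
SAMPLE_OUTPUT = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\n"
    "exampleA\tchr1\t+\t1000\t5000\n"
    "exampleB\tchr1\t-\t8000\t9000\n"
    "exampleC\tchr2\t+\t1500\t2500\n"
)


def read_table(text: str) -> List[Dict[str, str]]:
    """Read tab-separated output into a list of dicts keyed by column name."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(lines[1:], fieldnames=header, delimiter="\t")
    return list(reader)


def rows_in_region(rows: List[Dict[str, str]], chrom: str,
                   start: int, end: int) -> List[Dict[str, str]]:
    """Keep the rows whose interval overlaps chrom:start-end."""
    return [
        row for row in rows
        if row["chrom"] == chrom
        and int(row["txStart"]) < end
        and int(row["txEnd"]) > start
    ]


if __name__ == "__main__":
    table = read_table(SAMPLE_OUTPUT)
    for row in rows_in_region(table, "chr1", 0, 6000):
        print(row["name"], row["chrom"], row["txStart"], row["txEnd"])
```

Filtering locally like this is only sensible for small result sets; for larger ones it is usually better to restrict the region in the Table Browser itself.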

2.3. Batch Query using identifiers:
In many cases, you may want to retrieve data based on a list of one or more accessions or names, rather than querying by genomic position. Many tracks in the Table Browser, such as those in the Genes and Gene Prediction track group, support identifier queries. The identifier type used in the query must match the kind of identifiers present in the track data, e.g. mRNA accession IDs must be used to query the mRNA table.
- Use the Paste List or Upload List buttons and use a correctly formatted file of identifiers. If you are loading multiple identifiers, entries must be separated by a space, tab, or line.
- Note: The Table Browser will retain the identifier list until you delete the information by clicking the Clear List button.
- See the Output Formats section for information on configuring the query output.
NOTE: Several tests showed that, although the Table Browser is very powerful and versatile, its performance for high-throughput data retrieval does not match that of tools like BioMart. For example, it was not possible (at least not within a reasonable time) to use datasets of gene names or Entrez Gene IDs as input for the Batch Query.
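
Before pasting or uploading an identifier list, it can help to normalise it. The following sketch splits a text on spaces, tabs and line breaks, as described above, and additionally drops duplicates while keeping the original order (the de-duplication is my own convenience, not a Table Browser requirement). The accessions shown are placeholders.

```python
from typing import List


def parse_identifier_list(text: str) -> List[str]:
    """Split a pasted list on spaces, tabs and line breaks.

    Duplicates are removed; the first occurrence keeps its position.
    """
    seen = set()
    identifiers = []
    for token in text.split():
        if token not in seen:
            seen.add(token)
            identifiers.append(token)
    return identifiers


if __name__ == "__main__":
    # PLACEHOLDER accessions, mixed separators.
    pasted = "AB000001 AB000002\tAB000003\nAB000001\n"
    print("\n".join(parse_identifier_list(pasted)))
```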

2.4. Table Browser Filter options:
Please refer to the Filter section of the User Guide.

2.5. Intersecting Data from multiple tables:
Please refer to the Intersection section of the User Guide.

2.6. Output Formats:
Please refer to the Output Formats section of the User Guide.


3. BLAT
BLAT quickly finds genomic sequences that share 95% or greater similarity with a query sequence over a length of 40 bases or more.
Please refer to the main section of BLAT for detailed information.
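
To give a feeling for the two thresholds mentioned above, the following sketch checks whether an already aligned, ungapped pair of sequences reaches 95% identity over at least 40 bases. This is not the BLAT algorithm itself, only a simple identity calculation; the function names are my own.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def meets_thresholds(a: str, b: str,
                     min_identity: float = 95.0,
                     min_length: int = 40) -> bool:
    """True if the pair is long enough and similar enough."""
    return (
        len(a) >= min_length
        and len(a) == len(b)
        and percent_identity(a, b) >= min_identity
    )


if __name__ == "__main__":
    query = "ACGT" * 10               # 40 bases
    two_mismatches = "TT" + query[2:]
    three_mismatches = "TTT" + query[3:]
    print(meets_thresholds(query, two_mismatches))
    print(meets_thresholds(query, three_mismatches))
```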

4. UCSC Gene Sorter:
The UCSC Gene Sorter is an excellent resource for exploring gene families and the relationships among genes. This tool displays a table of genes within a selected genome that are related to one another.
Please refer to the main section of UCSC Gene Sorter for details.

5. UCSC Proteome Browser Gateway:
The UCSC Proteome Browser Gateway provides fast access to protein-specific data for a gene of interest.
Please refer to the main section of UCSC Proteome Browser Gateway for details.

6. UCSC VisiGene:
UCSC VisiGene is a browser for viewing in situ images. It enables the user to examine cell-by-cell as well as tissue-by-tissue expression patterns.
Please refer to the main section of UCSC VisiGene for details.

7. UCSC In Silico PCR:
UCSC In silico PCR searches a sequence database with a pair of PCR primers.
Please refer to the main section of UCSC In Silico PCR for details.
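
The principle of searching a sequence with a primer pair can be sketched in a few lines. The example below looks for exact matches only: the forward primer on the given strand and the reverse complement of the reverse primer downstream of it. It is a simplified illustration, not the method used by the UCSC tool; the function names, the size limit and the test sequences are my own assumptions.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_size: int = 4000) -> Optional[Tuple[int, int, str]]:
    """Find the first product amplified by an exactly matching primer pair.

    Returns (start, end, product) with 0-based start and exclusive end,
    or None if no product of at most `max_size` bases is found.
    """
    template = template.upper()
    fwd = forward.upper()
    # The reverse primer binds the opposite strand, so on the given strand
    # we look for its reverse complement.
    rev_site = reverse_complement(reverse.upper())
    start = template.find(fwd)
    while start != -1:
        site = template.find(rev_site, start + len(fwd))
        if site != -1:
            end = site + len(rev_site)
            if end - start <= max_size:
                return start, end, template[start:end]
        start = template.find(fwd, start + 1)
    return None


if __name__ == "__main__":
    # MADE-UP template and primers.
    template = "TTTT" + "ACGTAC" + "GGGGGGGG" + "CATTGA" + "AAAA"
    print(in_silico_pcr(template, "ACGTAC", "TCAATG"))
```

Real primers rarely need to match perfectly along their whole length, so an exact-match search like this one will miss products that a dedicated tool would report.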

8. ENCODE at UCSC:
The UCSC Genome Bioinformatics Group manages the official repository of sequence-related data for the ENCODE consortium and supports the coordination of data submission, storage, retrieval, and visualization.
Please refer to the main section of ENCODE at UCSC for details.


9. Appendix: UCSC Genome Bioinformatics - mirror sites:
In case the UCSC server is down, there are several mirror sites, such as:
- UCSC mirror MCW: mirror at Medical College of Wisconsin, Milwaukee.
VEGA
(Sanger)

including:

VEGA Human Genome Browser
1. The VEGA (Vertebrate Genome Annotation) database is a central repository for high quality, frequently updated, manual annotation of vertebrate finished genome sequence. Details of the projects for each species are available through the homepages for human, mouse, zebrafish and dog. The website is built upon code from the Ensembl project.

2. The VEGA Human Genome Browser is the human section of the VEGA database. This site shows data on human chromosomes where a first-pass manual annotation has been completed. Currently (May 2005) there are fourteen chromosomes in the Vega human database, comprising nearly 50% of the genome. The Sitemap displays the different ways to access the Vega Human Database, which are quite similar to the Ensembl database. In general, the "look and feel" of the data presented by the Vega databases is similar to the Ensembl pages.

Some of the additional features of VEGA are:
- manual annotation of transcripts, also those having no ORF associated.
- manual identification of polyadenylation sites and signals.
- SNPs, mapped from the Glovar database.

Typical VEGA accessions: refer to section VEGA IDs.

Note: The VEGA database is one of the four databases linked to CCDS database reports. Please refer to the CCDS main section for more information.
   

Human Genome Documentation
NOTE: This section lists outstanding publications and special issues of journals dedicated to the analysis of the human genome.
NATURE Human Genome Collection
(Nature Publishing Group)
NATURE Human Genome Collection is a supplement to the June 1, 2006 issue of Nature, collecting milestone articles on the road to the discovery of the human genome. In 1991, work began on sequencing the 2.85 billion nucleotides of the human genome. Nature presents the complete and comprehensive DNA sequence of the human genome, including detailed analysis of all chromosomes, as a freely available resource. The Human Genome Collection contains specially commissioned commentaries, streaming video that features interviews with scientists behind this momentous project, and details of how to obtain your print copy.

NOTE: This printed supplement is composed of
- several new commentaries (pages 9-17)
- all other papers have previously been published in Nature (pages 18-305), mainly in the years 2001 to 2006.

NOTE: The NATURE Issue of February 15, 2001 contains the initial "finished draft sequence" papers of the human genome.

NOTE: Both issues (supplement of June 1, 2006 and human genome issue of Feb. 15, 2001) are available for free. Note that the June 1, 2006 supplement is not PDF-based but is viewed using NewsStand. NewsStand Digital Delivery gives you the ability to read newspapers and magazines on your computer. All content of the publication (articles, images and ads) is displayed exactly as it appears in the printed version. There are currently two ways to receive the digital edition, selected by the publisher: Classic NewsStand Reader and iBrowse. Read the descriptions on the NewsStand "How it works" page for more information on each service.
NOTE: You can access your previously viewed NewsStand articles via the button "My NewsStand".