Bioinformatics World FAQ Center
  FAQ Index -> EXPRESSION
                -> EXP1...know the best access to expression data for a gene of interest ? (last update Mar. 3, 2006)
                -> EXP2...know which available microarrays contain a gene or a whole gene set of interest ? (last update Aug. 30, 2005)
                -> EXP3...know which published microarray experiments contain data of my gene of interest ? (last update Aug. 30, 2005)
                -> EXP4...identify genes with expression patterns similar to my gene of interest ? (last update Jun. 1, 2004)
                -> EXP5...query microarray data not by gene name but by the "nature of the experiment" ? (last update Nov. 11, 2004)
                -> EXP6...perform in silico expression profiling in the field of endothelial cell biology ? (last update Jun. 14, 2005) 
                -> EXP7...perform clustering analyses of microarray data ? (last update Jun. 18, 2006)  
                -> EXP8...get all proteins from endothelial cells involved in inflammation (SRS and GO approaches) ? -> see RET2  
                -> EXP9...submit my own microarray data to a public database ? (last update Jan. 3, 2006)    
                -> EXP10...compare the expression of my gene of interest in B-cells, T-cells, monocytes, and dendritic cells ? (last update Aug. 20, 2004)
                -> EXP11...get a "virtual multiple tissue Northern blot" of my gene of interest ? (last update Mar. 3, 2006)
                -> EXP12...know which genes are expressed in fetal brain 6 times higher than in adult brain ? (last update Aug. 20, 2004)
                -> EXP13...compare two different microarray platforms and see which genes are represented on both of them ? (last update Aug. 31, 2005)
                -> EXP14...compare a list of upregulated genes from one microarray experiment with one or more other experiments ? (last update Aug. 31, 2005)
                -> EXP15...know if alternative transcripts of a gene of interest are expressed in different tissues ? (last update Sep. 16, 2005)
                -> EXP16...generate contig sequences from a set of ESTs while considering alternative splicing ? (last update Sep. 16, 2005)
                -> EXP17...analyze the expression of a gene set of interest in cancer tissues ? -> see GENOM9 !
                -> EXP18...determine the expression profiles of normal vs. cancer tissues ? -> see GENOM10 !
                -> EXP19...determine the reliability of individual SAGE tags for expression analysis of a specific gene ? (last update Feb. 10, 2006)
                -> EXP20...get in situ microscopy images of RNA tissue localization and expression intensity ? (last update Mar. 2, 2006)
                -> EXP21...get RT-PCR, Northern Blot, and Western Blot expression data ? (last update Mar. 3, 2006)
     
    
      

            
EXP1...know the best access to expression data for a gene of interest ? (last update Mar. 3, 2006)

    This question is intended as a kind of "quickview" of some of the topics treated in the following sections. There is a wide range of categories and therefore also databases concerning the storage and access of gene expression data. Main categories are microarray data, SAGE data, and EST data. Note that many of these resources are described in more detail in FAQ EXP11 ("virtual multiple tissue Northern blot").
    GENERAL REMARK: Often, individual microarray probesets and also EST sequences correspond to only ONE specific splicing variant of a gene. Thus, in cases where different splicing variants may also differ in tissue-specific expression, it is necessary to consider this point. Please refer to FAQ EXP15 for this purpose !

1. Microarray data:

The access to public microarray data is described in detail in question EXP3. A general concern is the fact that there is no "unified" microarray database (yet). Therefore, all repositories have to be searched individually. In brief, I want to mention the following databases.
    SOURCE of the SMD-Stanford Microarray Database is a very user-friendly database for storage of raw and normalized microarray experimental data. SOURCE has an easy query method which accepts GB IDs, LocusLink IDs, UniGene IDs, or gene names. When available, there is a link to published microarray expression data, including data on the gene of interest. The visualized data are very easy to interpret: RED means UP- and GREEN means DOWNregulated. Please note that by clicking on the "red and green expression bar" of a single gene you can retrieve all other genes with similar regulation; in this list you may again click on every gene and retrieve the respective genes with similar expression !!! Note that the link "Authors' webpage" offers direct access to the primary databases holding the expression data. Please note that not all microarray data stored in the SMD (Stanford Microarray Database) are retrievable via SOURCE; therefore you may also search SMD directly (basic or advanced) for datasets via lists of specific experimental setups. Please refer to the SMD description at the main page for details.
    GEO (Gene Expression Omnibus) was launched by the NCBI in order to support the public use of gene expression data. GEO is a gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source. For a full description of the GEO functionality, please refer to the corresponding chapter at the main page. In brief, there are several options to query and browse the GEO database. If you want to search for expression data of a gene of interest, you can use one of the following options: (1) Start with a "total ENTREZ search", which also includes GEO expression profiles in the output. (2) Start directly at ENTREZ-Geo Expressions, where you can simply enter your search term(s) like gene name, organism, or tissue. (3) Start at the GEO home-page and enter all or part of the gene name, or gene symbol, into the "Query", "Gene Profiles" field. Note that a gene-specific entry in Entrez Gene also contains direct links to the GEO expression data of a gene of interest !
    CleanEx, provided by the Swiss Institute of Bioinformatics (SIB), is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. So far, CleanEx contains only human genes for which the symbol is approved by the HUGO nomenclature committee. There is one entry per gene name. Thus, CleanEx is NOT a repository of expression data in the strict sense but it collects expression data from several resources (GEO, ArrayExpress, SMD, etc.) in a gene-centered way in order to make it available via one common interface.
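The GEO gene search described above can also be scripted. The following is only a minimal sketch: it assumes the NCBI E-utilities "esearch" service and the Entrez database name "geoprofiles" (neither is named in this FAQ), and the gene symbol and organism are arbitrary examples. The function only builds the query URL; fetching and parsing the result is left out on purpose.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not mentioned in this FAQ).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_profiles_search_url(gene, organism=None, tissue=None, retmax=20):
    """Build an esearch URL that looks for GEO expression profiles of a gene.

    The search terms are simply combined with AND, mirroring the free-text
    entry of gene name, organism, and tissue described above.
    """
    terms = [t for t in (gene, organism, tissue) if t]
    query = " AND ".join(terms)
    # Assumption: "geoprofiles" is the Entrez name of the GEO profiles database.
    params = {"db": "geoprofiles", "term": query, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example gene symbol and organism, chosen only for illustration.
url = geo_profiles_search_url("VCAM1", organism="Homo sapiens")
print(url)
```

Opening the printed URL in a browser should return an XML list of record IDs, which can then be inspected at the GEO site as described above.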

2. SAGE data:

    SOURCE does not only provide links to array data concerning a gene of interest, but also to SAGE (Serial Analysis of Gene Expression) data stored at the NCBI SAGE database. SOURCE has an easy query method which accepts GB IDs, LocusLink IDs, UniGene IDs, or gene names. When available, there is a link to SAGE data via "Go to Gene-to-tag mapping at NCBI".
    GEO (Gene Expression Omnibus) is not only designed to retrieve microarray data, but also SAGE data. If you want to search for SAGE expression data of a gene of interest, you can follow the same instructions as for array data retrieval.
    SAGEmap is the "primary tool" at the NCBI to query SAGE expression data. You can query by tag, sequence, gene, library and more. Please also refer to the "SAGE and ESTs" chapter at the main page for detailed information.
    NCBI Entrez Gene is another good starting point to retrieve various kinds of information, including gene expression data. Following the UniGene link of an entry, you will again find a link to SAGE data via "SAGE: Gene-to-tag mapping".
    CleanEx (see above) can also be used to retrieve SAGE data corresponding to a specific gene of interest.
    ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and a transcript assembly procedure. There is a specific section of ECgene called ECexpression, which is the expression data viewer of ECgene. ECexpression utilizes the extensive expression data from EST and SAGE sources.
    SAGE Genie, which is part of the CGAP portal, provides highly intuitive, visual displays of human and mouse gene expression. Moreover, SAGE Genie provides tools to examine the reliability of individual SAGE tags, meaning the probability that a tag is "unique" or that it matches more than one gene (see FAQ EXP19 for details).

3. EST data:

EST sequences are often derived from individual tissues of an organism; these data can therefore be used to get a "rough" impression of the expression level of a certain gene in a given tissue. This is done by normalization, i.e. by comparing the number of ESTs of the gene with the total number of EST sequences from this tissue.
    Again, SOURCE entries provide access to these data, in the box "UniGene and EST expression information". A normalized expression distribution for tissue types is calculated, and the ratio of cluster clones versus total tissue clones is indicated.
    NCBI Entrez Gene is another good starting point to retrieve EST expression data by following the UniGene link of an entry. UniGene presents lists of the tissues from which the ESTs were derived. Sometimes, a remark "Highly represented in library xy" is indicated. In contrast to SOURCE, there is no normalized value given.
   GeneCards is another good resource for expression data of genes of interest. In particular, graphical images are displayed showing the expression in individual tissues derived from array experiments, as well as an "electronic Northern" of UniGene (EST) data.
    ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and a transcript assembly procedure. There is a specific section of ECgene called ECexpression, which is the expression data viewer of ECgene. ECexpression utilizes the extensive expression data from EST and SAGE sources.
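The normalization idea behind such EST-based estimates can be sketched in a few lines. This is an illustration only, not the procedure used by SOURCE or UniGene: the function name, the tissue names, and all counts are made up, and the second value (share of the summed fractions) is just one possible way to express a distribution over tissues.

```python
def normalized_est_expression(cluster_counts, library_totals):
    """Return two dicts keyed by tissue.

    fractions: ESTs of the gene's cluster divided by all ESTs of that tissue.
    shares:    each fraction divided by the sum of all fractions, so the
               values add up to 1 across tissues (when any EST was found).
    """
    fractions = {}
    for tissue, n_cluster in cluster_counts.items():
        total = library_totals[tissue]
        fractions[tissue] = n_cluster / total if total else 0.0
    fraction_sum = sum(fractions.values())
    shares = {
        tissue: (value / fraction_sum if fraction_sum else 0.0)
        for tissue, value in fractions.items()
    }
    return fractions, shares


# Invented numbers: ESTs of one gene cluster vs. all ESTs per tissue.
cluster_counts = {"brain": 12, "liver": 3, "lung": 0}
library_totals = {"brain": 40000, "liver": 30000, "lung": 20000}

fractions, shares = normalized_est_expression(cluster_counts, library_totals)
for tissue in cluster_counts:
    print(tissue, fractions[tissue], round(shares[tissue], 2))
```

Note that with few ESTs per tissue such ratios are very noisy, which is why the text above speaks of a "rough" impression only.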

4. RNA in situ expression images:
   
    Please refer to FAQ EXP20 for this purpose.

5. RT-PCR, Northern Blot, Western Blot data:

    Please refer to FAQ EXP21 for this purpose.

6. Protein microscopy data based on immunostains:

    Please refer to FAQ PROT6 for this purpose.
                  
Main Index  FAQ Index 
                               


EXP2...know which available microarrays contain a gene or a whole gene set of interest ? (last update Aug. 30, 2005)

    GENERAL REMARK: Often, individual microarray probesets and also EST sequences correspond to only ONE specific splicing variant of a gene. Thus, in cases where different splicing variants may also differ in tissue-specific expression, it is necessary to consider this point. Please refer to FAQ EXP15 for this purpose !

    Tip! For this purpose, the program Resourcerer at the TIGR webpage is a very nice tool. RESOURCERER provides annotation based on the TIGR Gene Indices (TGI) for commonly available microarray resources, including widely used clone sets and Affymetrix GeneChip Arrays. RESOURCERER also allows comparisons between resources from the same species using either the TGI or UniGene and between species using the EGO database. Please note that Resourcerer is NOT a repository for data of chip experiments. BUT it is a very good tool for large-scale annotation of e.g. EST accession numbers. Resourcerer currently works using human, mouse, or rat accessions. You can see a list of all the array types currently stored in this database by activating the drop-down menu "Data Set:" at the Resourcerer start page !
    If you have a list of accession numbers, click on the link "Batch Search" and either upload a *.txt file containing your accession numbers (UniGene, RefSeq, GenBank incl. EST Acc., or LocusLink, separated by spaces) or simply type the numbers into the text box. You will retrieve a table containing links to UniGene, LocusLink, the human, mouse, and rat TIGR indices, and to the GO database. You can save the output page in *.html format using your browser's "Save page..." function and open it in applications like WORD or EXCEL, with all the hyperlinks fully intact. If the list is very long, it will be separated into multiple files !!! OR: If you use the "Download Virtual" function of Resourcerer, you will get a tab-delimited txt-file of the table (a single file irrespective of its length) which you can import into EXCEL, but which has NO hyperlinks. Please note that the output file lists the array names which contain a gene of interest but NOT the individual identifiers (like ProbeSet IDs in case of Affymetrix arrays); please refer to the BioMart description below for this purpose.
    Example: If you want to know which available chip contains ESTs of your gene of interest, open the UniGene cluster of your gene, activate "switch to text mode" at the bottom of the page (or leave it in html-format), and save the file in *.html format. Then open the file in WORD and mark all accession numbers by holding down the "Alt" key (or in html-format: mark the table row), copy the accession numbers into a new file, convert the table into text, and copy all accession numbers into the Resourcerer query field. Those ESTs which are contained in an available array will show the respective information ! For additional information, you may also refer to the corresponding section at the main page !
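As an alternative to the manual WORD route in the example above, the accession numbers can be pulled out of the saved text with a short script. This is a rough sketch under stated assumptions: the sample lines are invented (they are not a real UniGene record), the pattern only covers the classic GenBank nucleotide formats (1 letter + 5 digits, or 2 letters + 6 digits), and dropping the version suffix is a guess at what a batch query field expects.

```python
import re

# Classic GenBank nucleotide accessions: 1 letter + 5 digits or
# 2 letters + 6 digits, optionally followed by a ".version" suffix.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b")


def extract_accessions(text):
    """Return the unique accession numbers found in text, in original order."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        accession = match.group(0).split(".")[0]  # drop the version suffix
        if accession not in seen:
            seen.append(accession)
    return seen


# Invented sample lines in the style of an EST listing.
sample = """
AA043944.1  Clone IMAGE:486367  3' read
T61629      Clone IMAGE:78712   5' read
BC017276.2  mRNA
AA043944.1  duplicate line
"""

accessions = extract_accessions(sample)
query = " ".join(accessions)  # space-separated, as the batch search expects
print(query)
```

The resulting string can be pasted into the text box, or written to a *.txt file for upload.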

    Tip! Another database for this purpose is CleanEx, in particular the CleanEx Target database. This database can be easily searched using the name of your gene of interest. This will produce a list of identifiers in the CleanEx target database which include Affymetrix ProbeSets, IMAGE cDNA clones, INCYTE cDNA clones, RefSeq cDNAs, and SAGE tags. Note that CleanEx also provides a very convenient Batch Search which lets you retrieve all the CleanEx Target entries corresponding to the given input identifier list and the given organism. It will also retrieve all array experiments which are stored in CleanEx which contain the respective identifier. Possible queries include target-to-target retrieval as well as gene-to-target, RefSeq-to-target or Unigene-to-target retrieval. Accepted identifiers include gene symbols, RefSeq, UniGene, and more. Please refer also to the CleanEx main section for details.

    If you want to scan only the Affymetrix microarrays, it is possible to search through the list of available arrays, if and where a certain gene of interest can be found, using the tool ArrayFinder. You may enter any keyword, gene symbol, or accession number to find relevant probe sets on all GeneChip arrays. BUT: Although ArrayFinder offers to query using multiple accessions at once (space-separated), tests showed that it seems to be impossible to use more than about 5 accession numbers "in-batch" ! Therefore, even for screening the Affymetrix chips, you may better use the tool Resourcerer !
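If you nevertheless want to use ArrayFinder with a longer list, you can split the list into small space-separated queries beforehand. A minimal sketch, assuming a batch size of 5 as suggested by the tests mentioned above; the accession numbers are placeholders, and each query still has to be pasted into the tool by hand.

```python
def batches(items, size=5):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder identifiers; replace with your own accession numbers.
accessions = ["ACC%02d" % i for i in range(1, 13)]

queries = [" ".join(batch) for batch in batches(accessions)]
for query in queries:
    print(query)
```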

    If you mainly want to scan the most widely used Affymetrix arrays (and a few others), you can also feed your list of genes into BioMart (at the "Filter" page, "Limit to Genes with these IDs") in order to obtain an annotation table which keeps all hyperlinks active in an EXCEL sheet. At the "Output" page, within the "Features" tab, you can select the specific array type which you want to scan, which reveals all the (ProbeSet) IDs corresponding to your genes of interest. Note that in comparison to Resourcerer, you get the gene-specific IDs but you can only screen ONE array at a time (meaning that this option is mainly suitable for purposes where you already know one specific array you are interested in).

    Tip! If you have a single gene you are interested in (not a batch submission), you may also perform a very simple query at GEO, the microarray data repository at NCBI. You can simply enter the gene name at the ENTREZ GEO site, which extracts a list of microarray experiments ("GEO Datasets" = "GDS") using array types ("GEO platforms" = "GPL") containing your gene of interest. You can simply scroll through the lists and write down the cited GPLs (which of course may be listed several times). NOTE that this option probably picks the datasets from the most recent microarray platform developments !!!
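Instead of writing down the GPLs by hand, you can copy the result list into a text file and let a script tally them. A minimal sketch: the layout of the pasted listing and the accession numbers below are invented placeholders; the only assumption about real output is that platforms appear as "GPL" followed by digits.

```python
import re
from collections import Counter


def count_platforms(text):
    """Count how often each GPL accession occurs in the pasted text."""
    return Counter(re.findall(r"\bGPL\d+\b", text))


# Invented listing with placeholder accessions.
listing = """
GDS0001 / Platform: GPL0001
GDS0002 / Platform: GPL0002
GDS0003 / Platform: GPL0001
"""

platform_counts = count_platforms(listing)
for platform, n in platform_counts.most_common():
    print(platform, n)
```

Platforms that occur most often are listed first, which gives a quick impression of the array types on which your gene has been measured most frequently.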

    Again, if you have a single gene you are interested in (not a batch submission), you may also have a "quick-look" at the UCSC Genome Browser, either via BLAT search using the cDNA sequence or via keyword search at the Genome Browser Gateway (Note that "position" may also mean gene name !). The genomic organization of your gene of interest is graphically displayed. Now you may select all "tracks" related to microarray IDs in order to show them along the sequence (don't forget to hit the "Refresh" button). Although only a few (the most widely used) microarray platforms are included, this option is extremely useful to quickly identify array probesets belonging to different splice variants of a gene, because the alignment to individual exons is displayed.

    Tip! Similarly, the UCSC Gene Sorter is a resource which very quickly shows if your gene of interest is contained on the most widely used microarray types (in particular Affymetrix arrays). You simply select the organism and type in the gene name or other identifiers. You may have to "activate" those columns of interest via the "configure" button by selecting entries like "U133 ID" or "U74 ID". If no identifier is displayed in a specific column, your gene of interest is not part of this specific array type.
             
                                


EXP3...know which published microarray experiments contain data of my gene of interest ? (last update Aug. 30, 2005)
   
    Still, there is no "universal database format" for storing and accessing complete datasets of microarray experiments. Nevertheless, public repositories are emerging that address this question. Please note that many microarray experimental data are only accessible via the authors' homepages and have not (yet) been submitted to public repositories. Therefore, it is still recommended to additionally search PubMed or even Google using appropriate keywords.

    Tip! CleanEx, provided by the Swiss Institute of Bioinformatics (SIB), is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. So far, CleanEx contains only human genes for which the symbol is approved by the HUGO nomenclature committee. There is one entry per gene name. NOTE: Thus, CleanEx is NOT a repository of expression data in the strict sense but it collects expression data from several resources (GEO, ArrayExpress, SMD, etc.) in a gene-centered way in order to make it available via one common interface. Please note that in particular GEO datasets can be analyzed at the original GEO site in a much more sophisticated way. Also note that the content of CleanEx in general "lags behind" the one at the "mother" databases. Please refer also to the CleanEx main section for details.

    Tip! SOURCE of the SMD-Stanford Microarray Database is a powerful and very user-friendly database for storage of raw and normalized microarray experimental data. SMD stores data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization. SOURCE has an easy query method which accepts GB IDs, LocusLink IDs, UniGene IDs, or gene names. Currently 3 species (human, mouse, rat) are available. When available, there is a link to published microarray expression data, including data on the gene of interest. The visualized data are very easy to interpret: RED means UP- and GREEN means DOWNregulated. Please note that by clicking on the "red and green expression bar" of a single gene you can retrieve all other genes with similar regulation; in this list you may again click on every gene and retrieve the respective genes with similar expression !!! Note that the link "Authors' webpage" offers direct access to the primary databases holding the expression data. Please note that not all microarray data stored in the SMD (Stanford Microarray Database) are retrievable via SOURCE; therefore you may also search SMD directly (basic or advanced) for datasets via lists of specific experimental setups. Please refer to the SMD description at the main page for details.
 
    Tip! GEO (Gene Expression Omnibus) was launched by the NCBI in order to support the public use of gene expression data. GEO is a gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source. For a full description of the GEO functionality, please refer to the corresponding chapter at the main page. In brief, there are several options to query and browse the GEO database.
    If you want to search for expression data of a gene of interest, you can use one of the following options: (1) Start with a "total ENTREZ search", which also includes GEO expression profiles in the output. (2) Start directly at ENTREZ-Geo Expressions, where you can simply enter your search term(s) like gene name, organism, or tissue. (3) Start at the GEO home-page and enter all or part of the gene name, or gene symbol, into the "Query", "Gene Profiles" field.
    You may also search for expression data of a sequence of interest. For this purpose, start at the "Query", "BLAST" field of the GEO homepage, or go directly to the GEO BLAST site, and enter your query sequence, a GenBank accession, or a GI. The GEO BLAST tool queries Entrez GEO for expression profiles of interest based on nucleotide sequence similarity.
    After performing one of these searches, you first get a typical ENTREZ output-list of GEO DataSets matching your query. Here you have several options.
    First, you may enlarge the thumbnail images that are provided with each entry showing the corresponding expression profile. Simply click onto this image to enlarge it. The image represents the abundance profile for an individual gene across each Sample in a DataSet. Please refer to the GEO section at the main page for details about these graphs !
    Second, you can view the GDS record. At the bottom of the record, you find the list of Sample records (GSMs), which belong to the GDS file. You have several options to select / de-select individual GSMs for further analyses. GDS records often can be divided in several subsets (grouped by e.g. disease or age), which also can be checked / unchecked. By using the option "Query set A versus B", you may select 2 sets, compare them and display a list of e.g. all the genes which show a min. 4-fold expression in one set compared to the other ! The "Analysis" button provides features that help describe and visualize an entire dataset. For example, the option "Clustering" provides a visualization tool for displaying precomputed hierarchical cluster maps. Cluster portions of interest may be selected, enlarged, charted as line plots, and the original data downloaded. The "Download" button provides several options to download the full set or parts of the data as tab-delimited data tables.
    Third, the link "Profile Neighbors" retrieves other genes/molecules that show a similar profile shape over that dataset, possibly inferring some common function or regulatory elements.
    Fourth, the link "Sequence Neighbors" searches all GEO datasets for related genes based on nucleotide sequence similarity, and thus may be useful in identifying sequence homologs such as related gene family members, or for cross-species comparisons.
    Fifth, the link "Links" offers additional links: "GEO DataSets" displays the GDS report within the browser window (not as JavaScript), without the "analysis and download functionality"; "UniGene" displays the UniGene cluster corresponding to the current sequence; "Nucleotide" displays the database entry of the current sequence; and "MapViewer" displays the position of the UniGene cluster within the graphical genome browser.
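The idea behind a comparison such as "Query set A versus B" with a minimum 4-fold difference can be illustrated offline on a downloaded data table. This is only a sketch and not GEO's own algorithm: it assumes linear-scale (not log-transformed) values, compares the arithmetic means of the two sample sets, and uses invented gene names, sample names, and values.

```python
from statistics import mean


def genes_with_fold_change(table, set_a, set_b, min_fold=4.0):
    """Return (gene, fold) pairs where mean(set A) / mean(set B) >= min_fold.

    table maps each gene to a dict of {sample name: expression value}.
    Genes with a non-positive mean in set B are skipped.
    """
    hits = []
    for gene, values in table.items():
        mean_a = mean(values[sample] for sample in set_a)
        mean_b = mean(values[sample] for sample in set_b)
        if mean_b > 0 and mean_a / mean_b >= min_fold:
            hits.append((gene, mean_a / mean_b))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented example: two samples per set, three genes.
table = {
    "geneX": {"a1": 800, "a2": 1200, "b1": 100, "b2": 100},
    "geneY": {"a1": 300, "a2": 300, "b1": 200, "b2": 200},
    "geneZ": {"a1": 50, "a2": 70, "b1": 10, "b2": 20},
}

hits = genes_with_fold_change(table, ["a1", "a2"], ["b1", "b2"])
for gene, fold in hits:
    print(gene, fold)
```

Changing min_fold (e.g. to 6.0) gives the kind of "n times higher in one set than in the other" list that is also discussed in FAQ EXP12.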

    ArrayExpress is a public repository for microarray based gene expression data, maintained by the microarray informatics group at the EBI (for details, refer to this chapter at the main page). Note that, at the time of writing, you can query ArrayExpress by experiments, arrays, and protocol types, BUT there is no "free-text" search option, AND there is no option to search for a GENE of interest (there is no possibility to query for e.g. UniGene accessions or EST GB acc. numbers), which is a MAJOR DRAWBACK compared to e.g. the NCBI-GEO database !   
                          
                    

EXP4...identify genes with expression patterns similar to my gene of interest ? (last update Jun. 1, 2004)

    SOURCE of the SMD-Stanford Microarray Database is a powerful and very user-friendly database for storage of raw and normalized microarray experimental data. SMD stores data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization. SOURCE has an easy query method which accepts GB IDs, LocusLink IDs, UniGene IDs, or gene names. Currently 3 species (human, mouse, rat) are available. When available, there is a link to published microarray expression data, including data on the gene of interest. The visualized data are very easy to interpret: RED means UP- and GREEN means DOWNregulated. Please note that by clicking on the "red and green expression bar" of a single gene you can retrieve all other genes with similar regulation; in this list you may again click on every gene and retrieve the respective genes with similar expression !!! Note that the link "Authors' webpage" offers direct access to the primary databases holding the expression data. Please note that not all microarray data stored in the SMD (Stanford Microarray Database) are retrievable via SOURCE; therefore you may also search SMD directly (basic or advanced) for datasets via lists of specific experimental setups. Please refer to the SMD description at the main page for details.

    Tip! GEO (Gene Expression Omnibus) was launched by the NCBI, in order to support the public use of gene expression data. GEO is a gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source. For a full description of the GEO functionality, please refer to the corresponding chapter at the main page. Please, also refer to question EXP3 for details.
    If you want to search for expression data of a gene of interest, you can use one of the following options: (1) Start with a "total ENTREZ search", which also includes GEO expression profiles in the output. (2) Start directly at ENTREZ-Geo Expressions, where you can simply enter your search term(s) like gene name, organism, or tissue. (3) Start at the GEO home-page and enter all or part of the gene name, or gene symbol, into the "Query", "Gene Profiles" field.
    After performing one of these searches, you first get a typical ENTREZ output-list of GEO DataSets matching your query. Here you have several options. One of these is the link "Profile Neighbors", which retrieves other genes/molecules that show a similar profile shape over that dataset, possibly inferring some common function or regulatory elements.
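What "similar profile shape" can mean in practice is shown by the following sketch, which ranks genes by the Pearson correlation of their expression values across the samples of one dataset. Please note that this is an assumption made for illustration: the FAQ does not state which similarity measure "Profile Neighbors" actually uses, and the gene names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 for flat profiles)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def profile_neighbors(profiles, query_gene, top_n=3):
    """Rank all other genes by correlation with the query gene's profile."""
    query = profiles[query_gene]
    scored = [
        (gene, pearson(query, values))
        for gene, values in profiles.items()
        if gene != query_gene
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Invented profiles: one value per sample of a dataset.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.0, 1.0, 1.0],
}

neighbors = profile_neighbors(profiles, "geneA")
for gene, r in neighbors:
    print(gene, round(r, 3))
```

Correlation compares only the shape of a profile, not its absolute level, so a weakly and a strongly expressed gene can still be "neighbors" if they rise and fall in the same samples.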

   Tip! The UCSC Gene Sorter is an excellent resource for exploring gene families and the relationships among genes. This tool displays a table of genes within a selected genome that are related to one another. Several different relationships may be explored: protein-level homology, similarity of gene expression profiles, or genomic proximity. The Browser supports searches on a variety of terms and phrases, including the gene name, the SwissProt protein name, a GenBank accession, or a word or phrase present in a gene's description. At the "Sort by" field, you can e.g. choose "Expression (GNF)", which looks for all datasets in this database which show a similar expression pattern to your gene of interest. Note that although very user-friendly, the amount of expression data referenced by this browser is limited to certain "whole-genome normal expression datasets" (at least in the current status).
    The gene family display is highly configurable, allowing the user to control the order and number of columns, the number of rows, and the genes displayed. The tool provides several output formats, including a simple tab-delimited format that may be imported into a spreadsheet or a relational database. In addition, the sequences of the displayed genes can be downloaded: cDNA, protein, genomic and promoter (!) sequences, allowing a user-definition of upstream and downstream regions. Please refer also to the UCSC Gene Sorter main section for details!
    
                    

EXP5...query microarray data not by gene name but by the "nature of the experiment" ? (last update Nov. 11, 2004)

    Please note that many microarray experimental data are only accessible via the authors' homepages and have not (yet) been submitted to public repositories. Therefore, it is still recommended to additionally search PubMed or even Google using appropriate keywords.

    Tip! GEO (Gene Expression Omnibus) was launched by the NCBI, in order to support the public use of gene expression data. GEO is a gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source. For a full description of the GEO functionality, please refer to the corresponding chapter at the main page. There are several options to browse and query GEO.
    If you already know a valid GEO accession number you can query GEO using this ID at the Accession Display Tool. You can also simply use the bar found at the foot of the GEO home page and the top of each GEO record.
    If you want to browse through lists of GEO data and experiments, e.g. to simply scan alphabetical lists of experiments stored in the database (e.g. "endothelial cell profiles"), use the GDS Browser. You can sort by different criteria, like title of experiment, organism, platform type or accession, and GDS accession.
    If you want to search for an experiment of interest, you can use one of the following options. All 3 options initiate a search in the database Entrez GEO DataSets, which stores all GDS annotation including the GDS description, reference series and sample descriptions, titles, keywords, source material, contributor, authors and organisms: (1) Start with a "total ENTREZ search", which also includes experimental descriptions in so-called GEO DataSets in the output. (2) Start directly at ENTREZ-Geo DataSets, where you can simply enter your search term(s) like organism, tissue, author name, GEO terms or accessions. (3) Start at the "Query", "DataSets" field of the GEO homepage, and enter your search terms. For detailed information concerning the construction of complex queries, please refer also to the Quick Query Builder tutorial.

    ArrayExpress is a public repository for microarray based gene expression data, maintained by the microarray informatics group at the EBI (for details, refer to this chapter at the main page). You can query ArrayExpress by experiments, arrays, and protocol types. NOTE that, until recently, there was no option to search the stored array experiments for data of a gene of interest, which was a MAJOR DRAWBACK compared to e.g. the NCBI-GEO database. Now, there is at least a prototype of such a program, which only scans a small set of array accession numbers (experiments) using a gene name as query.

   SMD (Stanford Microarray Database) is a large array data repository of Stanford University, providing public access to published array experiments. You may search SMD (basic or advanced) for datasets via lists of specific experimental setups. You may select for organisms, experimenters, categories (like "T-cell" or "stress") and sub-categories (like "aging", "development", or "infection"). Within the retrieved datasets, there is a multitude of options for data analysis and download. Please refer to the SMD description at the main page for details. Please note that not all microarray data stored in the SMD are retrievable via SOURCE!    
                                       
                                                 

     
EXP6...perform in silico expression profiling in the field of endothelial cell biology ? (last update Jun. 14, 2005)

0. How to search for expression profiles of endothelial cells (ECs) ?

    As also discussed in FAQs EXP5 and EXP10, there are generally 2 ways of obtaining these data, either via keyword searches in databases or via browsing their lists of stored microarray datasets. When microarray data are not available via public repositories, a third option is to simply perform PubMed or even Google searches to retrieve the literature describing such data, which often contains at least links to the authors' homepages, allowing certain analyses. This, in fact, is an important issue, as a huge amount of data is buried in such "authors' homepages" !
   
1. Global profiles of "normal" (uninduced) ECs:

1.1. GNF Gene Expression Atlas v1 (Novartis, Dec. 2002):

    Tip! The GNF Gene Expression Atlas (version 1) contains a survey of gene expression profiles in a diverse group of primary human and mouse tissues and organs, as well as transformed cell lines, performed with Affymetrix U95A (human) and U74A (mouse) arrays. GNF is the Genomics Institute of the Novartis Research Foundation. A total of 101 unique specimens representing 47 tissues / cell lines from the normal physiological state are represented, providing a survey of the human and mouse "transcriptomes". HUVEC (Human Umbilical Vein Endothelial Cells) are included as one of these cell lines. You can search these data directly at the GNF site, either by Keyword or Accession (GB, UniGene, LocusLink,...), Sequence (BLAT Search), or by Expression Pattern. The last option can be extremely useful, as it lets you choose a single tissue/cell type and see ONLY the genes showing a pronounced expression there. Besides a very instructive graphical display, there is also a link "Find similar expression", which lists genes with similar expression profiles.
    This series of array data is also available as NCBI-GEO dataset GDS181 ("Large-scale analysis of the human transcriptome") and as GEO series record GSE96. Please also refer to FAQs EXP3 and EXP11 for additional information. You may selectively search this GDS record for your gene of interest, using ENTREZ GEO Expressions, by entering e.g. "ELAM AND GDS181". Note that gene names (like "TNF") sometimes yield heterogeneous lists of "similar" hits. In this case, you either have to scan the list for the specific entries, or you first look up the array-specific IDs (like ProbeSets for Affymetrix chips) of your gene (described in FAQ EXP2) and then use these IDs for your GEO search instead of the gene name.
    Please note that the SOURCE database can also be used for this purpose, although it is not possible to define complex queries as in GEO. Nevertheless, you may start with your gene name of interest (like "ELAM"), and then follow the link "Show gene expression data", which yields a list of microarray data stored in SOURCE, referring to your gene of interest. If available, the GNF data are listed as "NormalTissueAtlas". The respective link leads to the "SOURCE specific" display, showing high expression as shades of red and low expression as shades of green. Note that you can immediately display genes with similar expression by clicking onto the color bar !
    Tip! Please note that the UCSC Gene Sorter may also be used to display this GNF expression data set. This tool displays a table of genes within a selected genome that are related to one another. Several different relationships may be explored, one of them being the similarity of gene expression profiles, as available from the GNF data. This means that, similar to the option at SOURCE, you may start with your gene of interest and display genes with similar tissue-specific expression by choosing the appropriate option. Note that the UCSC Gene Sorter does not display all the tissues / cell lines in the "initial view", but only a "selection". However, you may display all tissues via the "Configure" button ! Please also refer to FAQ GENOM2 for additional information.
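    To illustrate what a "find similar expression" function does in principle, here is a minimal sketch that ranks genes by the Pearson correlation of their tissue profiles with the profile of a query gene. Please note that this is an assumption made for illustration only: the similarity measures actually used by the GNF site, SOURCE or the UCSC Gene Sorter are not documented here and may differ. The gene names and values are hypothetical placeholders, not real measurements.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation
    return cov / (sx * sy)

def most_similar(profiles, gene, top_n=20):
    """Return the names of the top_n genes most correlated with `gene`."""
    ref = profiles[gene]
    scored = [(pearson(ref, p), name)
              for name, p in profiles.items() if name != gene]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]

# Hypothetical profiles: one value per tissue, same tissue order for all genes.
profiles = {
    "GENE_A": [1.0, 2.0, 8.0, 1.5],
    "GENE_B": [1.2, 2.1, 7.5, 1.4],
    "GENE_C": [9.0, 1.0, 1.0, 8.0],
    "GENE_D": [2.0, 4.0, 16.0, 3.0],
}
print(most_similar(profiles, "GENE_A", top_n=2))
```

    Correlation compares the shape of the profiles, not their absolute level, so a gene expressed at twice the level in every tissue still counts as a perfect match.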

1.2. GNF Gene Expression Atlas v2 (Novartis, Mar. 2004):

    Tip! Version 2 of the GNF Atlas is based on even more tissues and cell lines than version 1. In the context of endothelial cell biology, bone marrow derived CD105-positive ECs are included. These data may be searched using the GNF site itself. Note that this version 2 dataset was created not only with the Affymetrix U133A chips (human), but also with Novartis-designed arrays (human = GNF1H; mouse = GNF1M), using a panel of RNAs derived from 79 human and 61 mouse tissues or cell types. Using the link "Search Expression", you are able to choose a single tissue/cell type and see ONLY the genes showing a pronounced expression. When choosing the "Search Type" "Correlation Search", you may list genes showing high expression in "BM-CD105+Endothelial cells" and e.g. low expression in dendritic cells or T-cells. Please also refer to EXP11 for details.
    Note that also version 2 is integrated in the UCSC Gene Sorter, allowing a very convenient "quick-view" of the expression profile of your gene of interest. Thereby, you may especially compare the expression of your gene in ECs with e.g. T-cells, B-cells, NK-cells, monocytes, or dendritic cells (see also FAQ EXP10). In addition, all genes showing similar expression are also displayed.
     This series of array data is also available as NCBI-GEO series record GSE1133 ("Tissue-specific pattern of mRNA expression"); there is no GDS record yet (therefore, it is not possible to perform these "compare 2 sets (A vs B)" analyses, yet). The GNF1H array is described as platform record GPL1074; the GNF1M array is described as GPL1073.

1.3. Endothelial cell profiles (Stanford, Jul. 2003):

      This dataset reflects a specific investigation of endothelial cell profiles, but using chips of only 1,200 genes, spotted as cDNAs. This means that these chips (GPL217) cover only a fraction of the whole genome, and interestingly, queries did not find well-known genes like ELAM or VCAM. Nevertheless, different types of ECs are included (HUVEC, human umbilical vein endothelial cells; HMVEC, human lung microvascular endothelial cells; HAEC, human aortic endothelial cells; HCAEC, human coronary artery endothelial cells) and may be compared either with each other or with other cell types like smooth muscle cells, astrocytes, or HepG2 cells. These data are available as GEO series record GSE515, and as dataset GDS204, thereby providing the option to compare 2 sets (A vs B) of cell types or tissues, in order to produce lists of genes whose expression is x times higher in one set than in the other. You simply select the appropriate check-boxes and choose a value for x. When hitting the "QueryAvsB" button, the output is generated. Note that in the case of GDS204, you may also compare the whole group of endothelial cell types to the group of "other" cell types.
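      The idea behind such an "A vs B" comparison can be sketched in a few lines. The following example is only an illustration under stated assumptions: it uses the mean of positive, linear-scale values per group and reports genes whose mean in group A is at least x times the mean in group B. GEO's own implementation may use a different statistic, and the gene names, sample names and values below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def higher_in_a(table, group_a, group_b, x):
    """Return (gene, ratio) pairs where mean(A) is at least x times mean(B).

    table: gene -> {sample name -> linear-scale value}
    """
    hits = []
    for gene, samples in table.items():
        a = mean([samples[s] for s in group_a])
        b = mean([samples[s] for s in group_b])
        if b > 0 and a / b >= x:
            hits.append((gene, a / b))
    hits.sort(key=lambda h: h[1], reverse=True)  # strongest difference first
    return hits

# Hypothetical values for two genes in four samples.
table = {
    "GENE_1": {"HUVEC": 800, "HAEC": 1000, "SMC": 100, "HepG2": 200},
    "GENE_2": {"HUVEC": 300, "HAEC": 300, "SMC": 250, "HepG2": 350},
}
endothelial = ["HUVEC", "HAEC"]
others = ["SMC", "HepG2"]
print(higher_in_a(table, endothelial, others, 4))
```

      For log-ratio data from dual channel arrays, the ratio would have to be replaced by a difference of the log values.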

1.4. Endothelial cell diversity (Stanford, Sep. 2003):

    In this work, a large-scale comparative analysis of a series of different endothelial cell subtypes was performed (Chi et al., PNAS 2003). In particular, ECs from artery (coronary, pulmonary, umbilical, iliac, and aorta) were compared to ECs from vein (umbilical, saphenous), as well as ECs from various tissues (lung, skin, intestine, uterus, bladder, myocardium, nasal). In addition, microvascular ECs were compared to large vessel ECs.
    These data are available on the web site companion to the publication. Here, you may not only download the primary data and view the web supplements, but you can also interactively explore enhanced versions of the figures from the paper, using the GeneExplorer software. GeneExplorer is a web-based program which allows users to navigate clusters and search for specific genes by name or symbol. NOTE that GeneExplorer works much better with MS Internet Explorer than with Netscape! Click on a region within the left panel ("Radar": customizable via the "percentage" drop-down menu) to view it magnified within the right panel ("Zoom"). Click on the "SC" links preceding the gene names for additional information on the clones and genes listed. These links point to SOURCE GeneReports. Click on the expression colorbar of any gene of interest within the zoom image area to see the 20 most similar genes.
    Alternatively, these data are available via the "Search" tool at the Stanford Microarray Database (SMD). Here, you have a multitude of options for data analysis and download. You may view, sort, download array data, or you may even see the clickable image of all arrays and click onto individual spots to reveal the spot details.

1.5. Cardiac and aortic endothelial cells (University of Antwerp, June 2004):

    In this work, the gene expression profiles of endocardial (EE) and aortic (AE) endothelial cells of rat were analyzed. These data are available as NCBI GEO dataset GDS695, series record GSE1478. As platform, rat Affymetrix U34 arrays were used.

1.6. Lymphatic endothelial cells (LECs) and blood vascular endothelial cells (BECs) and Kaposi sarcoma reprogramming (Cancer Research, UK, July 2004):

    In this experiment, published in Nat. Genet. by Wang et al., a series of human Affymetrix U133A arrays was used to compare the global expression profiles of endothelial cells (LEC, BEC, HUVEC, MVEC) to other cell types like fibroblasts, smooth muscle cells, or mesenchymal cells. In addition, a comparison between Kaposi sarcoma lesions and "normal" skin was performed, revealing similarities between KS profiles and EC profiles, especially LEC ones. These data are available at the ArrayExpress database under accession number E-MEXP-66. Simply enter this accession number at the page "Query database", in the field "Query for Experiments". You may then hit the "Retrieve Data" button, which leads to the download of the whole dataset. If you now want to focus on the expression of your gene of interest, you have to locate the respective row using a specific identifier, and then concentrate on those columns containing the "Signals". You may delete all other rows and columns and finally produce a diagram from the "Signal" values.
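    The manual steps described above (locate the row of your gene, keep only the "Signal" columns) can also be done with a short script instead of deleting rows and columns by hand. The following sketch assumes a tab-delimited table with one header row; the column names ("Probe Set ID", "LEC Signal", ...) and all values are hypothetical, since the exact layout of the downloaded file is not specified here.

```python
import csv
import io

def signal_profile(tsv_text, id_column, identifier):
    """Return {column name: value} for all 'Signal' columns of one row.

    Returns None if no row carries the given identifier.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row[id_column] == identifier:
            return {col: float(val)
                    for col, val in row.items() if "Signal" in col}
    return None

# Hypothetical excerpt of a downloaded, tab-delimited data table.
example = (
    "Probe Set ID\tLEC Signal\tLEC Detection\tBEC Signal\tBEC Detection\n"
    "1000_at\t120.5\tP\t30.0\tA\n"
    "1001_at\t15.2\tA\t18.9\tA\n"
)
print(signal_profile(example, "Probe Set ID", "1000_at"))
```

    The returned values can then be pasted into any spreadsheet or plotting program to produce the diagram.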

2. Endothelial cells and inflammation:

2.1. IL-1 stimulation of HUVEC (BMT, May 2004):

    In this experiment, HUVEC (Human Umbilical Vein Endothelial Cells) were treated with Interleukin-1 for various periods of time up to 6 hours, and the global changes of expression patterns were analysed using human Affymetrix U133A arrays (Mayer et al., ATVB 2004). This experiment was performed within the BMT (Bio-Molecular Therapeutics, Vienna), in collaboration with the Clinical Institute of Medical and Chemical Laboratory Diagnostics, Vienna. These data are available as NCBI-GEO series record GSE973, providing links to the individual sample records GSM15389 to GSM15393, corresponding to the individual time points. This series of data is also available as GEO Dataset record GDS649, providing many options for interactive data analysis and download.

2.2. Inflammatory cytokine effect on five primary endothelial cell types (DNAX Research Inc., Nov. 2003):

    These datasets represent an examination of gene expression induced by the inflammatory cytokines interferon gamma (IFNg), tumor necrosis factor alpha (TNFa) and interleukin 4 (IL4) in 5 different primary endothelial cell types (lung: GDS498; aortic: GDS499; iliac: GDS500; dermal: GDS501; and colon: GDS502). These samples are collectively described in GEO series record GSE569. The experiments were performed on Incyte Gene Album Arrays 1-6, GEO record GPL371, containing approx. 37,000 human cDNAs. Please note that within the GDS records, you may e.g. generate lists of genes selectively induced or repressed by one of the 3 stimuli as compared to the others. An important remark has to be added here: As also confirmed by email from the company, the labeling was designed in a way that all values seen in the GEO graphs are actually "the other way round", meaning that high bars correspond to DOWN-regulation, whereas small bars correspond to UP-regulation (as tested with genes like ELAM, known to be strongly up-regulated by TNF) ! So, always stay extremely cautious when interpreting microarray data, and always perform "positive controls" for yourself !
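    Such a "positive control" check is easy to automate once you have extracted the values of a few genes whose response you already know. The sketch below is a hypothetical helper, not part of GEO: it lists all control genes whose observed direction of change contradicts the expectation. The gene label and the numbers are placeholders chosen to mimic the inverted situation described above.

```python
def direction_mismatches(values, expectations):
    """Return the control genes whose observed change contradicts expectation.

    values:       gene -> (untreated value, treated value)
    expectations: gene -> "up" or "down"
    """
    mismatches = []
    for gene, expected in expectations.items():
        untreated, treated = values[gene]
        observed = "up" if treated > untreated else "down"
        if observed != expected:
            mismatches.append(gene)
    return mismatches

# Hypothetical readout: a gene known to be induced by the stimulus
# shows a SMALLER bar after treatment.
values = {"KNOWN_INDUCED_GENE": (900.0, 100.0)}
expectations = {"KNOWN_INDUCED_GENE": "up"}
print(direction_mismatches(values, expectations))
```

    If all of your control genes end up in the returned list, the values of the dataset are most likely inverted and should be read "the other way round".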

2.3. Steroids Effect on HUVEC Response to LPS or Cytokines (Institute of Surgical Research, San Antonio, Jun. 2004):

    HUVEC were treated for 4 hours with Lipopolysaccharide (LPS) or cytokine mix (CM), a mixture of the proinflammatory cytokines TNF-α and IFN-γ. All treatments were done in quadruplicate. At the end of treatment, total cellular RNA was isolated. These data are not (yet) available as NCBI GEO dataset, but as series record GSE1486. As platform, the HUMAN_21K_OligoArray_2 (GPL1225) was used, a non-commercial array of spotted 70mer oligonucleotides.

2.4. Leukotriene LTD4 effect on macrophage and endothelial cells (Institute Vascular Medicine, Jena, Germany, Aug. 2004):

    Human umbilical vein endothelial cells (HUVEC) or the human macrophage cell line, Mono-Mac-6 were treated with the pro-inflammatory mediator Leukotriene D4 for 1 hour. These data are available as NCBI GEO dataset GDS731, series record GSE1644. As platform, human Affymetrix U133A arrays were used.

2.5. HUVEC gene profile after TNF-stimulation (University of Muenster, Germany, May 2005):

    HUVEC were left untreated or stimulated for 5h with 2 ng/ml TNF. Comparison of the gene profiles revealed TNF-mediated gene expression changes in HUVEC. These data are not (yet) available as NCBI GEO dataset, but as series record GSE2639. As platform, human Affymetrix U133A arrays were used.

3. Endothelial cells and stress:

3.1. Human endothelium exposed to shear stress and pressure (Internal Medicine Dep., Goeteborg, Sweden, Jun. 2004):

    Intact living conduit vessels (umbilical veins) were exposed to normal or high intraluminal pressure, or low or high shear stress in combination with a physiological level of the other force. These data are not (yet) available as NCBI GEO dataset, but as series record GSE1518. As platform, human Affymetrix U133A arrays were used.

4. Endothelial cells and hypoxia:

4.1. Responses of HUVEC to Hypoxia and Reoxygenation (Institute of Surgical Research, San Antonio, Feb. 2004):

    This is a timecourse experiment. The hypoxic treatment consisted of 1-hour hypoxia followed by various periods of reoxygenation (0, 3, 5, 12 and 24 hrs) for gene expression analysis. Total RNA was extracted from cultured HUVECs. These data are not (yet) available as NCBI GEO dataset, but as series record GSE1041. As platform, the HUMAN_21K_OligoArray_1 (GPL981) was used, a non-commercial array of spotted 70mer oligonucleotides.

5. Endothelial cells and angiogenesis:

5.1. HUVEC treated with VEGF-A versus PlGF (Cambridge University, Nov. 2003):

    In this experiment, HUVEC were treated with the angiogenic factors vascular endothelial growth factor-A (VEGF-A) versus Placental Growth Factor (PlGF), in low or high serum media, in a time course up to 42 hours. These data are available as NCBI GEO dataset GDS495, series record GSE837. As platform, human Affymetrix U95 arrays were used.

6. Endothelial cell development:

6.1. Endothelial progenitor cell expression profile (Molecular Cardiology, Frankfurt, Dec. 2004):

    In this work, expression profiling of endothelial progenitor cells (EPCs) derived from peripheral blood was performed. EPCs were compared to human umbilical venous endothelial cells (HUVEC) and CD14+ monocytes. The results provide insight into the mechanism underlying the positive contribution of EPCs to neovascularization. These data are accessible as GEO dataset GDS1075, series record GSE2040, platform: Affymetrix human U95A (GPL91).
                   
           

     
EXP7...perform clustering analyses of microarray data ? (last update Jun. 18, 2006)

As is often the case, both commercial and free software packages are available to address this question.

    Tip! Expression Profiler, provided freely by the EBI, is a set of tools for clustering, analysis and visualization of gene expression and other genomic data. The tools in Expression Profiler allow you to perform cluster analysis, pattern discovery and pattern visualization, to study and search Gene Ontology categories, to generate sequence logos, to extract regulatory sequences, to study protein interactions, and to link analysis results to external tools and databases. The main module of Expression Profiler is EPCLUST (Expression Profile data CLUSTering and analysis), which is a generic data clustering, visualization, and analysis tool for numeric data (e.g. gene expression data) as well as sequence data. Please refer to the corresponding Expression Profiler chapter at the main page for a detailed description of how to use Expression Profiler efficiently !

    Expression Profiler: Next Generation (EP:NG) is the new EP version (2004), providing the full functional range of the old version AND additional features, together with a "unified" user interface. Expression Profiler is a web-based platform for the analysis of microarray gene expression and other functional genomics-related data. The new architecture, EP:NG, modularizes the original design and allows individual analysis-task-related components to be developed by different groups and yet still to work together seamlessly and share the same user interface look and feel. Please refer to the corresponding Expression Profiler: NG chapter at the main page for a detailed description, including also a list of problems encountered when using this new version!
             
                

             
EXP8...get all proteins from endothelial cells involved in inflammation (SRS and GO approaches) ? -> see RET2      

    Note that RET2 describes approaches based on sequence retrieval tools, as well as approaches based on the Gene Ontology (GO) system of functional assignments. For comparison, approaches to address this question based on expression analysis are discussed in FAQ EXP6.
   
       

    
EXP9...submit my own microarray data to a public database ? (last update Jan. 3, 2006)

    As already discussed, especially in questions EXP3 and EXP5, there are only a few public repositories of microarray data which accept data from institutes all over the world, comparable to sequence deposit at GENBANK. The two major databases are ArrayExpress of the EBI and GEO of the NCBI. Having investigated the "overall-handling" and user-friendliness of these databases, I came to the conclusion that, at present, GEO is the better solution. Therefore, I am focusing on the data deposit at GEO in the following text.
   
    Tip! GEO (Gene Expression Omnibus) was launched by the NCBI, in order to support the public use of gene expression data. GEO is a gene expression and hybridization array data repository, as well as an online resource for the retrieval of gene expression data from any organism or artificial source. For a full description of the GEO functionality, please refer to the corresponding chapter at the main page. In order to submit data to GEO, you should follow a flow-chart of different steps.
    First, you should check and prepare your data to be "as MIAME compliant as possible". The Minimum Information About a Microarray Experiment (MIAME) is a standard developed by the Microarray Gene Expression Data (MGED) group for the content of data which should be supplied when publishing microarray experiments. Please note that there is a comprehensive, yet concise MIAME checklist available, which is a guide to authors, editors, and reviewers of microarray gene expression papers. This checklist should be considered as a basis for the preparation of submissions to public repositories. Especially, you should consider the requirements concerning the description of the experiment design, the nature and preparation of samples, the hybridization procedures, and the data measurements. Note that the specifications concerning the array design are only needed in cases when the array type used is not already present in the chosen database.
    Second, it is generally favourable to browse the GEO database and to view some entry examples in order to get an impression of the overall database architecture. In general, GEO consists of four entities, which are submitter, platform, sample, and series. These entities have to be deposited separately, in a process which is described concisely in the GEO web deposit guide.
    Fourth, you have to create your own GEO account ("submitter" entity) by entering your contact information. This is publicly accessible information that is necessary so that proper credit can be given for data. The "user" and "password" information can from now on be used to log in and submit new entries or manipulate / update existing entries.
    Fifth, check to see if your platform (microarray type) already exists in GEO. If your experiments were performed using commercial arrays (e.g., Affymetrix), it may not be necessary to submit a platform record. If you find the relevant platform already deposited in GEO (view all commercial nucleotide platforms), take a note of its GEO accession number ("GPLxxx"), which you have to submit with each sample record (individual array measurement).
    Sixth, submit your hybridization data as sample records ("GSMxxx"). A sample record references one platform and describes the abundance measurements of a single hybridization/experimental condition (meaning that if you e.g. submit an experimental time course represented by several chips, you have to submit each chip's data as an individual sample record). You will first be asked to specify the experiment type (e.g., single channel for Affymetrix chips) and the platform accession number. Next, you must provide the sample data table in tab-delimited text format, meaning you have to save your EXCEL sheets in this format. Please note that GEO accepts only the period as decimal separator; values using a comma (or any other punctuation) are not accepted (!). The easiest way to choose this format for your values is to make the appropriate selection in the Windows Control Panel (Regional Options -> Numbers -> Decimal symbol). The first row of the data table must contain the column headers. Sample data tables require a column named "ID_REF", matching the "ID" column of the reference platform, and a "VALUE" column. For dual channel experiments, the VALUE column holds the normalized log ratio measurements. For single channel experiments, the VALUE column holds normalized (scaled) signal count data (not log transformed), which in the case of Affymetrix MAS4.0 software is the value "Average difference" and in version MAS5.0 is the "Signal". Note that this value is consistent across all samples and can therefore be used for comparisons between different chips (hybridizations). In the case of Affymetrix data, GEO strongly suggests providing 2 additional columns, namely "ABS_CALL", holding the "Present/Marginal/Absent" detection calls, and "DETECTION P-VALUE", holding the corresponding detection p-values. Note that you may optionally provide even further columns. After the data table has passed validation, you will be asked to supply the sample title, organism, description, authors and keywords.
    The 'Description' field may hold very large volumes of data, and submitters are encouraged to provide a thorough report of the sample, which may include a detailed description of the biological source, experimental conditions and treatments, labeling and hybridization protocols, spot quantification and normalization schemes.
    Seventh, submit a series record ("GSExxx"). A series brings together a related group of samples, and provides a focal point and description of the experiment as a whole. Information reflecting experimental sample subsets may also be specified. Submitters are encouraged to supply information regarding the overall experimental design, aim, summary results and conclusions. If you e.g. publish a series of samples representing a time course experiment, then the series record is the suitable place to define the correct ordering of the individual samples, and also, if applicable, the definition of subsets according to fields like tissue, disease or treatment.
    Note that these series records are used by GEO staff to produce a so-called GEO Dataset record ("GDSxx"), as soon as the data become public, allowing the interactive analysis of the sample records and their subsets and providing multiple download options. These GDS records are displayed in the output lists when performing an ENTREZ search against the GEO expression profiles, using e.g. your favourite gene of interest as query.
    Eighth, each record you submit will receive a unique and stable GEO accession number which you may quote in manuscripts. Records may remain private for several months until the data is published. During this period, you may request a "read-only" password (email geo@ncbi.nlm.nih.gov) which allows collaborators or reviewers confidential access to your private data prior to publication.
    Ninth, one final point should be considered in order to fulfill the MIAME guidelines, which state that not only the processed but also the raw data have to be supplied, so that other users are able to re-evaluate your dataset using alternative statistical algorithms. Therefore, in the case of Affymetrix experiments it is recommended that original .cel files are supplied via FTP (FTP server details via email). Please name these files after the GEO accession number they correspond to, e.g., GSM12345.cel. Links will be supplied on GEO sample records allowing users access to the original data files.
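    As an alternative to changing the decimal symbol in the Windows Control Panel, the sample data table described in the sixth step can also be prepared by a small script. The following sketch takes values exported with a decimal comma, converts them to the period notation and writes a tab-delimited table with the column headers named above ("ID_REF", "VALUE", "ABS_CALL", "DETECTION P-VALUE"). The function names, probe IDs and values are hypothetical, and GEO's validation may check more than is covered here.

```python
def to_geo_number(text):
    """Convert a decimal-comma number such as '123,4' into '123.4'."""
    return text.strip().replace(",", ".")

def build_sample_table(rows):
    """Build a tab-delimited sample data table.

    rows: list of (probe_id, signal, call, p_value) tuples of strings.
    Raises ValueError if a converted value is not a plain number, e.g.
    when a thousands separator was exported as well ('1.234,5').
    """
    header = ["ID_REF", "VALUE", "ABS_CALL", "DETECTION P-VALUE"]
    lines = ["\t".join(header)]
    for probe_id, signal, call, p_value in rows:
        value = to_geo_number(signal)
        p = to_geo_number(p_value)
        float(value)  # sanity check only, the result is not used
        float(p)
        lines.append("\t".join([probe_id, value, call, p]))
    return "\n".join(lines) + "\n"

# Hypothetical rows as exported from a spreadsheet with decimal commas.
rows = [
    ("1000_at", "120,5", "P", "0,002"),
    ("1001_at", "15,2", "A", "0,35"),
]
print(build_sample_table(rows))
```

    The resulting text can be saved to a file and uploaded as the data table of one sample record; one such table is needed per chip.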

    

     
EXP10...compare the expression of my gene of interest in B-cells, T-cells, monocytes, and dendritic cells ? (last update Aug. 20, 2004)
             
    This question is somewhat related to FAQ EXP3, which explains the principal options for searching public expression databases for data concerning your gene of interest, and to FAQ EXP5, which describes ways to search for experimental conditions (including the cell types used). In this FAQ, we specifically address how to compare the expression profiles of your gene of interest in a series of different cell types. As an example, we are looking for the expression of IL8 (Interleukin 8) in B-cells, T-cells, monocytes, and dendritic cells.
       
1. "One-step" keyword search:

    GEO provides a lot of options to create simple or complex queries, searching for expression profiles as well as for experimental conditions like cell types, organisms and treatments used. If you directly query ENTREZ GEO Expressions using only the gene name ("IL8"), you will get a very long list of all microarray experiments having this gene on the chip. Therefore, you have to combine the gene name with a search for the experimental setup you are interested in. Please note that there is a link "Preview/Index" just below the query box, which is very useful as it immediately displays the number of hits produced by a specific query. In general, tests showed that there is no "uniform system" for sample nomenclature, meaning that it is often not easy to choose the field which best fits your query. Therefore, these "combined" keyword searches often produce either long lists of "false positives" or very short lists when an inappropriate field was chosen.

2. "Two-step" keyword search:

    If you prefer a "Two-step keyword strategy" over the combined approach, you may first query ENTREZ GEO Datasets for stored microarray experiments using your cell type of interest (like "dendritic"), and then, in a second step, specifically search for the expression of your gene(s) of interest within this data set, using ENTREZ GEO Expressions. There, you may simply combine (using "AND") the GEO dataset accession with the gene name of interest. Note that gene names (like "TNF") sometimes yield heterogeneous lists of "similar" hits. In this case, you either have to scan the list for the specific entries, or you first look up the array-specific IDs (like ProbeSets for Affymetrix chips) of your gene (described in FAQ EXP2) and then use these IDs for your GEO search instead of the gene name. In our case, this approach, although suitable for individual cell types, is still unsatisfactory, as it is very difficult to catch datasets comprising all the cell types listed above via a keyword search.
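    The second step of this strategy boils down to assembling one Boolean expression from the dataset accessions found in step one and the gene name (or, alternatively, the array-specific IDs). The following sketch shows one way to build such a query string. The helper name is hypothetical, and the use of "OR" and parentheses assumes the standard Entrez Boolean syntax.

```python
def profile_query(gene_terms, dataset_accessions):
    """Combine gene names (or probe IDs) with GDS accessions.

    Terms within each group are joined by OR, the two groups by AND.
    """
    genes = " OR ".join(gene_terms)
    datasets = " OR ".join(dataset_accessions)
    return "(" + genes + ") AND (" + datasets + ")"

# Step 1 is assumed to have returned these dataset accessions.
print(profile_query(["IL8"], ["GDS594", "GDS596"]))
```

    If the gene name yields heterogeneous hits, the same function can be called with the array-specific IDs of your gene in place of the gene name.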

3. Browse microarray datasets:

    Tip! Note that, instead of playing around with search terms, it may be advantageous to simply browse the list of GEO datasets for experiments matching your interests. In addition, you may simply "browse PubMed or the whole internet" for appropriate array experiments. In our case, we would be screening for a panel of cell types, which are often contained in "global transcriptome datasets". In particular, the cell types listed above are contained in Version 2 of the GNF Atlas, which is split into 3 curated GDS (GEO Dataset) records: GDS592 covers the GNF1M data (mouse atlas); GDS594 covers the human GNF1H data; and GDS596 covers the human U133A data, and may as well be searched using the GNF site itself. Please also refer to EXP11 for details. Note that the expression data of both GNF Atlas versions (1 and 2) are also integrated in the UCSC Gene Sorter, allowing a very convenient "quick-view" of the expression profile of your gene of interest. There, you simply query for "IL8", and immediately produce a "red (up-) vs. green (down-regulated)" image of tissue- and cell type-specific expression. Also note that all genes showing similar expression are displayed as well.
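    If you have extracted the values of your gene for the cell types of interest from one of these datasets, a comparable "red vs. green" summary can be computed yourself. The sketch below expresses each value as a log2 ratio relative to the median over all cell types and labels it accordingly. Please note that this median-centered scheme is an assumption made for illustration (the exact coloring rules of the UCSC Gene Sorter are not described here), and that the numbers are hypothetical placeholders, not real IL8 measurements.

```python
from math import log2
from statistics import median

def relative_profile(values):
    """Return cell type -> (log2 ratio to the median, color label).

    values: cell type -> positive, linear-scale signal.
    """
    ref = median(values.values())
    result = {}
    for cell_type, v in values.items():
        ratio = log2(v / ref)
        if ratio > 0:
            color = "red"      # above the median
        elif ratio < 0:
            color = "green"    # below the median
        else:
            color = "black"    # exactly at the median
        result[cell_type] = (ratio, color)
    return result

# Hypothetical signals of one gene in four cell types.
example_values = {
    "B-cells": 100.0,
    "T-cells": 50.0,
    "monocytes": 800.0,
    "dendritic cells": 200.0,
}
for cell_type, (ratio, color) in relative_profile(example_values).items():
    print(cell_type, round(ratio, 2), color)
```

    Using the median as reference keeps a single very high value from shifting the baseline for all other cell types.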

    

                
EXP11...get a "virtual multiple tissue Northern blot" of my gene of interest ? (last update Mar. 3, 2006)

    Often, researchers in the lab are interested in getting a "quickview" of the tissue- and cell type-specific expression of a novel gene of interest. In the "good old days", this was done (and still is done today) by so-called Multiple Tissue Northern blots, which are quite expensive. Today, public microarray databases store a vast amount of expression information for the majority of known genes, so it is quite likely that you can retrieve the desired information "in silico".

    GENERAL REMARK: Often, individual microarray probesets and also EST sequences correspond to only ONE specific splicing variant of a gene. Thus, in cases where different splicing variants may also differ in tissue-specific expression, it is necessary to consider this point. Please refer to FAQ EXP15 for this purpose !

1. Microarray data:

    Tip! The GNF Gene Expression Atlas contains a survey of gene expression profiles in a diverse group of primary human and mouse tissues and organs, as well as transformed cell lines. GNF is the Genomics Institute of the Novartis Research Foundation. A total of 101 unique specimens representing 47 tissues / cell lines from the normal physiological state are represented, providing a survey of the human and mouse "transcriptomes" (Su et al. 2002 PNAS 99: 4465-70). These data were produced using Affymetrix human U95A and mouse U74A chips. There are several ways to define a query.
    You can search directly at the GNF site, either by Keyword or Accession (GB, UniGene, LocusLink,...), by Sequence (BLAT Search), or by Expression Pattern. The last option can be extremely useful, as it lets you choose a single tissue/cell type and see ONLY the genes showing pronounced expression there. Besides a very instructive graphical display, there is a link "Find similar expression", which reveals genes with a similar profile.
    This series of array data is also available as NCBI-GEO dataset GDS181 and as GEO series record GSE96. Please also refer to FAQs EXP3 and EXP6 for additional information. You may selectively search this GDS record for your gene of interest, using ENTREZ GEO Expressions, by entering e.g. "Interleukin 8 AND GDS181". Note that gene names (like "TNF") sometimes yield heterogeneous lists of "similar" hits. In this case, you either have to scan the list for the specific entries, or you first look up the array-specific IDs of your gene (like ProbeSets for Affymetrix chips, as described in FAQ EXP2) and then use these IDs for your GEO search instead of the gene name.
    Please note that the SOURCE database can also be used for this purpose, although it is not possible to define complex queries like in GEO. Nevertheless, you may start with your gene name of interest (like "IL8"), and then follow the link "Show gene expression data", which yields a list of microarray data stored in SOURCE, referring to your gene of interest. If available, the GNF data are listed as "NormalTissueAtlas". The respective link leads to the "SOURCE specific" display, showing high expression as shades of red and low expression as shades of green. Note that you can immediately display genes with similar expression by clicking on the color bar ! The link to the "author's webpage" on the left side leads to the GNF homepage (described above), where you may repeat your search to produce a different graphical image of the expression profile.
    Please note that the UCSC Gene Sorter may also be used to display the GNF expression data set. This tool displays a table of genes within a selected genome that are related to one another. Several different relationships may be explored; one of them is the similarity of gene expression profiles, as available from the GNF data. This means that, similar to the option at SOURCE, you may start with your gene of interest and display genes with similar tissue-specific expression by choosing the appropriate option. Note that the UCSC Gene Sorter does not display all the tissues / cell lines in the "initial view", but only a "selection". However, the user may display all tissues via the "Configure" button ! Please also refer to FAQ GENOM2 for additional information.
    The data retrieval tool BioMart, which is described in detail in e.g. FAQ RET3, also provides an option (in the section "Expression" at the "Filter" page) to filter any retrieved data set, in order to keep only those entries (genes, SNPs, etc.) that provide a link to expression data of the GNF database. Note that you also have to select these expression data at the output tab "Features" if you want to display the respective information.
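
    The GEO query strategy described above (gene name, or better the array-specific ProbeSet IDs, restricted to one GDS record) can be sketched in a few lines of Python. This is only an illustration: the helper name build_geo_term is made up, the ProbeSet IDs in the usage note are placeholders, and the plain AND/OR syntax is an assumption about what the GEO search box accepts.

```python
def build_geo_term(gds_accession, gene_name=None, probeset_ids=()):
    """Combine a gene name OR a list of ProbeSet IDs with a GDS accession.

    Hypothetical helper: it only assembles a search string and does not
    contact GEO.
    """
    if probeset_ids:
        # Array-specific IDs avoid the heterogeneous "similar" gene-name hits.
        ids = " OR ".join(probeset_ids)
        subject = f"({ids})" if len(probeset_ids) > 1 else ids
    elif gene_name:
        subject = gene_name
    else:
        raise ValueError("give a gene name or at least one ProbeSet ID")
    return f"{subject} AND {gds_accession}"


# The example from the text: search GDS181 for Interleukin 8.
print(build_geo_term("GDS181", gene_name="Interleukin 8"))
```

    With placeholder IDs, build_geo_term("GDS181", probeset_ids=["id_1", "id_2"]) yields "(id_1 OR id_2) AND GDS181"; replace the placeholders with the real ProbeSet IDs found via FAQ EXP2.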

    Tip! Version 2 of the GNF Atlas was released in March 2004. It was created not only using the Affymetrix U133A chips (human), but also using Novartis-designed arrays (human = GNF1H; mouse = GNF1M), with a panel of RNAs derived from 79 human and 61 mouse tissues or cell types. Note that GNF1H essentially contains genes which are NOT present on the Affy U133 arrays, meaning the two chips are complementary. GNF1H is also termed GNF1B at the GNF site. You may query using gene symbols, accessions, keywords, or even sequence. Using the link "Search Expression", you are able to choose a single tissue/cell type and see ONLY the genes showing pronounced expression.
    Note that these new data are especially useful if you are looking for profile comparisons between a list of cell types, like bone-marrow (BM) derived endothelial cells, BM-early erythroid cells, or peripheral blood (PB) dendritic cells, PB-B cells, PB-T cells, PB-monocytes, or PB-NK cells.
    This series of array data is also available as NCBI-GEO series record GSE1133, which is split into 3 curated GDS (GEO Dataset) records: GDS592 covers the GNF1M data (mouse atlas); GDS594 covers the human GNF1H data; and GDS596 covers the human U133A data. The GNF1H array is described as platform record GPL1074; the GNF1M array is described as GPL1073.
   
    Note that the expression data of both GNF Atlas versions 1 and 2 are also integrated in the UCSC Gene Sorter, allowing a very convenient "quick-view" of the expression profile of your gene of interest (see above) !

    Tip! GeneNote is a database of human genes and their expression profiles in healthy tissues. It is based on Weizmann Institute of Science DNA array experiments, which were performed on the Affymetrix HG-U95 set A-E (the same array series as used for the GNF database). GeneNote is tightly connected to the Weizmann database GeneCards, an "integrated" database of human genes, their products and their involvement in diseases. GeneCards offers concise information about the functions of all human genes that have an approved symbol, as well as selected others. GeneNote not only stores microarray data of "normal tissues", but also SAGE data, as well as data based on ESTs (named "Electronic Northern"), as described on the "Methods" page. You may search GeneNote by diverse identifiers like gene symbol, Ensembl ID, UniGene ID, SAGE tag, or LocusLink ID. Interestingly, you may also choose between MAS5.0 (Affymetrix software) normalized data and raw data. You will retrieve data for your gene of interest from all three types of methods mentioned.
    Please note that the GeneNote data are also available as NCBI-GEO datasets GDS422 to GDS426 (representing the different U95 subtypes), and as GEO series record GSE803. Please also refer to FAQs EXP3 and EXP6 for additional information. You may selectively search one of these GDS records for your gene of interest, using ENTREZ GEO Expressions, by entering e.g. "Interleukin 8 AND GDS422". Note that gene names (like "TNF") sometimes yield heterogeneous lists of "similar" hits. In this case, you either have to scan the list for the specific entries, or you first look up the array-specific IDs of your gene (like ProbeSets for Affymetrix chips, as described in FAQ EXP2) and then use these IDs for your GEO search instead of the gene name.

    Tip! H-Invitational Database (H-InvDB) is a human gene database opened to the public in April 2004. It is hosted by the Japan Biological Information Research Center (JBIRC) and by the DNA Databank of Japan (DDBJ), with contributions from more than 40 institutes worldwide, like the German DKFZ. The scope of H-InvDB is to provide an integrative annotation of full-length cDNA clones available from high-throughput cDNA sequencing projects. The database generates cDNA clusters, describes their gene structures, and, among many other features, shows data on gene expression profiling. You first have to get the specific database entry of your gene of interest, either via BLAST (sequence) search or via keyword search, and then look for the section "Gene expression profile" within the so-called "Locus view". The colored symbol links to H-ANGEL (Human Anatomic Gene Expression Library), a viewer for gene expression data from several experimental platforms (described in the H-InvDB manual), together with descriptions of expression taken from public data resources. Gene expression data in H-ANGEL were generated with three types of methods on seven different platforms: iAFLP (a PCR-based quantitative expression profiling method), DNA arrays, and cDNA sequence tags (SAGE, EST and MPSS). Note that H-ANGEL contains many tissues but not individual cell lines. You may also query H-ANGEL using different identifiers like H-Inv Locus ID, RefSeq, UniGene ID, or LocusLink ID. A very nice feature of H-ANGEL is the colored representation of expression data from all the different methods within one single graphical display, allowing a very easy comparison between the methods ! Please also refer to the H-InvDB section at the Data Integration page for additional information !

    Normal tissues of diverse types (Stanford University, Jan. 2005) is a dataset which contains the expression profiles of a series of normal human tissues. Samples were obtained by surgery or autopsy and evaluated by pathologists. The results provide insight into the molecular organization of diverse cell types, and provide a baseline for comparison to diseased tissues. These data are split into 4 different GEO datasets, which are collectively described in GEO series record GSE2193: SHBW: GDS1085; SHCN: GDS1086; SHBA: GDS1087; SHDP: GDS1088. NOTE: These datasets not only cover different tissues but are also based on different platforms, which are non-commercial arrays of spotted cDNAs. NOTE: These expression data are also integrated in the UCSC Gene Sorter, listed as "Expression (Stanford)" in the dropdown menu "sort by", allowing a very convenient "quick-view" of the expression profile of your gene of interest !
   
2. SAGE data:

    SAGE offers the expression profile comparison of a multitude of human and mouse cancer and non-cancer cell lines via nucleotide tags. Essentially, the SAGE technique does not measure the expression level of a gene directly, but quantifies a "tag" which represents the transcription product of a gene. A tag, for the purposes of SAGE, is a nucleotide sequence of a defined length, directly 3'-adjacent to the 3'-most restriction site for a particular restriction enzyme. As originally described, the tag was nine bases long and the restriction enzyme was NlaIII. Current SAGE protocols produce a ten to eleven base tag, and, although NlaIII remains the most widely used restriction enzyme, enzyme substitutions are possible. The data product of the SAGE technique is a list of tags with their corresponding count values, and thus is a digital representation of cellular gene expression.
    The SAGEmap Virtual Northern extracts SAGE tags and orientation signals from an input sequence and displays output links to expression values in different cell lines. Alternatively, you may also query via SAGE tag or via gene name. When you click on a specific tag in the output, a "virtual Northern" is displayed, showing the expression levels in a multitude of cell types just like "bands" on a Northern blot. Please note that tags are often NOT SPECIFIC for one gene / UniGene cluster, so you should always check how many mRNA source sequences support one tag, and which UniGene clusters these mRNAs belong to !
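
    The tag definition given above (a fixed number of bases directly 3'-adjacent to the 3'-most NlaIII site, whose recognition sequence is CATG) can be illustrated with a minimal Python sketch. The function name and the test sequence are made up, and the sketch ignores details that real tools such as SAGEmap may handle, for example poly-A tails, sequence orientation, or alternative polyadenylation.

```python
def extract_sage_tag(mrna, site="CATG", tag_len=10):
    """Return the tag following the 3'-most restriction site, or None.

    Illustrative only: assumes the input is the sense strand, 5' to 3'.
    """
    seq = mrna.upper().replace("U", "T")
    pos = seq.rfind(site)          # 3'-most occurrence of the site
    if pos == -1:
        return None                # no site, hence no tag
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


# Made-up sequence with two CATG sites; only the 3'-most one counts.
print(extract_sage_tag("AAACATGCCCCATGACGTACGTAGTTTT"))  # ACGTACGTAG
```

    Calling the function with tag_len=17 gives the "long" tag variant, provided enough sequence follows the site.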

    Tip! H-Invitational Database (H-InvDB) (see description above) provides expression data generated from all three types of methods (Microarrays, SAGE, and ESTs).

    Tip! ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and a transcript assembly procedure. There is a specific section of ECgene called ECexpression, which is the expression data viewer of ECgene, but you may also start at the ECgene homepage and retrieve the Gene Summary Viewer which also displays EST and SAGE data across multiple tissues and cell types. ECexpression utilizes the extensive expression data from EST and SAGE sources. Note that a very useful feature is that normal and cancer libraries are divided and also displayed separately in the graphs. Therefore, this layout makes it easy to find any tissue-specific or cancer-specific isoforms ! Please refer to the main sections of ECgene and ECexpression for details !

    Tip! SAGE Genie is a website of the CGAP portal which provides highly intuitive, visual displays of human and mouse gene expression, based on a unique analytical process that reliably matches SAGE tags, 10 or 17 nucleotides in length, to known genes.
    SAV (SAGE Anatomic Viewer) is one of the tools of SAGE Genie. SAV displays gene expression in human normal and malignant tissues by shading each organ in one of ten colors, each representing a different level of gene expression. Gene expression levels are based on the analysis of counts of SAGE tags, which are either "short" (10 bp), including "extracted short" (10 bp extracted from 17bp tag), or "long" (17 bp).
    SAV can be used to find the best tag for a gene / accession number. NOTE: SAV is an excellent resource to examine the reliability of individual SAGE tags, meaning the probability that a tag is "unique" rather than matching more than one gene (which renders expression data analysis highly difficult). The best tags are color coded. In addition, the LTV (Ludwig Transcript Viewer) display, showing shorter alternative polyadenylated and internally primed transcripts, supports the prediction of reliable SAGE tags. The tag link enables the user to see which other gene(s) may be represented by the particular tag and the reliability of each mapping. The Digital Northern (DN) display shows the expression of a particular gene (SAGE tag count) per individual SAGE library as color-coded tag count. The SAGE Anatomic Viewer itself displays the SAGE tag expression count as colored organ images, which are hyperlinked to a Digital Northern displaying the tag expression in each individual library.

3. EST data:

    DigiNorthern, provided by the Bioinformatics Group of the Roswell Park Cancer Institute, is a tool for virtually displaying the expression profile of query genes (currently, only DNA sequences are accepted as input) based on the EST sequences currently available at NCBI GenBank. Note that this is a completely different approach than using microarray data. In this case, expression is "measured" as the tissue distribution of EST sequences corresponding to a certain gene, making it a "rougher" method. Nevertheless, it can be quite interesting, especially for seeing differences in expression between normal and cancer tissue.
    There are currently two versions of this program. DN1 takes one sequence as query gene, lists all the cell lines/tissues/organs that express the gene, and displays the relative expression levels of the gene based on the number of matched ESTs vs the total number of ESTs for the related libraries. Wherever available, a comparison is also made between the same tissue/organ in normal and neoplastic state. DN2 takes two sequences as query genes and compares their expression profiles side by side. DigiNorthern is currently available for human and mouse.
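
    The principle of such an EST-based "digital Northern" (matched ESTs relative to the total number of ESTs per library) can be sketched as follows. This is not DigiNorthern's actual code; the function name, the scaling to "ESTs per million" and the counts in the example are assumptions chosen for illustration.

```python
def est_frequency(matched, totals, per=1_000_000):
    """Relative EST abundance per tissue, scaled to `per` ESTs.

    matched: tissue -> number of ESTs matching the query gene
    totals:  tissue -> total number of ESTs in the libraries of that tissue
    """
    result = {}
    for tissue, total in totals.items():
        if total <= 0:
            continue  # skip empty libraries instead of dividing by zero
        result[tissue] = matched.get(tissue, 0) * per / total
    return result


# Invented counts, for illustration only.
matched = {"brain": 5, "liver": 1}
totals = {"brain": 100_000, "liver": 50_000, "heart": 20_000}
print(est_frequency(matched, totals))
```

    Normalizing by library size matters because a tissue with many sequenced ESTs would otherwise appear to express every gene more strongly.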

    Please note that DigiNorthern is somewhat similar to the SOURCE database display of genes, which provides "UniGene and EST expression information" in the lower part of the page.

    Tip! H-Invitational Database (H-InvDB) (see description above) provides expression data generated from all three types of methods (Microarrays, SAGE, and ESTs).

    Tip! ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and a transcript assembly procedure. There is a specific section of ECgene called ECexpression, which is the expression data viewer of ECgene, but you may also start at the ECgene homepage and retrieve the Gene Summary Viewer which also displays EST and SAGE data across multiple tissues and cell types. ECexpression utilizes the extensive expression data from EST and SAGE sources. Note that a very useful feature is that normal and cancer libraries are divided and also displayed separately in the graphs. Therefore, this layout makes it easy to find any tissue-specific or cancer-specific isoforms ! Please refer to the main sections of ECgene and ECexpression for details !

4. RNA in situ expression images:
   
    Please refer to FAQ EXP20 for this purpose.

5. RT-PCR, Northern Blot, Western Blot data:

    Please refer to FAQ EXP21 for this purpose.

6. Protein microscopy data based on immunostains:

    Please refer to FAQ PROT6 for this purpose.
              
EXP12...know which genes are expressed in fetal brain 6 times higher than in adult brain ? (last update Aug. 20, 2004)

    This question is an example of a whole class of related questions, like "know which genes are expressed in heart (dendritic cells, monocytes...) 10 times higher than in kidney (T-cells, B-cells...)". In principle, all 3 kinds of methods (microarrays, SAGE, ESTs) may be used for this purpose, but in practice the first one yields the best results.

    If we are interested in a specific tissue or cell type, many of them are contained in large "whole transcriptome" datasets, which are described in EXP11. For example, a comparison between the expression profiles of fetal and adult brain is possible in datasets like the GNF Atlas version 1 (GEO dataset GDS181), a large-scale analysis of the gene expression profiles from a diverse array of human tissues, organs, and cell lines from the normal physiological state, using Affymetrix U95 chips. When you open the GDS record, you have the option to compare 2 sets (A vs B) of cell types or tissues, in order to produce lists of genes which are expressed x times higher in one set than in the other. You simply select the appropriate check-boxes and choose a value for x. When you hit the "QueryAvsB" button, the output is generated.
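
    The logic of such an "A vs B" comparison can be sketched in Python. This is a simplified illustration, not the calculation GEO actually performs: it assumes that the samples of each set are averaged and that a gene is reported when mean(A) / mean(B) reaches the chosen factor x. The function name, gene labels and expression values are invented.

```python
from statistics import mean


def fold_higher(expr, set_a, set_b, fold):
    """Genes whose mean expression in set A is >= fold x the mean in set B.

    expr: gene -> {sample name -> expression value}
    """
    hits = []
    for gene, values in expr.items():
        a = mean(values[s] for s in set_a)
        b = mean(values[s] for s in set_b)
        if b > 0 and a / b >= fold:   # ignore genes with no signal in B
            hits.append(gene)
    return sorted(hits)


# Invented values: G1 is 6 times higher in fetal brain, G2 only 2 times.
expr = {
    "G1": {"fetal_brain": 600, "adult_brain": 100},
    "G2": {"fetal_brain": 200, "adult_brain": 100},
}
print(fold_higher(expr, ["fetal_brain"], ["adult_brain"], 6))  # ['G1']
```

    Swapping set_a and set_b lists the genes that are higher in adult brain instead.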
    Note that in March 2004, Version 2 of the GNF Atlas was released (see also EXP11). It was created not only using the Affymetrix U133A chips (human), but also using Novartis-designed arrays (human = GNF1H; mouse = GNF1M), with a panel of RNAs derived from 79 human and 61 mouse tissues or cell types. Using the link "Search Expression", you are able to choose a single tissue/cell type and see ONLY the genes showing pronounced expression. When choosing the "Search Type" "Correlation Search", you may list genes showing high expression in e.g. B-cells and low expression in dendritic cells or T-cells. This series of array data is also available as NCBI-GEO series record GSE1133, which is split into 3 curated GDS (GEO Dataset) records: GDS592 covers the GNF1M data (mouse atlas); GDS594 covers the human GNF1H data; and GDS596 covers the human U133A data. Using these GDS records, it is possible to perform the same "compare 2 sets (A vs B)" analyses as described for version 1 above. The GNF1H array is described as platform record GPL1074; the GNF1M array is described as GPL1073.
    Note that the expression data of both GNF Atlas versions 1 and 2 are also integrated in the UCSC Gene Sorter, allowing a very convenient "quick-view" of the expression profile of your gene of interest, but not (yet) providing the option to compare whole expression profiles between tissues.
     
EXP13...compare two different microarray platforms and see which genes are represented on both of them ? (last update Aug. 31, 2005)

    Tip! CleanEx, provided by the Swiss Institute of Bioinformatics (SIB), is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. So far, CleanEx contains only human genes for which the symbol is approved by the HUGO nomenclature committee. There is one entry per gene name. Thus, CleanEx is NOT a repository of expression data in the strict sense but it collects expression data from several resources (GEO, ArrayExpress, SMD, etc.) in a gene-centered way in order to make it available via one common interface.
    In order to address this specific question, there is a query option called Find common genes in different datasets. This option lets you easily compare the content of 2 different microarray (or other) platforms. The comparison is only available between experiments from one single organism. A table is produced listing ALL genes and their platform-specific identifiers which overlap between 2 different platforms. Note that only the short forms of gene names are listed but not the full descriptions.
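
    Conceptually, the resulting table is the intersection of two gene symbol lists, each carrying its platform-specific identifiers. A minimal Python sketch under that assumption is shown below; the function name and all identifiers in the example are placeholders, not real CleanEx output.

```python
def common_genes(platform_a, platform_b):
    """Rows of (gene symbol, IDs on platform A, IDs on platform B).

    platform_a / platform_b: gene symbol -> list of platform-specific IDs
    """
    shared = sorted(set(platform_a) & set(platform_b))
    return [(gene, platform_a[gene], platform_b[gene]) for gene in shared]


# Placeholder identifiers, for illustration only.
chip_1 = {"IL8": ["chip1_0001"], "JUN": ["chip1_0002", "chip1_0003"]}
chip_2 = {"JUN": ["chip2_0417"], "TP53": ["chip2_0020"]}
for row in common_genes(chip_1, chip_2):
    print(row)
```

    Matching on approved gene symbols, as CleanEx does, is what makes identifiers from different technologies comparable in the first place.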
        
EXP14...compare a list of upregulated genes from one microarray experiment with one or more other experiments ? (last update Aug. 31, 2005)

    Tip! CleanEx, provided by the Swiss Institute of Bioinformatics (SIB), is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparisons. So far, CleanEx contains only human genes for which the symbol is approved by the HUGO nomenclature committee. There is one entry per gene name. Thus, CleanEx is NOT a repository of expression data in the strict sense but it collects expression data from several resources (GEO, ArrayExpress, SMD, etc.) in a gene-centered way in order to make it available via one common interface.
    In order to address this specific question, there is a query option called Step-by-step analysis. This is an extremely interesting feature of CleanEx, as it allows the successive comparison of the results of different expression experiments. Example: You may first retrieve all genes which are immediately upregulated by the inflammatory cytokine IL-1 in HUVEC (GDS649), and then use only this subset of genes to analyze which of them are also upregulated in human fibroblasts by UV radiation-induced DNA damage (GDS400). First, select the first dataset (GDS649). Once in the selected dataset's form, select the two experiment pools that you want to compare, like overexpression at time point 0.5h compared to 0h, and submit your job. On the result page, you have two choices. You can either select some genes and extract the corresponding genomic sequences, like potential promoter sequences ("upstream of TSS"), or you can select another microarray dataset (GDS400) which you want to analyze using the gene subset derived from the first step. In the list, you will see the number of genes from the first step that overlap with each of the other datasets. Again, you have to choose a certain comparison, like overexpression at high doses of UV as compared to the control. If you want to, you may now even add another dataset for comparison. Naturally, the gene set in the output list will get smaller after each round of analysis. NOTE: CleanEx does not seem to consider the "Absent" and "Present" detection calls of Affymetrix, meaning that "absent" genes are not filtered out.
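
    The successive narrowing of the gene list can be pictured as a chain of set intersections. The following Python sketch illustrates this idea only; it is not how CleanEx is implemented, and the function name and gene lists are invented for the example.

```python
def step_by_step(gene_sets):
    """Intersect the upregulated gene sets of successive experiments.

    Returns the surviving genes and the list size after each round.
    """
    remaining = None
    sizes = []
    for genes in gene_sets:
        current = set(genes)
        # Round 1 starts the list; later rounds can only shrink it.
        remaining = current if remaining is None else remaining & current
        sizes.append(len(remaining))
    return sorted(remaining or []), sizes


# Invented gene lists standing in for three successive comparisons.
rounds = [
    {"IL8", "ICAM1", "JUN"},
    {"JUN", "IL8", "TP53"},
    {"JUN"},
]
print(step_by_step(rounds))  # (['JUN'], [3, 2, 1])
```

    The second return value mirrors the observation above that the output list gets smaller after each round of analysis.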
        
EXP15...know if alternative transcripts of a gene of interest are expressed in different tissues or in different diseases ? (last update Sep. 16, 2005)

    This question demands the availability of resources which partition the expression data of a specific gene according to its different splice variants. There is a whole list of databases which store information on alternative splicing, but only a few of them provide direct links to associated expression data. In general, we may again separate resources in the field of microarray technology from resources storing EST and SAGE-related expression data.

1. Microarray data:

1.1. Identification of the splice variant represented on the chip:

    NOTE that the general question how to identify microarray platforms containing a specific gene is treated in FAQ EXP2 !!!

    In general, 2 main types of nucleotide-based microarrays exist: those which have cDNAs and those which have a set of oligonucleotides spotted on the chips. Normally, it is not obvious which of the different splice variants of a gene is represented by a certain molecule on a chip. Let us take the human gene IKIP as an example, which has 4 exons occurring in 3 different splice forms. We may query the CleanEx Target database using this gene name. A list of microarray targets is presented, containing several Affymetrix ProbeSets and cDNA clones (e.g. IMAGE IDs). In addition, we may simply query the NCBI GEO Gene profiles using this gene name, which will also retrieve cDNAs and oligonucleotide sets spotted on arrays.

    In the case of ESTs, we have to consider a specific problem. EST sequences in databases usually do not cover the whole sequence of a certain clone but only a part of it. Thus, if a certain EST is spotted on a microarray, it may not be clear which transcript it corresponds to. In general, the EST (or other mRNA) sequence can be fetched from NCBI Entrez Nucleotide using the accession number. Then, you may perform a BLAT search at UCSC against the human genome in order to locate the region (and the splice variant) to which this clone corresponds.

    In the case of oligonucleotide sets (e.g. Affymetrix U133 ProbeSets), the situation is a little more complex. In the CleanEx list, we can see 3 different Affymetrix ProbeSets for the HG-U133B chip type. Affymetrix oligos usually are designed from a so-called target sequence within the 3' regions of mRNAs (3'UTRs and/or 3' coding regions). Thus, in order to know which transcript a ProbeSet corresponds to, we have to extract this target sequence. Clicking on the CleanEx link for U133B:227295_at reveals the list of oligos and their positions within the targeted sequences. Unfortunately, there is no "contig" of the complete target sequence and there is no graph which immediately shows the exon from which the target sequence is taken. For this purpose, a "Quick Query" at the Affymetrix NetAffx analysis center is needed (free registration required), which retrieves the record for the ProbeSet 227295_at. Here, not only the individual oligonucleotides but also the complete target sequence is shown. You may now simply COPY/PASTE this sequence and perform a BLAT search at UCSC against the human genome, which locates this region in the last of the 4 exons of the IKIP gene. NOTE: This region is NOT exactly the same as the one which is shown for this ProbeSet by the UCSC browser when setting the option "Affy U133" to "full" (indeed, this entry would cover both exon 1 and exon 4) ! So, if you want to precisely locate the target region, it seems to be necessary to use the NetAffx portal.
    The other 2 ProbeSets (235202_x_at and 236249_at, listed under these IDs at both CleanEx and NetAffx) can be treated in the same way, demonstrating that these ProbeSets correspond to exon 3 of the 4 exons of the IKIP gene.
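
    Once a BLAT search has returned genomic coordinates for a target sequence, deciding which exon(s) it falls into is a simple interval-overlap check. The Python sketch below illustrates this step; the function name is made up, the coordinates are invented (they are NOT the real IKIP coordinates), and half-open intervals (start inclusive, end exclusive) are assumed.

```python
def exons_hit(exons, blocks):
    """Return the 1-based numbers of exons overlapped by alignment blocks.

    exons:  list of (start, end) in transcript order
    blocks: list of (start, end) of the aligned target sequence
    """
    hit = []
    for number, (e_start, e_end) in enumerate(exons, start=1):
        # Two half-open intervals overlap if each starts before the other ends.
        if any(b_start < e_end and b_end > e_start for b_start, b_end in blocks):
            hit.append(number)
    return hit


# Invented coordinates for a gene with 4 exons.
exons = [(100, 200), (500, 600), (900, 1000), (1500, 2500)]
print(exons_hit(exons, [(1800, 2300)]))              # [4]
print(exons_hit(exons, [(150, 200), (1500, 1600)]))  # [1, 4]
```

    The second call mimics a target that spans two exons, comparable to the situation noted above where a browser track covers both exon 1 and exon 4.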


1.2. Analysis of splice variant-specific expression data:

    Now that we know the particular Affymetrix ProbeSets which correspond to the different splice variants of the IKIP gene, we may specifically search for associated expression data and determine which of the different resources are suitable for this purpose.

    The UCSC Gene Sorter, although an excellent resource for a quick-view of a "virtual Multiple Tissue Northern", is not suitable for deciphering transcript-specific expression patterns. A test using the 3 different ProbeSets of our example showed that all of them directed the query to the pattern of ProbeSet 235202_x_at.

    In contrast, the UCSC Genome Browser, which also integrates expression data, only shows the red/green images for the ProbeSet 227295_at (you have to set the option "GNF Atlas 2" to "full" for this purpose). There is no expression data for the other 2 ProbeSets.

    Tip! GEO (Gene Expression Omnibus) is probably the best resource to search for ProbeSet-specific and therefore in this example transcript-specific expression data. A simple query for GEO Profiles using "235202_x_at" as query reveals a list of experiments containing data for this set. Note that all IKIP-related ProbeSets are spotted on the U133B chip which is used less frequently than the U133A chip. As 2 ProbeSets (235202_x_at and 236249_at) match to the same exon, we would expect similar expression profiles with these 2 IDs, which actually is the case. Nevertheless, the profiles of ProbeSet 227295_at are similar too, not (yet) suggesting transcript-specific expression differences. For a full description of GEO, please refer to the GEO section at the main page.
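
    Whether two ProbeSets show "similar" expression profiles can be judged by eye, but it can also be quantified. One common choice is the Pearson correlation coefficient across the same series of samples; this is an assumption made for illustration here and not necessarily the measure used by GEO or any other resource mentioned. The profile values in the example are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Invented values for two ProbeSets measured in the same four samples.
probeset_1 = [120.0, 340.0, 90.0, 410.0]
probeset_2 = [100.0, 300.0, 80.0, 390.0]
print(round(pearson(probeset_1, probeset_2), 3))
```

    Values close to 1 indicate parallel profiles, as expected for two ProbeSets that match the same exon; clearly lower values for a ProbeSet on another exon would hint at transcript-specific expression.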

    NOTE: The CleanEx list of target sets for IKIP also contains several ProbeSets of the U95 series of chips, which all are true IKIP sequences (checked by BLAT). Nevertheless, these ProbeSets are NOT listed in the "U95"-section of the UCSC Genome Browser, when displaying the IKIP genomic region. This is probably due to an inconsistency of annotation freezes.

2. EST and SAGE data:

   
    ASD - Alternative Splicing Database, maintained at the EBI, aims to understand the mechanism of alternative splicing on a genome-wide scale by creating a database of alternative splice events and the resultant isoform splice patterns of genes from human and other model species. Simple queries can be placed at the ASD start site. The Advanced Query page offers a kind of "BioMart-style" interface, which also allows filtering the complete dataset to retrieve subsets defined by certain features like type of splice event, human-mouse conservation, SNP types, and more. At the gene-specific output pages, there is a section called Splice Pattern Viewer, which displays transcripts as interactive graphics with expression state information and links to the Splice Pattern Table (with the appropriate splice pattern highlighted), along with the pattern sequence. The Splice Pattern Table lists the number of confirming ESTs for each transcript as well as the corresponding source libraries, which in turn contain data on tissue, development, and pathology state. Note that ASD is a very good source to link splicing variants with tissue- and cell type-specific expression, although these data are available in tabular form only, not as graphs. Please refer also to the ASD main section for details !
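
    Since such data come as tables rather than graphs, a small script can help to summarize them. The Python sketch below counts confirming ESTs per transcript and tissue. The record layout (transcript ID, tissue) is an assumption for illustration and does not reflect an actual ASD export format; all IDs and tissues are invented.

```python
from collections import Counter


def ests_per_transcript_and_tissue(est_records):
    """Count confirming ESTs for every (transcript, tissue) combination.

    est_records: iterable of (transcript ID, tissue of the source library)
    """
    counts = Counter()
    for transcript_id, tissue in est_records:
        counts[(transcript_id, tissue)] += 1
    return counts


# Invented records: one tuple per confirming EST.
records = [("T1", "brain"), ("T1", "brain"), ("T2", "liver"), ("T1", "liver")]
counts = ests_per_transcript_and_tissue(records)
for (transcript_id, tissue), n in sorted(counts.items()):
    print(transcript_id, tissue, n)
```

    A transcript whose ESTs come from only one tissue is a candidate for tissue-specific splicing, although low EST numbers call for caution.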

    Tip! ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and a transcript assembly procedure. There is a specific section of ECgene called ECexpression, which is the expression data viewer of ECgene, but you may also start at the ECgene homepage and retrieve the Gene Summary Viewer which also displays EST and SAGE data across multiple tissues and cell types. ECexpression utilizes the extensive expression data from EST and SAGE sources. Note that a very useful feature is that normal and cancer libraries are divided and also displayed separately in the graphs. Therefore, this layout makes it easy to find any tissue-specific or cancer-specific isoforms ! Gene Summary includes a graphical image of the transcripts along with a "reliability rating" (from A to C), and the links to the different ECgene transcript IDs. This table is especially useful as it shows the total lengths of the transcripts, the sizes of the UTRs, and the lengths of the CDS and predicted peptide sequences. Transcript Summary is shown by clicking on one of the transcript IDs in Gene Summary. This page shows the transcript image in detail, including the protein domains and motifs (from Motif/Domain Viewer). Also, more detailed views of the functional annotation and the expression data are shown, now corresponding only to the specific transcript. In particular, SAGE data are presented as a "virtual microarray" (red/green) image. Also, the source libraries of all individual ESTs are listed. Please refer to the main sections of ECgene and ECexpression for details !
                   
EXP16...generate contig sequences from a set of ESTs while considering alternative splicing ? (last update Sep. 16, 2005)

    This question addresses a "historical" matter of debate: whether it is feasible to automatically separate EST sequences into subsets belonging to different splice products and to generate the respective contig sequences. Although databases like UniGene collect all ESTs belonging to a specific gene, UniGene does not attempt to generate individual contig sequences of splice products. Nevertheless, there are resources which take a user set of sequences as input and try to generate the corresponding transcripts.

    Tip! ASmodeler is a web-based utility that finds gene models, including alternative splicing events, from genomic alignments of mRNA, EST and protein sequences. ASmodeler is part of the ECgene portal; please refer also to the ECgene section at the main page. The user may supply a UniGene cluster ID and/or a set of mRNA, EST, or protein sequences. User-supplied sequences are aligned against the genome map using the BLAT and SIM4 programs. The resulting exon connectivity is analyzed by applying graph-theoretic methods to build all possible gene models including splice variants. In addition to the user-supplied sequences, UniGene clusters and many well-known gene predictions such as Genscan, Ensembl, and Acembly may be included in gene modeling. The current implementation supports the human, mouse and rat genomes. The output consists of a list of predicted transcripts together with the deduced amino acid sequences. NOTE: Test runs showed that FASTA files with long ID lines may be problematic, whereas the sample sequence set, whose ID lines contain only the EST GenBank accession, works well!
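
    If your own FASTA files carry long ID lines, one possible workaround is to shorten each ID line to the bare accession before submitting the sequences. The following is only a minimal Python sketch under stated assumptions: the accession pattern (one or two capital letters followed by five or six digits, with an optional version suffix) and the function names shorten_header and shorten_fasta are illustrative choices, not part of ASmodeler, and the example headers are made up.

```python
import re

# Assumption: classic GenBank-style nucleotide accessions such as "AA123456"
# or "BC012345.1". Adjust the pattern if your IDs look different.
ACCESSION_RE = re.compile(r"\b([A-Z]{1,2}\d{5,6})(?:\.\d+)?\b")


def shorten_header(header: str) -> str:
    """Reduce one FASTA ID line to '>accession' (hypothetical helper)."""
    text = header[1:].strip()
    match = ACCESSION_RE.search(text)
    if match:
        return ">" + match.group(1)
    if text:
        # No accession found: fall back to the first whitespace-delimited token.
        return ">" + text.split()[0]
    return header


def shorten_fasta(fasta_text: str) -> str:
    """Return the FASTA text with every ID line shortened; sequences untouched."""
    lines = []
    for line in fasta_text.splitlines():
        lines.append(shorten_header(line) if line.startswith(">") else line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = (
        ">gi|1234567|gb|AA123456.1| hypothetical EST clone, 5' end\n"
        "ACGTACGT\n"
        ">BC012345 another long description\n"
        "TTGGCCAA\n"
    )
    print(shorten_fasta(example), end="")
```

    The sequence lines are passed through unchanged; only lines starting with ">" are rewritten, so the result can be pasted into the input form in place of the original file.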
                   
      

            
EXP17...analyze the expression of a gene set of interest in cancer tissues ? -> see GENOM9 !

    As the resources in the field of cancer research are described in section "Genomics", this FAQ is also located at the Genomics FAQ page.
       
       

            
EXP18...determine the expression profiles of normal vs. cancer tissues ? -> see GENOM10 !

   
    As the resources in the field of cancer research are described in section "Genomics", this FAQ is also located at the Genomics FAQ page.
       
       

       
EXP19...determine the reliability of individual SAGE tags for expression analysis of a specific gene ? (last update Feb. 10, 2006)

    This question addresses a general problem in analyzing gene expression via SAGE tag counts: many SAGE tags are not unique, meaning that they match more than one gene. This complicates gene expression analysis considerably, as the total SAGE tag count is (or may be) a mixture of the expression of several source genes. One strategy to circumvent this problem was the development of longer SAGE tags, which are more sequence-specific.

    Tip! SAGE Genie is a website of the CGAP portal which provides highly intuitive, visual displays of human and mouse gene expression, based on a unique analytical process that reliably matches SAGE tags, 10 or 17 nucleotides in length, to known genes.
    SAV (SAGE Anatomic Viewer) is one of the tools of SAGE Genie. SAV displays gene expression in human normal and malignant tissues by shading each organ in one of ten colors, each representing a different level of gene expression. Gene expression levels are based on the analysis of counts of SAGE tags, which are either "short" (10 bp), including "extracted short" (10 bp extracted from a 17 bp tag), or "long" (17 bp).
    SAV can be used to find the best tag for a gene / accession number. NOTE: SAV is an excellent resource for examining the reliability of individual SAGE tags, i.e. the probability that a tag is "unique" rather than matching more than one gene (which would make expression data analysis highly difficult). The best tags are color coded. In addition, the LTV (Ludwig Transcript Viewer) display, showing shorter alternative polyadenylated and internally primed transcripts, supports the prediction of reliable SAGE tags. The tag link enables the user to see which other gene(s) may be represented by the particular tag and the reliability of each mapping.
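
    To illustrate why a short tag can be ambiguous while a long tag is not, the following Python sketch extracts 10 bp and 17 bp tags from a few toy transcripts and lists the tags shared by more than one gene. It is not the SAGE Genie matching procedure. It assumes the common SAGE convention that the tag lies directly downstream of the 3'-most CATG site (the NlaIII anchoring site), which is not stated in this FAQ; the transcript sequences, gene names and function names are all made up for illustration.

```python
from collections import defaultdict

# Assumption: NlaIII anchoring site, as used in the standard SAGE protocol.
ANCHOR = "CATG"


def extract_tag(transcript: str, tag_length: int):
    """Return the tag directly downstream of the 3'-most anchor site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None  # no anchor site, hence no tag
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_length]
    # Discard tags truncated by the end of the transcript.
    return tag if len(tag) == tag_length else None


def tags_to_genes(transcripts: dict, tag_length: int) -> dict:
    """Map each tag to the list of genes whose transcript yields it."""
    mapping = defaultdict(list)
    for gene, seq in transcripts.items():
        tag = extract_tag(seq, tag_length)
        if tag is not None:
            mapping[tag].append(gene)
    return dict(mapping)


def ambiguous_tags(transcripts: dict, tag_length: int) -> dict:
    """Keep only the tags that match more than one gene."""
    return {
        tag: genes
        for tag, genes in tags_to_genes(transcripts, tag_length).items()
        if len(genes) > 1
    }


# Toy transcripts (5' to 3'), invented for this example.
TOY_TRANSCRIPTS = {
    "geneA": "GGGACATGACGTACGTACAAAAAAATTT",
    "geneB": "CCCTCATGACGTACGTACGGGGGGGTTT",
    "geneC": "TTTTCATGTTGCATTGCACCCCCCCAAA",
}

if __name__ == "__main__":
    print("ambiguous 10 bp tags:", ambiguous_tags(TOY_TRANSCRIPTS, 10))
    print("ambiguous 17 bp tags:", ambiguous_tags(TOY_TRANSCRIPTS, 17))
```

    With these toy sequences, geneA and geneB share the same 10 bp tag, so its count would be a mixture of both genes, whereas all three 17 bp tags are distinct.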
                                 
      

       
EXP20...get in situ microscopy images of RNA tissue localization and expression intensity ? (last update Mar. 2, 2006)

    This question is related to FAQ PROT6, which describes databases storing images of protein localization in diverse tissues, based on antibody staining. The present FAQ describes databases which store microscopic images of in situ hybridization experiments; these data carry information on both the localization and the expression level of RNA. Resources which fall into this category are described in the main section "RNA Localization Databases".

    Tip! UCSC VisiGene is a browser for viewing in situ images, provided by the UCSC Genome Bioinformatics portal. It enables the user to examine cell-by-cell as well as tissue-by-tissue expression patterns. The browser serves as a virtual microscope, allowing users to retrieve images that meet specific search criteria, then interactively zoom and scroll across the collection. Please refer to the VisiGene section of the User Guide for a list of currently available image collections. NOTE: As such, VisiGene is also a good link page to resources of in situ hybridization data in general !
    Searching VisiGene: The image database may be searched by gene symbols, authors, years of publication, body parts, GenBank or UniProt accessions, organisms, Theiler stages (mice), and Nieuwkoop/Faber stages (frogs).
    Image Navigation and Download: Following a successful search, VisiGene displays a list of thumbnails of images matching the search criteria in the lefthand pane of the browser. By default, the image corresponding to the first thumbnail in the list is displayed in the main image pane. The image may be zoomed in or out, sized to match the resolution of the original image or best fit the image display window, and moved or scrolled in any direction to focus on areas of interest. The original full-sized image may also be downloaded.
                                     
      

       
EXP21...get RT-PCR, Northern Blot, and Western Blot expression data ? (last update Mar. 3, 2006)

    Until recently, expression data based on laboratory techniques like RT-PCR, Northern Blot, or Western Blot have been accessible only at the level of individual publication figures. Now, resources are starting to emerge which try to build centralized repositories of such data that can be queried by "simple" gene name searches. As these sites often also store microscopy images of tissue sections, this FAQ overlaps with FAQ EXP20. It is also related to FAQ PROT6, which describes databases storing images of protein localization in diverse tissues, based on antibody staining.

    Tip! MGI (Mouse Genome Informatics) is maintained at Jackson Laboratory, Maine, USA and collects all data about mouse genes, nomenclature, map positions, individual ESTs, and mouse expression data. The GXD (Gene Expression Database) integrates different types of gene expression information (RNA in situ data, RT-PCR data, Northern Blot, Western Blot) from the mouse and provides a searchable index of published experiments on endogenous gene expression during development. The GXD-Gene Expression page allows the user to query mouse expression data via different options.
                                     