Bioinformatics World FAQ Center
  FAQ Index -> PATHWAYS, INTERACTIONS, FUNCTIONS
                -> PATH1...predict the pathways, interactions and functions for a gene set of interest ? (last update Jun. 27, 2006)
                -> PATH2...get all proteins from endothelial cells involved in inflammation (SRS and GO approaches)? -> see RET2  
                -> PATH3...predict potential protein-protein interactions of my protein of interest? (last update May 15, 2006)
                -> PATH4...get all human transcription factors involved in inflammation ? (last update Dec. 21, 2005)
                -> PATH5...know which protein domains are present / overrepresented in my gene set of interest ? -> see PROT7  
                -> PATH6...get detailed information on metabolic pathways, reactions, and compounds ? (last update Feb. 1, 2006)
                -> PATH7...download the gene lists of individual pathways for further data analysis ? (last update Apr. 24, 2006)
    
      
              

                   
PATH1...predict the pathways, interactions and functions for a gene set of interest ? (last update Jun. 27, 2006)
        
1. Resources based on pathway maps:

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is probably the best-known database for pathway information. KEGG turns sequence information from a number of organisms into metabolic or regulatory pathways. This site makes it easy to place genes into a functional context, and to look for as yet unknown genes that might exist in an organism. A good site to start from is the KEGG table of contents. All components of the maps are clickable leading to detailed information. Please refer also to the KEGG section at the main page.
    Tip! The PATHWAY database of KEGG is a graphical catalogue of metabolic pathways (like glycolysis or ATP synthesis) and regulatory pathways (like apoptosis or cell cycle), which can be browsed by topic. If available, different organisms are compared. All components of the maps are clickable and lead to detailed information. The page Search Objects in KEGG Pathways allows batch searches of gene lists against the KEGG Pathway database. Example: A cluster of gene names from a microarray experiment can be analyzed to display all pathways which are involved and to graphically highlight the input genes within the pathway maps. Different identifiers (gene names, EC accessions, KEGG gene identifiers, and more) may be used as query. The output shows the list of pathways and the corresponding genes, but there is no statistical summary of how many genes of your input list are found in which pathway. Note that a gene which is not found in this search might still be present in the KEGG Genes database without being assigned to a pathway (yet)! If you want to perform such a search while including the KEGG annotation "vocabulary" (KO), please refer to the description of KAAS below.
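    Since the KEGG search output lacks a summary of how many input genes fall into which pathway, such a tally can be produced by hand. The following is only a minimal sketch: the function name, the gene names, and the pathway names are hypothetical placeholders, and the gene-to-pathway mapping is assumed to have been collected beforehand from the search results.

```python
from collections import defaultdict

def count_pathway_hits(input_genes, gene_to_pathways):
    """Count how many input genes map to each pathway.

    gene_to_pathways is an assumed, user-assembled dict: gene -> list of pathways.
    Returns (summary, unassigned): summary is a list of (pathway, gene count)
    sorted by descending count, unassigned lists genes without any pathway.
    """
    hits = defaultdict(set)
    unassigned = []
    for gene in input_genes:
        pathways = gene_to_pathways.get(gene, [])
        if not pathways:
            # Gene may still exist in the gene database, just without a pathway.
            unassigned.append(gene)
        for pathway in pathways:
            hits[pathway].add(gene)
    summary = sorted(
        ((pathway, len(genes)) for pathway, genes in hits.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return summary, unassigned

# Placeholder data for illustration only (not real KEGG assignments).
example_mapping = {
    "geneA": ["pathway_1", "pathway_2"],
    "geneB": ["pathway_1"],
    "geneC": ["pathway_2"],
}
summary, unassigned = count_pathway_hits(
    ["geneA", "geneB", "geneC", "geneD"], example_mapping
)
for pathway, n_genes in summary:
    print(pathway, n_genes)
print("not assigned to any pathway:", unassigned)
```

    Such counts are only descriptive; they do not tell you whether a pathway is statistically over-represented.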

    Reactome (formerly known as "Genome KnowledgeBase") is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium to develop a curated resource of core pathways and reactions in human biology. The information in this database is authored by biological researchers with expertise in their field, maintained by editorial staff, and cross-referenced with PubMed, GO, and the sequence databases at NCBI, Ensembl and UniProt. You can search/browse by biological processes like DNA replication, translation, glucose metabolism etc., and you will obtain a hierarchical structure of topics and subtopics, including literature and sequence links.
    Tip! SkyPainter is a program which is part of Reactome and which can (at least by definition) be used to graphically visualize "over-represented" pathways which are turned on by a given set of genes. For this purpose, the gene list is pasted using one of several types of identifiers. Please refer to the main section of SkyPainter for detailed information!
    Some personal remarks:
    Note that gene names have to be used in the format "ATF3_HUMAN", whereas "ATF3" alone will NOT find any hits.
    The display of individual pathways is very heterogeneous: some images are taken from publications, some are newly produced for Reactome, and some pathways do not contain any diagram at all.
    Note that test runs using a gene set derived from a microarray experiment showed that only a fraction of the genes was represented in the database, and therefore only some specific pathways were highlighted.
    Conclusion: By comparison, the KEGG KAAS tool yielded more comprehensive results, although SkyPainter offers some very fancy options like the "movie" option for e.g. microarray time course experiments.


    BioCarta is a company which develops reagents and assays for biopharmaceutical research. The BioCarta website serves as an interactive web-based resource for life scientists. The information falls into four categories: gene function, proteomic pathways, ePosters, and research reagents.
    The BioCarta Pathways section includes expert-curated interactive graphic models of many pathways from diverse fields like apoptosis, cell cycle, cell signalling, development, immunology, neuroscience, adhesion, and metabolism. There is a keyword search option for pathway names or gene names. You may also perform a multi-gene search limiting the output to pathways including all of the query genes. NOTE that clicking on individual genes in a pathway map reveals a table comprising all important links like LocusLink, UniGene, KEGG, OMIM, GeneCard, PubMed and more.
    NOTE: Although BioCarta offers a "Multi-Gene Search" option, tests showed that it is NOT suitable to analyze large gene datasets for common pathways.                
             

2. Resources based on ontology assignments:

    Ontologies are controlled vocabularies which should be usable across different species to describe the function of proteins. The best known example is GO - Gene Ontology, which is widely used to annotate genes / proteins in terms of cellular compartment (like "nucleus"), molecular function (like "transcription factor"), and biological process (like "inflammatory process"). Each GO term has a stable GO accession number. Thus, it may also be possible to delineate biological processes which are turned on in a certain experimental condition by analyzing a gene cluster for the presence of "over-represented" GO terms.
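    To make the idea of an "over-represented" GO term concrete: a frequently used approach is a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a cluster by chance. The sketch below illustrates that general idea only; it is not claimed to be the statistic used by any of the tools described here, and the numbers are made up.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, sample_annotated):
    """One-sided hypergeometric p-value P(X >= sample_annotated).

    population       : number of genes in the reference set
    annotated        : reference genes carrying the GO term
    sample           : number of genes in the cluster of interest
    sample_annotated : cluster genes carrying the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(sample_annotated, upper + 1)
    )
    return tail / total

# Toy numbers: 20 reference genes, 5 carry the term,
# and 2 of the 4 genes in the cluster carry it as well.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"p = {p_value:.4f}")
```

    When many GO terms are tested at once, the resulting p-values would normally need a correction for multiple testing.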

    Tip! GOTM (GOTree Machine) is a web-based platform for interpreting microarray data or other interesting gene sets using Gene Ontology. Features include a user-friendly web-based interface, an expandable tree for browsing the GO hierarchy, a fixed tree as HTML output for archiving, a bar chart for publication, a statistical analysis indicating GO terms with relatively enriched gene numbers and suggesting biological areas that warrant further study, and finally the retrieval of subsets of genes by GO term or keyword search. In conclusion, GOTM is an excellent resource for GO based data mining as it presents data in various forms, allowing both quantitative analysis (like using the Bar Chart or Tree View) and qualitative analysis (like using the DAG View to produce a very good, concise overview of the whole dataset). Please refer to the GOTM main section for detailed information !
          
    BioMart, a data retrieval tool, can also be used to functionally annotate a large gene dataset via GO (Gene Ontology) terms. Please refer also to the BioMart section at the main page for information. You may, as an example, paste a list of Entrez Gene IDs corresponding to the genes selected from a microarray experiment into the field "ID list limit" at the "Filter" page of BioMart. Then, at the "Features" page of the output, you may choose to selectively display the GO-related data in the final result table: GO ID, GO description, and GO evidence code. You can choose between different output formats: Text, html, or MS EXCEL. Important: In contrast to KAAS, each gene is listed separately in the result file, listing all associated GO terms, BUT there is NO examination of "common over-represented" terms which are involved.

    AmiGO, the GO browser of the Gene Ontology Consortium, can not only be searched by terms to display GO hierarchies (like QuickGO) but also using a list of gene names, meaning you can directly look for your genes of interest and display their assigned GO terms. For this purpose, you should select the "advanced query", paste your list of gene names, and select "gene products" and the species used. Note: Tests showed that it is quite tricky to choose the best way to query, as gene names like "GEM" not only pick the specific gene but also others which contain "GEM" in their name like "GEMI4". On the other hand, when selecting the option "Exact match", there is no hit at all, as we would have to use "GEM_HUMAN" for this purpose. Thus, it is quite hard to retrieve only the desired genes without having to manually check the whole results list. Important: In contrast to KAAS, each gene is listed separately in the result file, listing all associated GO terms, BUT there is NO examination of "common over-represented" terms which are involved.
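    The query behaviour described above (substring search returns too much, exact match on the bare symbol returns nothing) can be reproduced with a toy example. This is an illustration of the matching logic only, not of AmiGO's actual implementation; the list of stored identifiers and the function names are assumptions.

```python
def substring_matches(query, identifiers):
    """Return every stored identifier that contains the query string."""
    return [name for name in identifiers if query in name]

def exact_matches(query, identifiers):
    """Return only identifiers that are identical to the query."""
    return [name for name in identifiers if name == query]

# Assumed stored identifiers in "SYMBOL_SPECIES" format.
stored = ["GEM_HUMAN", "GEMI4_HUMAN", "ATF3_HUMAN"]

print(substring_matches("GEM", stored))    # also picks up GEMI4_HUMAN
print(exact_matches("GEM", stored))        # no hit for the bare symbol
print(exact_matches("GEM_HUMAN", stored))  # hit only with the full identifier
```

    In practice this means that a gene list may have to be converted to the identifier format expected by the database before an exact-match query is useful.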
               

3. Resources based on pathway maps and on ontology assignments:

    Tip! The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. Proteins have been classified according to families and subfamilies, molecular functions, biological processes, and pathways. The high-throughput analysis tools suitable for gene datasets are based on a PANTHER-specific gene ontology as well as on pathway maps. There are several tools which build the PANTHER portal. Please refer to the PANTHER main section for detailed information !
    The Gene Expression Data Analysis tool allows in-depth analyses of gene datasets concerning over-representation of specific PANTHER Ontology terms or pathways. It is somewhat similar to the DAVID Functional annotation/classification tools. The pathway visualization tool will display your experimental results on detailed diagrams of the relationships between genes/proteins in known pathways. The option "Compare gene lists" maps lists of genes to a PANTHER ontology. For pathways, you can then view the gene expression values overlaid on top of a pathway diagram, where genes will be colored differently for different clusters of genes. Use the binomial statistics tool to compare classifications of multiple clusters of lists to a reference list to statistically determine over- or under-representation of PANTHER classification categories. Please refer to the PANTHER main section for detailed information !
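    The principle behind a binomial comparison of a gene cluster against a reference list can be sketched as follows. This is a simplified illustration under stated assumptions (the fraction of the reference list in a category is taken as the per-gene probability); the function names and numbers are hypothetical, and PANTHER's own tool may differ in detail.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_representation(k, n, ref_in_category, ref_size):
    """Compare k category members among n cluster genes with a reference list.

    Returns (direction, expected, p_value), where direction is "over" or "under"
    and p_value is the one-sided binomial tail in that direction.
    """
    p = ref_in_category / ref_size
    expected = n * p
    if k >= expected:
        direction = "over"
        p_value = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    else:
        direction = "under"
        p_value = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    return direction, expected, p_value

# Toy numbers: 100 of 1000 reference genes are in the category,
# and 4 of the 10 cluster genes are in it as well.
direction, expected, p_value = binomial_representation(4, 10, 100, 1000)
print(direction, f"expected={expected:.1f}", f"p={p_value:.4f}")
```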
            
    KEGG (Kyoto Encyclopedia of Genes and Genomes) is probably the best-known database for pathway information. KEGG turns sequence information from a number of organisms into metabolic or regulatory pathways. This site makes it easy to place genes into a functional context, and to look for as yet unknown genes that might exist in an organism. A good site to start from is the KEGG table of contents. All components of the maps are clickable leading to detailed information. Please refer also to the KEGG section at the main page.
    Tip! KAAS - KEGG Automatic Annotation Server is a very nice tool if you have a large dataset, like a set of genes from a microarray experiment, and you would like to know which "functional terms" these genes are assigned to and which pathways are preferentially involved. KAAS provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database. The result contains KO (KEGG Orthology) assignments (similar to GO) and automatically generated KEGG pathways. KAAS accepts a multi-FASTA sequence set (default: protein) as input. NOTE: If you have a gene list, you first have to convert this list into a FASTA sequence file; you may use tools like BioMart for this purpose. Please refer to the BioMart section at the main page or FAQ RET3 for details.
    KO assignments are based on results from BLASTP. Check the "Nucleotide" checkbox if queries are nucleotide sequences representing a set of EST contigs or ESTs. In this case, KO assignments are based on results from BLASTX and TBLASTN. KO assignment methods may be performed based on the bi-directional best hit (BBH, default) of BLAST or single-directional best hit (SBH). The computation time of the BBH method is about twice that of SBH. However, the method based on BBH will be more accurate than SBH, if the number of query sequences is large enough (genome scale). If the number of query sequences is small, then the SBH method should suffice (and save time).
    The URL to access the results is sent by Email. The results are available in several formats. The "KO list" presents a list of the input sequences, the "KO hierarchy" lists the different metabolic and regulatory pathways together with the genes of the user dataset which play a role in each specific pathway. "Pathway map" first indicates how many genes of your dataset correspond to a certain pathway and then presents graphical images of the pathways, highlighting the user-submitted genes ! Note: This is a very nice tool to identify pathways which are induced in a certain gene set, which may e.g. be derived from a microarray study analyzing a certain biological stimulus. "Download" produces a simple list of the input dataset together with the assigned "K" numbers.
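    The difference between the single-directional best hit (SBH) and the bi-directional best hit (BBH) mentioned for KAAS can be illustrated with a small sketch. The score tables below are invented placeholders standing in for BLAST results in both directions, and the function names are hypothetical; score thresholds and the KO assignment itself are not modelled.

```python
def best_hits(scores):
    """For each query, return the subject with the highest score (SBH)."""
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in scores.items()
        if subjects
    }

def bidirectional_best_hits(forward, reverse):
    """Keep a query/subject pair only if each is the other's best hit (BBH)."""
    forward_best = best_hits(forward)
    reverse_best = best_hits(reverse)
    return {
        query: subject
        for query, subject in forward_best.items()
        if reverse_best.get(subject) == query
    }

# Placeholder scores: query -> {reference gene: score} and the reverse search.
forward = {
    "q1": {"refA": 250.0, "refB": 90.0},
    "q2": {"refA": 120.0, "refB": 110.0},
}
reverse = {
    "refA": {"q1": 245.0, "q2": 118.0},
    "refB": {"q2": 105.0, "q1": 88.0},
}

sbh = best_hits(forward)
bbh = bidirectional_best_hits(forward, reverse)
print("SBH:", sbh)  # q2 is assigned to refA although refA prefers q1
print("BBH:", bbh)  # only the reciprocal pair q1/refA survives
```

    The example also shows why BBH needs roughly twice the computation: the search has to be run in both directions.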

    CGAP - Cancer Genome Anatomy Project is an NCI (National Cancer Institute) resource which offers a comprehensive molecular characterization of normal, precancerous, and malignant cells. It contains genomic data for human and mouse, including transcript sequence, gene expression patterns, SNPs, clone resources, and cytogenetic information. Please refer also to the CGAP main section for details.
    In order to address this specific question, CGAP provides the Batch Gene Finder at the Gene Finder page. In order to use the Batch Gene Finder, prepare a text file containing the list of (human OR mouse) gene symbols, UniGene clusters, accession numbers, protein accession numbers, UniProt (SwissProt) protein accessions, UniProt (SwissProt) protein identifiers (like "ACTB_HUMAN") or Entrez Gene numbers. The text file must list the identifiers in a vertical column, e.g. export a one-column EXCEL sheet in txt (tab-delimited) format. The created gene list displays the query ID, gene name and symbol, and RefSeq accessions, as well as links to the individual Gene Info pages (see CGAP main section for details). In addition, the link "Common View" creates a table displaying all GO terms, Pathways (KEGG and Biocarta), motifs, SNPs, and cyto locations for the complete input gene set.
    Note that this is a quick approach to obtain this data but there is no sophisticated statistical analysis of over-representation of terms. The main emphasis of the CGAP site is the analysis of gene expression in a series of cancer tissues.  
         

4. Resources based (primarily) on cocitation networks:

    Resources which fall into this category extract functional biological information from a given gene list based on databases which store the results of literature data mining processes. This means that all papers are analyzed on different levels to determine whether they contain 2 or more gene names in the same context.
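    The basic principle of co-citation analysis on different levels (whole abstract versus single sentence) can be sketched in a few lines. This is a deliberately naive illustration with made-up text and placeholder gene names; real literature mining additionally has to deal with gene synonyms, ambiguous names, and much more robust sentence splitting.

```python
from collections import Counter
from itertools import combinations
import re

def cocitation_counts(texts, gene_names):
    """Count, for every gene pair, in how many texts both names occur."""
    patterns = {g: re.compile(rf"\b{re.escape(g)}\b") for g in gene_names}
    counts = Counter()
    for text in texts:
        found = sorted(g for g, pattern in patterns.items() if pattern.search(text))
        counts.update(combinations(found, 2))
    return counts

def split_sentences(abstract):
    """Very simple sentence splitter (splits after '.', '!' or '?')."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]

# Invented example abstracts with placeholder gene names.
abstracts = [
    "GENE1 activates GENE2. GENE3 was unaffected.",
    "GENE1 and GENE3 are induced together.",
]
genes = ["GENE1", "GENE2", "GENE3"]

abstract_level = cocitation_counts(abstracts, genes)
sentences = [s for a in abstracts for s in split_sentences(a)]
sentence_level = cocitation_counts(sentences, genes)

print("abstract level:", dict(abstract_level))
print("sentence level:", dict(sentence_level))
```

    The sentence-level counts are stricter: GENE2 and GENE3 are co-cited in one abstract but never in the same sentence.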

    Tip! BiblioSphere PathwayEdition is part of the commercial Genomatix suite of products. This program deals with such cocitations of genes on different levels, like 2 genes are cocited within an abstract, or within the same sentence, or within the same sentence together with a "functional term" (regulation, inhibit,...) or via a direct connection via such a functional term. Text from Genomatix webpage: "BiblioSphere PathwayEdition uses the world's largest database of biological networks created from millions of individually modeled relationships between genes, proteins, complexes, cells and tissues. A unique combination of hand curation from biological experts and up to date text mining techniques for automated knowledge extraction provides you with the best data quality available. BiblioSphere PathwayEdition allows a view on your data, integrated in biological networks according to different biological context."
    BiblioSphere also provides links to other resources like GO (Gene Ontology) or BIND. BiblioSphere produces graphical maps which display the relationships between the genes of your "genes of interest" dataset, like gene citations in PubMed, Gene-Gene co-citations, or transcription factor citations.
    Register for a free evaluation account to get access to Genomatix products. NOTE that you only have 20 free analyses per month !!! Note that there is not only a limitation in the number of analyses but also in the functionality of the obtained data ! Please refer to the main section of BiblioSphere for further details !


5. Resources based (primarily) on interaction networks:
    
    Tip! HiMAP is a dynamic browser for the human protein-protein interaction map, provided by the University of Michigan. Because of this definition, the main section of HiMAP is located under "Pathways and Interactions Databases". BUT: HiMAP actually includes a wide range of "Interaction types" beyond real protein-protein interaction, like relation based on co-expression (based on co-citations), enriched domain pairs (based on InterPro protein domains), or shared biological process (based on GO terms). NOTE that although HiMAP does not present statistical values and does not create annotation tables for gene sets, it is well suited and highly recommendable if you want to generate a quick overview of the potential "relationships" within your gene set of interest !
    Please refer to the main section of HiMAP for details !
           

6. Resources based on several systems of functional assignment:
                        
    Resources which fall into this category predict the biological processes of gene datasets by integrating several databases of functional information, like GO, pathways, cocitation networks, or even protein domain and expression information.

    Tip! WebGestalt is a "WEB-based GEne SeT AnaLysis Toolkit". WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of large sets of genes. It enables biologists to manipulate integrated information and find patterns that are not detectable otherwise. WebGestalt is designed for functional genomic, proteomic, and large scale genetic studies from which high-throughput data are continuously produced. It currently works for human and mouse. WebGestalt is free for academic use after registration. NOTE: If you have already registered for GOTM, you can use this login ! In general, save and download options are more versatile than in e.g. DAVID. WebGestalt incorporates practically ALL FIELDS of functional annotation: GO, Pathways, Co-citation Networks, and even protein domain data and expression data !!! Taken together, WebGestalt is an excellent resource for functional annotation of gene datasets. Please refer to the WebGestalt main section for details !
                  
    DAVID - The Database for Annotation, Visualization and Integrated Discovery integrates functional genomic annotations with intuitive graphical summaries. DAVID provides a comprehensive set of tools for investigators to visually summarize annotation from large lists of genes, including those derived from microarray and proteomic studies. DAVID is provided at NCI-Frederick and was developed to support the bioinformatic needs at the National Institute of Allergy and Infectious Diseases (NIAID). DAVID is composed of several tools for the functional annotation and classification of large gene sets. NOTE: There are no individual URLs for the individual applications. All have to be started from the DAVID main page. There are 2 different approaches for this question in DAVID:
    Tip! Functional Annotation Tool: The scope of this tool is twofold, first to generate an annotation table of a gene set of interest, and second, to determine "over-represented" terms like pathways or GO terms in order to predict the biological processes affected in a specific dataset. For the latter purpose, the option "Export Selected Annotation as Chart" should be chosen. Please refer to the DAVID main section for details !
    Tip! Functional Classification Tool: The Functional Classification Tool provides a rapid means to organize large lists of genes into functionally related groups to help unravel the biological content captured by high throughput technologies. The Functional Classification Tool generates a gene-to-gene similarity matrix based on shared functional annotation using over 75,000 terms from 14 functional annotation sources. Tools are provided to further explore each functional gene cluster including listing of the “consensus terms” shared by the genes in the cluster, display of enriched terms, and heat map visualization of gene-to-term relationships. Please refer to the DAVID main section for details !
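    The idea of a gene-to-gene similarity matrix based on shared functional annotation can be illustrated with a small sketch. Note that the Jaccard index used here is only a simple stand-in chosen for illustration and is not DAVID's actual similarity measure; gene names and annotation terms are placeholders.

```python
def jaccard(terms_a, terms_b):
    """Shared terms divided by all terms of both genes (0.0 if both are empty)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def similarity_matrix(annotations):
    """Return (gene order, square matrix of pairwise Jaccard similarities)."""
    genes = sorted(annotations)
    matrix = [
        [jaccard(annotations[g], annotations[h]) for h in genes]
        for g in genes
    ]
    return genes, matrix

# Placeholder annotation sets: gene -> set of annotation terms.
annotations = {
    "geneA": {"term1", "term2", "term3"},
    "geneB": {"term2", "term3"},
    "geneC": {"term4"},
}

genes, matrix = similarity_matrix(annotations)
for gene, row in zip(genes, matrix):
    print(gene, [round(value, 2) for value in row])
```

    Genes with high mutual similarity (here geneA and geneB) would be candidates for the same functional group.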
                                    
Main Index  FAQ Index   
                             

                     
PATH2...get all proteins from endothelial cells involved in inflammation (SRS and GO approaches)? -> see RET2      

    Note that this question includes a full description of the Gene Ontology (GO) system of functional assignments; therefore a link to this FAQ is indicated here in the section "Pathways, Interactions, Functions". Also note that approaches to address this question based on expression analysis are discussed in FAQ EXP6.
   
   

                             
PATH3...predict potential protein-protein interactions of my protein of interest ? (last update May 15, 2006)
                 
    The prediction of protein-protein interactions is in general a non-trivial matter. Several sources of information can be considered in this respect: experimental data of verified interactions, literature mining to look for "over-represented" co-citations of genes / proteins, conserved coexpression, other data from high-throughput experiments, and more.
         
    Tip!  STRING is an EMBL-database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: genomic context, high-throughput experiments, coexpression, and previous knowledge (PubMed). STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. You may query using an identifier of your gene / protein of interest, or you may paste a protein sequence. You can choose the prediction methods, the level of confidence, and the number of interactors shown. The output displays a list of potential functional associations (a list of proteins) and the prediction method used in each case. These data can also be displayed in several views, e.g. the "Summary Network" which shows a graph containing all the proteins listed and a color-code of their "relationships". Note that STRING is also included and automatically performed in the "data super-integration" tool Bioinformatic Harvester at EMBL !
    Note that STRING is especially suitable for a first "quick-view" which is in fact strongly based on text mining and homology. If you want to "dig deeper" including protein structures, you may try other sources like BIND. 
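    How scores from several evidence sources might be integrated, and how the choice of prediction methods, confidence level, and number of interactors shapes the result list, can be sketched as follows. The combination rule used here (treating the sources as independent, sometimes called "noisy-OR") is an assumption made for illustration; the exact scoring scheme used by STRING is not described in this FAQ. All partner names, channel names, and scores are invented.

```python
def combine_scores(channel_scores):
    """Combine per-source scores, assuming the sources are independent."""
    remaining = 1.0
    for score in channel_scores.values():
        remaining *= 1.0 - score
    return 1.0 - remaining

def top_interactors(candidates, channels, min_confidence, limit):
    """Rank partners using only the selected channels, a cutoff, and a size limit."""
    scored = []
    for partner, scores in candidates.items():
        selected = {c: s for c, s in scores.items() if c in channels}
        combined = combine_scores(selected)
        if combined >= min_confidence:
            scored.append((partner, combined))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]

# Invented per-source scores for three hypothetical interaction partners.
candidates = {
    "partner1": {"experiments": 0.5, "coexpression": 0.5},
    "partner2": {"textmining": 0.3},
    "partner3": {"experiments": 0.2, "textmining": 0.5},
}

all_channels = {"experiments", "coexpression", "textmining"}
result_all = top_interactors(candidates, all_channels, 0.4, 2)
result_text = top_interactors(candidates, {"textmining"}, 0.4, 2)
print(result_all)
print(result_text)
```

    Restricting the channels to text mining alone changes the list considerably, which is in line with the remark that a first "quick-view" depends strongly on the prediction methods selected.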

    Tip!  Entrez Gene entries of individual genes now also contain a section called "Interactions" listing in a concise manner the interacting proteins known from literature together with the source database, like BIND. This is a very convenient "quick-view" to get known interactants of your protein of interest.

    Tip! BIND (Biomolecular Interaction Network Database) is a database designed to store full descriptions of interactions, molecular complexes and pathways. Interactions between any two molecules composed of proteins, nucleic acids and small molecules are described. The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations. There are several ways to query the BIND database. BINDBlast lets you BLAST the BIND database using a protein sequence. You will see if your sequence or homologous sequences are contained in the DB. You will get links concerning the interacting protein(s), method (like 2-hybrid screen), abstracts etc. Note that each interaction is described by a unique BIND-interaction ID. Interactions may be visually navigated using a Java applet called "BIND Interaction Viewer". Simply select the viewer from the menu associated with each interaction report. Alternatively, you may also display protein 3D structures associated with interactions, via launching the NCBI Cn3D Viewer. You may also Search or Browse the BIND database. You can perform a simple text query (like a gene name), or search via diverse accession numbers. You may also browse the whole BIND database for described interactions.
    PreBIND is a data mining tool that helps researchers locate biomolecular interaction information in the scientific literature. You can enter the name or accession number (RefSeq) or a PubMed ID (PMID) of a protein and PreBIND will return a list of papers that talk about the other molecules that interact with that protein. These papers are found using a list of synonyms that the protein is known by so you can find papers that talk about the protein by names other than the one that you entered. The fact that a paper describes interaction information is determined by a supervised learning algorithm called a support vector machine (SVM); for this reason PreBIND may return papers that could not be easily retrieved simply by doing literature searches for keywords such as "interaction". BUT on the other hand, PreBIND also returns hits which are definitely not protein-protein interactions, but are solely based on over-represented "words" in the abstracts. So, always be careful when going through the hit lists !

    Tip! IntAct, developed at the EBI, provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions. IntAct can be queried using diverse types of identifiers, like gene name ("Ptgs2"), IntAct accession number ("EBI-298933"), UniProt acc. ("Q05769"), UniProt ID ("PGH2_MOUSE"), InterPro acc., GO acc., and PubMed ID. The output displays lists of interaction partners, links to PubMed references, experimental techniques to verify interactions, graphical displays of interaction networks, and more. Please refer also to the section IntAct IDs for specific examples and further hints.

    Tip! You may also first query for known protein domains in a protein of interest and then investigate known protein domain - domain interactions. For this purpose, iPfam, a sister database of Pfam, can be used. iPfam is a resource that describes domain-domain interactions that are observed in PDB entries. The domains are defined by Pfam. When two or more domains occur in a single structure, the domains are analysed to see if they form an interaction. If the domains are close enough to form an interaction, the bonds forming the interaction are calculated. More information on how the bonds are calculated can be found in the help section. The interaction information is re-calculated at each Pfam release, so as Pfam changes, the information within iPfam is kept up to date. You can access the information in iPfam from each domain family page, or you can browse by domain interaction. The browse page also allows a search by domain name or accession.
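    The notion of two domains being "close enough to form an interaction" can be pictured as a distance check between atom coordinates. The sketch below is a simplified illustration only: the coordinates are invented, the cutoff value is a hypothetical parameter, and the way iPfam actually calculates bonds is described in its own help section, not here.

```python
from math import dist

def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom of domain A and any atom of domain B."""
    return min(dist(a, b) for a in atoms_a for b in atoms_b)

def contacts(atoms_a, atoms_b, cutoff):
    """Index pairs (i, j) of atoms that lie within the cutoff distance."""
    return [
        (i, j)
        for i, a in enumerate(atoms_a)
        for j, b in enumerate(atoms_b)
        if dist(a, b) <= cutoff
    ]

# Invented 3D coordinates for two small "domains".
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
domain_b = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

CUTOFF = 3.5  # hypothetical cutoff, chosen for illustration only

closest = min_distance(domain_a, domain_b)
pairs = contacts(domain_a, domain_b, CUTOFF)
print("closest approach:", closest)
print("atom pairs within cutoff:", pairs)
```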

    Tip! If you are mainly interested in human proteins, another very interesting resource is HPRD - Human Protein Reference Database. HPRD represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD is a joint project between Pandey Lab, Johns Hopkins School of Medicine, Baltimore, and the IOB, Institute of Bioinformatics, Bangalore, India. At first sight, HPRD may seem like "yet another protein database", but there are some features which are really worth mentioning. Concerning this specific question, each protein entry has a tab called "Interactions" which displays all known interaction partners of a specific protein. The related PubMed abstracts are linked with each interaction and the type (in vitro, in vivo) is indicated. This is an example of the interaction partners of the protein IKK alpha. NOTE: These HPRD-based interaction data are also available when performing a query at the STRING database, and choosing the view "Experiments" at the output overview; see also STRING section. Please refer to the HPRD section at the Pathways page for a detailed description !

    Tip! HiMAP is a dynamic browser for the human protein-protein interaction map, provided by the University of Michigan. Because of this definition, the main section of HiMAP is located under "Pathways and Interactions Databases". BUT: HiMAP actually includes a wide range of "Interaction types" beyond real protein-protein interaction, like relation based on co-expression (based on co-citations), enriched domain pairs (based on InterPro protein domains), or shared biological process (based on GO terms).
    A SINGLE gene search is performed using gene symbol, gene name, Locus Link ID or Unigene ID. In order to draw protein interaction networks, the user may select between different methods of protein-protein interaction determination / prediction, like specific Yeast 2 Hybrid datasets, the HPRD dataset, literature-confirmed interactions, or "pure" predictions. Note: If more than one option is chosen, it is possible to highlight interactions derived from a specific method by selecting the "Highlight edges" checkbox. Note: It is possible to determine the node colors in the graph based on molecular function or cellular localization. The link "Legend" in the graph window explains the different colors, like "kinase", "transcription factor", "receptor" or "nucleus" and "cytoplasm". The interaction network is drawn based on the selected methods. Each "connecting line" (edge) between 2 genes is clickable in order to display an information box showing the 2 gene names, the evidence type, and associated PubMed references where this "interaction" is described.
    Please refer to the main section of HiMAP for details !

    iHOP - information Hyperlinked Over Proteins generates a network built of co-citations of genes and proteins in public literature. iHOP is a public service provided by the Protein Design Group (PDG), National Center of Biotechnology (CNB), Madrid, Spain. By employing genes and proteins as hyperlinks between sentences and abstracts, iHOP converts the information in PubMed into one navigable resource. Note that protein-protein interactions which are experimentally verified are specifically highlighted. The search for literature information about a particular gene (GENE X) is the starting point in iHOP. You may limit the search to specific fields or to individual organisms. NOTE: iHOP is also included and automatically performed in the "data super-integration" tool Bioinformatic Harvester at EMBL ! Please refer to the iHOP section at the Pathways page for a detailed description !
                          
                 

                   
PATH4...get all human transcription factors involved in inflammation ? (last update Dec. 21, 2005)
        
    This question is related to FAQ PATH2, which combines tissue-specific expression (endothelial cells) with biological process annotation (inflammation), and to FAQ RET4, which combines tissue-specific expression with molecular function annotation (transcription factor). FAQ PATH4 combines filtering by molecular function (transcription factors) and pathway information (inflammation) with species selection (human).

    Tip! The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. Proteins are classified by family and subfamily, molecular function, biological process, and pathway. The high-throughput analysis tools suitable for gene datasets are based on a PANTHER-specific gene ontology as well as on pathway maps. The PANTHER portal comprises several tools. Please refer to the PANTHER main section for detailed information!
    The integrated Prowler tool is an efficient way to filter the PANTHER databases according to specific criteria. Example: Molecular Function: Transcription Factor + Pathway: Inflammation mediated by cytokine signaling + Species: NCBI:H.sapiens. This selection retrieves a list of 17 genes, which can be displayed and downloaded (see details under "Batch ID Search" within the PANTHER main section). NOTE that as PANTHER uses its own system of ontology terms, you may get different results when using tools based on the GO (Gene Ontology) vocabularies.
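
    The combined filter in the example above is a logical AND over three annotation fields. The following sketch shows that logic only; it does not use any PANTHER interface, and the record type, the function name select and all gene records are placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    symbol: str
    species: str
    molecular_functions: frozenset
    pathways: frozenset


# Placeholder records: symbols and annotations are invented for illustration
# and are not taken from PANTHER.
RECORDS = [
    GeneRecord("GENE_A", "H.sapiens",
               frozenset({"Transcription factor"}),
               frozenset({"Inflammation mediated by cytokine signaling"})),
    GeneRecord("GENE_B", "H.sapiens",
               frozenset({"Kinase"}),
               frozenset({"Inflammation mediated by cytokine signaling"})),
    GeneRecord("GENE_C", "M.musculus",
               frozenset({"Transcription factor"}),
               frozenset({"Inflammation mediated by cytokine signaling"})),
    GeneRecord("GENE_D", "H.sapiens",
               frozenset({"Transcription factor"}),
               frozenset({"Apoptosis signaling"})),
]


def select(records, molecular_function, pathway, species):
    """Return symbols of records that satisfy all three criteria (logical AND)."""
    return [
        r.symbol
        for r in records
        if r.species == species
        and molecular_function in r.molecular_functions
        and pathway in r.pathways
    ]


hits = select(RECORDS, "Transcription factor",
              "Inflammation mediated by cytokine signaling", "H.sapiens")
print(hits)
```

    In this toy dataset only GENE_A passes all three criteria; each of the other records fails exactly one of them.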
                                       
                 

                   
PATH5...know which protein domains are present / overrepresented in my gene set of interest ? -> see PROT7  

    This question is related to FAQ PATH1 as it involves similar resources. However, as this FAQ is in effect also the "batch version" of FAQ PROT1, it is located in the "Proteins" section. It addresses larger datasets, such as a cluster of genes from a microarray experiment, for which the user wants to extract the protein domains involved. In addition, it lists programs that test for overrepresentation of protein domains relative to a reference gene/protein set.
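
    Overrepresentation relative to a reference set is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below is a generic illustration of that test, not the method of any particular program listed in FAQ PROT7; the function name and the example numbers are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k domain-carrying genes in a set of n
    genes drawn without replacement from a reference of N genes, of which
    K carry the domain."""
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Illustrative numbers only: 20000 reference genes, 500 with the domain,
# a cluster of 100 genes of which 10 carry the domain.
p = hypergeom_pvalue(k=10, n=100, K=500, N=20000)
print(f"p-value for overrepresentation: {p:.3g}")
```

    When many domains are tested for the same gene set, the resulting p-values should additionally be corrected for multiple testing.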
      
       

                   
PATH6...get detailed information on metabolic pathways, reactions, and compounds ? (last update Feb. 1, 2006)
        
    In general, some resources, like KEGG, cover both metabolic and regulatory (signal transduction) pathways, while other databases focus on only one of these fields. This FAQ describes and compares resources that store information on metabolic pathways.

    Tip! BioCyc is a collection of 205 Pathway/Genome Databases plus the BioCyc Open Chemical Database. Each Pathway/Genome Database in the BioCyc collection describes the genome and metabolic pathways (NO signaling pathways!) of a single organism, with the exception of the MetaCyc database, which is a reference source on metabolic pathways from many organisms. The BioCyc databases are divided into three tiers according to their level of curation (intensive, moderate, none). ALL databases (including MetaCyc and HumanCyc) can be queried from the BioCyc Query page. The user may query for pathways, reactions, compounds, genes, and proteins, browse ontologies, or browse lists of database entries.
    Note that BioCyc contains many more comments describing individual metabolic pathways than KEGG does. KEGG maps are much larger than BioCyc maps and are mosaics that combine reactions from many organisms, whereas BioCyc maps describe single pathways elucidated in single organisms. Taken together, BioCyc is of at least equal quality to KEGG in the field of metabolic pathways. Please refer also to the BioCyc main section for details.

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is probably the best-known database for pathway information. KEGG turns sequence information from a number of organisms into metabolic or regulatory pathways. This site makes it easy to place genes into a functional context, and to look for as yet unknown genes that might exist in an organism. A good place to start is the KEGG table of contents.
    Tip! The PATHWAY database of KEGG is a graphical catalogue of metabolic pathways (like glycolysis or ATP synthesis) and regulatory pathways (like apoptosis or cell cycle), which can be browsed by topic. If available, different organisms are compared. All components of the maps are clickable and lead to detailed information. Please refer also to the KEGG section at the main page.
                                       
               

                   
PATH7...download the gene lists of individual pathways for further data analysis ? (last update Apr. 24, 2006)
        
    Tip! CGAP (Cancer Genome Anatomy Project) is an NCI (National Cancer Institute) resource which offers a comprehensive molecular characterization of normal, precancerous, and malignant cells. It contains genomic data for human and mouse, including transcript sequences, gene expression patterns, SNPs, clone resources, and cytogenetic information. Informatics tools are provided to query and analyze the data.
    One of the modules of CGAP is CGAP Pathways. Pathways on the CGAP web site have been obtained directly from BioCarta (to create BioCarta Pathways on CGAP) and KEGG (to create KEGG Pathways on CGAP). In addition, CGAP has linked each human gene in BioCarta and each human enzyme in KEGG to its CGAP Gene Info page, and each intermediary metabolite in KEGG to a CGAP Compound Info page.
    Example: The NF-kB signaling pathway from BioCarta. NOTE: The "Genes" link produces a Gene List of all genes shown in this pathway diagram, including all the options for further data analysis described in the section "CGAP Batch Gene Finder"! These options include "Common View", which creates a table displaying all GO terms, pathways (KEGG and BioCarta), motifs, SNPs, and cytogenetic locations for the complete input gene set. Note that features shared among the listed genes are highlighted. This is a convenient and quick way to create an annotation table containing the most important function-related data for a gene set of interest, and the table is in itself independent of expression in cancer! It can also be saved as tab-delimited text. In addition, the expression of the whole gene set can be viewed as a colored graph within the NCI60 panel of cancer cell lines (please refer to the NCI60 section for background). The link "SAGE Summary" displays the SAGE counts of the input gene set in a series of normal and cancer tissues.
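
    Once such an annotation table has been saved as tab-delimited text, it can be summarized with a few lines of code. The sketch below counts how many genes are annotated with each pathway term. The column names ("Symbol", "Pathways"), the ";" separator inside a cell, and all values are assumptions made for illustration; the layout of a real CGAP export may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical export; real column names and separators may differ.
EXPORT = (
    "Symbol\tPathways\n"
    "GENE_A\tPathway X; Pathway Y\n"
    "GENE_B\tPathway X\n"
    "GENE_C\t\n"
)


def pathway_counts(handle, column="Pathways", separator=";"):
    """Count the genes annotated with each term in one column of a
    tab-delimited annotation table."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        for term in (row.get(column) or "").split(separator):
            term = term.strip()
            if term:  # skip empty cells such as GENE_C's
                counts[term] += 1
    return counts


counts = pathway_counts(io.StringIO(EXPORT))
for term, n_genes in counts.most_common():
    print(f"{term}\t{n_genes}")
```

    For a saved file, the io.StringIO object would be replaced by an opened file handle, for example open("genes.txt", newline=""), where the file name is again only a placeholder.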

        ID Mapping is an ID conversion tool provided by PIR, a member of the UniProt consortium. It converts lists of IDs (such as GenBank AC, gi, RefSeq, TIGR, Pfam, Prints, PROSITE, KEGG pathway ID, GO, BIND, gene name, Entrez Gene ID, OMIM, PubMed, and more) to and from UniProtKB ID or AC. The tool can therefore also be used for questions such as how to get all proteins belonging to a certain pathway. NOTE: The conversion only works to and from UniProtKB ID or AC; it is not possible, for example, to convert KEGG pathway IDs directly into Entrez Gene IDs!
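
        Because every conversion has UniProtKB on one side, a conversion between two other ID types would have to pass through UniProtKB as a hub. The sketch below assumes that two separate conversion results (source ID to UniProtKB, then UniProtKB to target ID) have been obtained and loaded as lookup tables; this two-step use is an assumption, not a documented feature of the tool, and all identifiers and the function name convert_via_hub are placeholders.

```python
# Hypothetical lookup tables standing in for two separate conversion results.
# All identifiers are placeholders, not real database accessions.
PATHWAY_TO_UNIPROT = {
    "PATHWAY_1": ["UP_0001", "UP_0002"],
    "PATHWAY_2": ["UP_0003"],
}
UNIPROT_TO_GENE = {
    "UP_0001": ["GENE_101"],
    "UP_0002": ["GENE_102", "GENE_103"],
    # UP_0003 has no gene ID in this toy table.
}


def convert_via_hub(source_ids, to_hub, from_hub):
    """Chain two mappings through a hub ID type, keeping the order in which
    targets are first seen and dropping duplicates."""
    result = {}
    for source_id in source_ids:
        targets = []
        for hub_id in to_hub.get(source_id, []):
            for target in from_hub.get(hub_id, []):
                if target not in targets:
                    targets.append(target)
        result[source_id] = targets
    return result


mapped = convert_via_hub(["PATHWAY_1", "PATHWAY_2"],
                         PATHWAY_TO_UNIPROT, UNIPROT_TO_GENE)
for pathway_id, gene_ids in mapped.items():
    print(pathway_id, gene_ids)
```

        Source IDs without any mapped target are kept with an empty list, so that entries lost in either step remain visible.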
                                           