-> PATHWAYS, INTERACTIONS, FUNCTIONS
-> PATH1...predict the
pathways, interactions and functions for a gene set of interest ? (last
update Jun. 27, 2006)
-> PATH2...get all proteins from endothelial cells involved in
inflammation (SRS and GO approaches)? -> see RET2
!
-> PATH3...predict potential protein-protein interactions of my
protein of interest? (last update May 15, 2006)
-> PATH4...get all human transcription factors involved in
inflammation ? (last update Dec. 21, 2005)
-> PATH5...know which protein domains
are present / overrepresented in my gene set of interest ? -> see PROT7
!
-> PATH6...get detailed information on metabolic pathways,
reactions, and compounds ? (last update Feb. 1, 2006)
-> PATH7...download the gene lists of individual pathways for
further data analysis ? (last update Apr. 24, 2006)
PATH1...predict the
pathways, interactions and functions for a gene set of interest ? (last
update Jun. 27, 2006)
1. Resources based on pathway maps:
KEGG
(Kyoto
Encyclopedia
of Genes and Genomes) is probably the best-known database for pathway
information. KEGG turns sequence information from a number of
organisms into metabolic
or regulatory
pathways. This site makes it easy to place genes into a functional
context, and to look for as yet unknown genes that might exist in
an organism. A good site to start from is the KEGG table of contents. All components
of the maps are clickable leading to detailed information. Please refer
also to the KEGG section at the main page.
Tip! PATHWAY-database
of KEGG is a graphical catalogue of metabolic
(like glycolysis or ATP synthesis) and regulatory
pathways (like apoptosis or cell cycle), which can be simply browsed by
topics. If available, different
organisms are compared. All components of the maps are clickable
leading to detailed information. The page Search
Objects in KEGG Pathways lets you run batch
searches of gene lists against the KEGG Pathway database. Example:
A cluster of gene names from a microarray experiment can be analyzed to
display all pathways which are involved and to graphically highlight
the input genes within the pathway maps. As query, different
identifiers (gene names, EC accessions, KEGG gene identifiers, and
more) may be used. The output shows the list of pathways and the
corresponding genes, but there is no statistical summary of how many genes
of your input list are found in which pathway. Note that if a
gene is not found in this search, it might still be
present in the KEGG Genes database, but it is not assigned to a pathway
(yet) ! Note that if you want to perform such a search while
including the KEGG annotation "vocabulary" (KO), please refer to the
description of KAAS below.
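Note: Because the output of this search carries no statistical summary, a per-pathway tally has to be made separately. The following is a minimal Python sketch, assuming the search result has been transcribed by hand into a gene-to-pathways mapping. The function name, gene names and pathway labels are hypothetical placeholders and not actual KEGG output.

```python
from collections import Counter

def count_genes_per_pathway(gene_to_pathways):
    """Return a Counter: pathway -> number of input genes assigned to it."""
    counts = Counter()
    for gene, pathways in gene_to_pathways.items():
        # A set guards against a pathway being listed twice for one gene.
        for pathway in set(pathways):
            counts[pathway] += 1
    return counts

# Hypothetical example data (placeholders, not real KEGG assignments).
example = {
    "geneA": ["pathway_1", "pathway_2"],
    "geneB": ["pathway_1"],
    "geneC": [],  # present in KEGG Genes but not assigned to a pathway (yet)
}

for pathway, n in count_genes_per_pathway(example).most_common():
    print(f"{pathway}\t{n} of {len(example)} input genes")
```

Such a tally only counts genes; it says nothing about whether a pathway is statistically over-represented.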
Reactome
(formerly
known as "Genome KnowledgeBase") is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute,
and The Gene Ontology Consortium
to develop a curated resource of core pathways and reactions in
human biology. The information in this database is authored by
biological researchers with expertise in their field, maintained by
editorial staff, and cross-referenced with PubMed,
GO, and the sequence
databases at NCBI, Ensembl and UniProt. You can
search/browse by biological processes like DNA replication,
translation, glucose metabolism etc., and you will obtain a
hierarchical structure of topics and subtopics, including literature
and sequence links.
Tip! SkyPainter
is a program which is part of Reactome, which can (at least by
definition) be used to graphically visualize "over-represented"
pathways which are turned on by a given set of genes. For this
purpose, the gene list is pasted using one of several types of
identifiers. Please refer to the main
section of SkyPainter for
detailed information!
Some personal remarks:
Note
that gene names have to be used in the format "ATF3_HUMAN", whereas
"ATF3" alone will NOT find any hits.
The display of individual pathways is very heterogeneous:
some images are taken from publications, some are
newly produced for Reactome, and some pathways do not contain any
diagrams at all.
Note that test runs using a gene set derived
from a microarray experiment showed that only a fraction of the genes
was represented in the database, and therefore only some specific
pathways were highlighted.
Conclusion: By comparison, the KEGG KAAS tool
yielded the more comprehensive results, although SkyPainter offers some
very fancy options like the "movie" option for e.g. microarray time
course experiments.
BioCarta is
a company which develops reagents and assays for biopharmaceutical
research. The BioCarta website serves as interactive web-based
resource for life scientists. This information falls into four
categories – gene function, proteomic pathways, ePosters, and
research reagents.
The BioCarta
Pathways
section includes expert-curated interactive graphic models of
many pathways from
diverse fields like apoptosis, cell cycle, cell signalling,
development, immunology, neuroscience, adhesion, and metabolism. There
is a keyword search option for pathway names, or gene names.
You may also perform a multi-gene search limiting the output to
pathways including all of the query genes. NOTE that clicking
on individual genes in a pathway map
reveals a table comprising all important links like LocusLink,
UniGene, KEGG, OMIM, GeneCard, PubMed and more.
NOTE: Although BioCarta offers a "Multi-Gene
Search"
option, tests showed that it is NOT suitable to analyze large
gene datasets for common pathways.
2. Resources based on ontology assignments:
Ontologies are controlled vocabularies which
should be usable across different species to describe the function of
proteins. The best known example is GO - Gene Ontology, which
is widely used to annotate genes / proteins in terms of cellular
compartment (like "nucleus"), molecular function (like "transcription
factor"), and biological process (like "inflammatory process"). Each GO
term has a stable GO accession number. Thus, it may also be possible to
delineate biological processes which are turned on in a certain
experimental condition when analyzing a gene cluster for the presence
of "over-represented" GO terms.
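Note: One common way to decide whether a GO term is "over-represented" in a gene cluster is a one-sided hypergeometric test. The sketch below is a minimal illustration in Python under that assumption; it is not claimed to be the exact statistic used by any of the tools listed here, and the function and parameter names are chosen for this example only.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the reference set
    K: reference genes annotated with the GO term
    n: genes in the cluster of interest
    k: cluster genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical numbers: 40 of 1000 reference genes carry the term,
# and 6 of the 50 genes in the cluster carry it.
p_value = hypergeom_enrichment_p(k=6, n=50, K=40, N=1000)
print(f"p = {p_value:.4g}")
```

When many GO terms are tested at once, the resulting p-values should additionally be corrected for multiple testing.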
Tip! GOTM (GOTree
Machine) is a web-based platform for interpreting
microarray data or other interesting gene sets using Gene Ontology. Features
include a user friendly web-based interface, an expandable tree for
browsing the GO hierarchy, fixed tree as HTML output for archive, bar
chart for publication, a statistical analysis indicating GO terms with
relatively enriched gene numbers and suggesting biological areas that
warrant further study, and finally retrieving subsets of genes by GO
term or keyword searching. In conclusion, GOTM is an excellent
resource for GO based data mining as it
presents data in various forms, both allowing quantitative analysis
(like using the Bar Chart or Tree View) as well as qualitative
analysis (like using the DAG View to produce a very good, concise
overview of the whole dataset). Please refer to the GOTM main section for detailed
information !
BioMart,
a data retrieval tool, can also
be used to functionally annotate a large gene dataset via GO (Gene
Ontology) terms. Please refer also to the BioMart section at the main page for
information. You may, as example, paste a list of Entrez Gene IDs
corresponding to the genes which are selected from a microarray
experiment, into the field "ID list limit" at the "Filter" page of
BioMart. Then, at the "Features" page of the output, you may choose to
selectively display the GO-related data in the final result table: GO
ID, GO description, and GO evidence code. You can choose between
different output formats: Text, html, or MS EXCEL. Important:
In contrast
to KAAS, each gene is listed separately in the result file,
listing all associated GO terms, BUT there is NO
examination of "common over-represented" terms which are involved.
AmiGO, the GO
browser of the Gene Ontology Consortium, can not only be searched by terms
to display GO hierarchies (like QuickGO) but also using a
list of gene names,
meaning
you can directly look for your genes of interest and display their
assigned
GO terms. For this purpose, you should select the "advanced query",
paste your list of gene names, select "gene products" and the species
used. Note: Tests showed that it is quite tricky to choose the
best way to query, as gene names like "GEM" not only pick the specific
gene but also others which contain "GEM" in their name like "GEMI4".
Otherwise, when selecting the option "Exact match", there is no hit at
all, as we would have to use "GEM_HUMAN" for this purpose. Thus, it is
quite hard to retrieve only the desired genes without having to
manually check the whole results list. Important: In contrast
to KAAS, each gene is listed separately in the result file,
listing all associated GO terms, BUT there is NO
examination of "common over-represented" terms which are involved.
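Note: The matching problem described above (substring hits versus exact hits) can be reproduced with a few lines of Python. This is only an illustration of the behaviour observed in the tests, not AmiGO's actual search code; the function names and the "_HUMAN" suffix handling are assumptions made for this sketch.

```python
def substring_hits(query, names):
    """Names that merely contain the query (picks up unwanted genes)."""
    return [n for n in names if query.upper() in n.upper()]

def exact_hits(query, names):
    """Names identical to the query (misses 'GEM_HUMAN' for query 'GEM')."""
    return [n for n in names if n.upper() == query.upper()]

def symbol_hits(query, names, species_suffix="_HUMAN"):
    """Exact match on either the bare symbol or symbol plus species suffix."""
    wanted = {query.upper(), query.upper() + species_suffix}
    return [n for n in names if n.upper() in wanted]

# Entry names in the style used above (list is illustrative only).
names = ["GEM_HUMAN", "GEMI4_HUMAN"]

print(substring_hits("GEM", names))  # both entries
print(exact_hits("GEM", names))      # no entry
print(symbol_hits("GEM", names))     # only the desired entry
```

Converting a gene list to the "SYMBOL_SPECIES" form before an exact-match query is therefore one way to avoid checking the hit list by hand.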
3. Resources based on pathway maps and on ontology assignments:
Tip! The
PANTHER (Protein ANalysis
THrough Evolutionary
Relationships) Classification
System was designed to classify proteins (and their genes) in
order to facilitate high-throughput analysis. Proteins have been
classified according to families and subfamilies, molecular functions,
biological processes, pathways. The high-throughput analysis tools
suitable for gene datasets are based on a PANTHER-specific gene
ontology as well as on pathway maps. There are several tools which
make up the PANTHER portal. Please refer to the PANTHER main section for detailed
information !
The Gene
Expression Data Analysis tool allows in-depth analyses of gene
datasets concerning over-representation of specific PANTHER Ontology
terms or pathways. It is somewhat similar to the DAVID-Functional
annotation/classification tools. The pathway visualization tool
will display your experimental results on detailed diagrams of the
relationships between genes/proteins in known pathways. The
option "Compare gene lists" maps lists of genes to a PANTHER
ontology. For pathways, you can then view the gene expression values
overlaid on top of a pathway diagram, where genes will be colored
differently for different clusters of genes. Use the binomial
statistics tool to compare classifications of multiple clusters of
lists to a reference list to statistically determine over-
or under-
representation of PANTHER classification categories. Please refer
to the PANTHER main section for
detailed information !
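Note: The idea behind a binomial test for over- or under-representation can be sketched in Python as follows. The probability of a category is estimated from the reference list, and the observed count in the gene list is compared against it. This is a minimal sketch of the general principle with hypothetical numbers and function names; it should not be read as the exact procedure implemented in PANTHER.

```python
from math import comb

def binomial_tail(k, n, p, upper=True):
    """P(X >= k) if upper, else P(X <= k), for X ~ Binomial(n, p)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in ks)

def representation(k, n, ref_hits, ref_size):
    """Classify a category as over- or under-represented in a gene list.

    k: genes of the list in the category, n: size of the list,
    ref_hits / ref_size: fraction of the reference list in the category.
    """
    p = ref_hits / ref_size
    if k >= n * p:  # observed count at or above the expected count
        return "over", binomial_tail(k, n, p, upper=True)
    return "under", binomial_tail(k, n, p, upper=False)

# Hypothetical counts: 12 of 100 list genes versus 500 of 20000 reference genes.
direction, p_value = representation(12, 100, 500, 20000)
print(direction, f"p = {p_value:.3g}")
```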
KEGG
(Kyoto Encyclopedia
of Genes and Genomes) is probably the best-known database for pathway
information. KEGG turns sequence information from a number of
organisms into metabolic
or regulatory
pathways. This site makes it easy to place genes into a functional
context, and to look for as yet unknown genes that might exist in
an organism. A good site to start from is the KEGG table of contents. All components
of the maps are clickable leading to detailed information. Please refer
also to the KEGG section at the main page.
Tip!
KAAS - KEGG Automatic
Annotation Server is a very nice tool if you have a large dataset,
like a set of genes from a
microarray experiment, and you would like to know which "functional
terms" these genes are assigned to and which pathways
are preferentially involved. KAAS provides functional
annotation of genes by BLAST comparisons against the manually curated
KEGG GENES database.
The result contains KO (KEGG Orthology) assignments (similar to
GO) and automatically
generated KEGG pathways. KAAS accepts a multi-FASTA sequence
set (default: protein) as input. NOTE: If you have a
gene list, you first have to convert this list into a
FASTA-sequence file; tools like BioMart can be used for this
purpose. Please refer to the BioMart
section at the main page or FAQ RET3
for details.
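Note: If the sequences have already been retrieved (e.g. exported as a table of identifiers and sequences), writing them as a multi-FASTA file is straightforward. The Python sketch below assumes a list of (identifier, sequence) pairs; the identifiers, sequences and file name are made-up placeholders.

```python
def to_multi_fasta(records, width=60):
    """Format (identifier, sequence) pairs as multi-FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Break the sequence into lines of at most `width` residues.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Placeholder records, not real protein sequences.
records = [
    ("protein_1", "MKTAYIAKQR"),
    ("protein_2", "MSDNELLQ"),
]

with open("gene_set.fasta", "w") as handle:  # hypothetical file name
    handle.write(to_multi_fasta(records))
```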
KO assignments are based on
results from BLASTP. Check the "Nucleotide" checkbox if queries are
nucleotide sequences
representing a set of EST contigs or ESTs. In this case, KO assignments
are based on results from BLASTX and TBLASTN. KO assignment methods may
be performed based on the
bi-directional best hit (BBH, default) of BLAST or single-directional
best hit (SBH). The computation time of the BBH method is
about twice that of SBH. However, the method based on BBH will be more
accurate than SBH, if the number of query sequences is large enough
(genome scale). If the number of query sequences is small, then the SBH
method should suffice (and save time).
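Note: The difference between SBH and BBH can be illustrated with a small Python sketch. It assumes that similarity scores are already available for both search directions (query against database, and database against query); the second direction is what makes BBH take about twice as long. The score tables and names below are hypothetical and this is not the KAAS implementation.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def single_directional_best_hits(forward):
    """SBH: accept the best target of every query."""
    return best_hits(forward)

def bidirectional_best_hits(forward, reverse):
    """BBH: accept a pair only if each is the other's best hit."""
    fwd, rev = best_hits(forward), best_hits(reverse)
    return {q: t for q, t in fwd.items() if rev.get(t) == q}

# Hypothetical scores: queries q1, q2 against database genes g1, g2.
forward = {("q1", "g1"): 200, ("q1", "g2"): 150, ("q2", "g1"): 120}
reverse = {("g1", "q1"): 200, ("g1", "q2"): 120, ("g2", "q1"): 150}

print(single_directional_best_hits(forward))
print(bidirectional_best_hits(forward, reverse))
```

In this example q2 receives an assignment under SBH but not under BBH, because g1 matches q1 better than q2.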
The URL to access the results is sent by
Email. The results are available in several formats. The "KO list"
presents a list of the input sequences, the "KO hierarchy"
lists the different metabolic and regulatory
pathways together with the genes of the user-dataset which play a role
in this specific pathway. "Pathway map" first indicates how
many genes of your dataset
correspond to a certain pathway and then presents graphical images of
the pathways, highlighting the user-submitted genes ! Note: This
is a very nice tool to identify pathways
which are induced in a certain gene set, which may e.g. be derived from
a microarray study analyzing a certain biological stimulus. "Download"
produces a simple list of the input dataset together with the assigned
"K" numbers.
CGAP
- Cancer Genome
Anatomy Project is an NCBI resource which offers a
comprehensive molecular characterization of normal, precancerous, and
malignant cells. It contains genomic data for humans and mouse,
including transcript sequence, gene expression patterns, SNPs,
clone resources, and cytogenetic information. Please refer also to the CGAP main section for details.
In order to address this specific question, CGAP
provides at the Gene Finder page
the option to use the Batch Gene Finder.
In order to use the Batch Gene Finder, prepare a text file containing
the list of (human OR mouse)
gene symbols, UniGene
clusters, accession numbers, protein accession number, UniProt
(SwissProt) protein accessions, UniProt (SwissProt) protein identifiers
(like "ACTB_HUMAN") or Entrez Gene numbers. The text file must list the
identifiers in a vertical column, e.g. export a one-column Excel sheet
in txt (tab-delimited) format. The created gene list displays
the query
ID, gene name and symbol, and RefSeq accessions, as well as links to
the individual Gene Info pages (see CGAP main section for
details). In addition, the link "Common View" lets you create
a table displaying all GO
terms, Pathways (KEGG and Biocarta), motifs, SNPs, and cyto
locations
for the complete input gene set.
Note that this is a quick approach
to obtain this data but there is no sophisticated statistical analysis
of over-representation of terms. The main emphasis of the CGAP site is
the analysis of gene expression in a series of cancer tissues.
4. Resources based (primarily) on cocitation networks:
Resources which fall into this category extract
functional biological information from a given gene list based on
databases which store the results of literature data mining
processes. This means that all papers are analyzed on different levels
whether they contain 2 or more gene names in the same context.
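Note: The principle of counting cocitations on different levels (same abstract versus same sentence) can be sketched in Python as shown below. This is a deliberately naive illustration with made-up abstracts and gene names; real literature-mining systems additionally handle synonyms, ambiguous symbols and functional terms, none of which is attempted here.

```python
import re
from collections import Counter
from itertools import combinations

def gene_pairs(text, genes):
    """Return all pairs of query genes that occur as whole words in text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
    present = sorted(g for g in genes if g.upper() in tokens)
    return set(combinations(present, 2))

def cocitation_counts(abstracts, genes):
    """Count, per gene pair, the abstracts citing both genes
    (abstract level) and those citing both in one sentence (sentence level)."""
    abstract_level, sentence_level = Counter(), Counter()
    for text in abstracts:
        abstract_level.update(gene_pairs(text, genes))
        same_sentence = set()
        # Naive sentence split at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            same_sentence |= gene_pairs(sentence, genes)
        sentence_level.update(same_sentence)
    return abstract_level, sentence_level

# Made-up abstracts and gene names for illustration.
abstracts = [
    "GENEA activates GENEB. GENEC was unchanged.",
    "GENEA and GENEC were measured.",
]
genes = ["GENEA", "GENEB", "GENEC"]

by_abstract, by_sentence = cocitation_counts(abstracts, genes)
print(by_abstract)
print(by_sentence)
```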
Tip! BiblioSphere
PathwayEdition is part of
the commercial Genomatix suite of
products. This program deals with such cocitations of genes on
different levels, like 2 genes are cocited within an abstract,
or within the same sentence, or within the same sentence together with
a "functional term" (regulation, inhibit,...) or via a direct
connection via such a functional term. Text from Genomatix webpage:
"BiblioSphere PathwayEdition uses the
world's largest database of biological networks created from millions
of individually modeled relationships between genes, proteins,
complexes, cells and tissues. A unique combination of hand curation
from biological experts and up to date text mining techniques for
automated knowledge extraction provides you with the best data quality
available. BiblioSphere PathwayEdition allows a view on your data,
integrated in biological networks according to different biological
context."
BiblioSphere also provides links to other
resources like GO (Gene
Ontology) or BIND. BiblioSphere produces
graphical
maps which display the relationships between the genes of your
"genes of interest" dataset, like gene citations in PubMed, Gene-Gene
co-citations, or transcription factor citations.
Register for a free evaluation account
to get access to Genomatix products. NOTE that you only have 20
free analyses
per month !!! Note that there is not
only a limitation
in the number of analyses but also in the functionality
of the obtained data ! Please refer to the main section of BiblioSphere for
further details !
5. Resources based (primarily) on interaction networks:
Tip!
HiMAP is a
dynamic browser for the human protein-protein interaction map,
provided by the University of Michigan. Because of this definition, the
main section of HiMAP is located
under "Pathways
and Interactions
Databases". BUT: HiMAP actually includes a wide
range of
"Interaction types"
beyond real protein-protein interaction, like relation based on
co-expression (based on co-citations), enriched domain pairs
(based on
InterPro protein domains), or shared biological process (based
on GO terms). NOTE that although
HiMAP does not present statistical values and does not create
annotation tables for gene sets, it is well suited and highly
recommendable if you want to generate a quick overview about the
potential "relationships" within your gene set of interest !
Please refer to the main
section of HiMAP for details !
6. Resources based on several systems of functional
assignment:
Resources which fall into this category predict the
biological processes of gene datasets by integrating several
databases of functional information, like GO, pathways, cocitation
networks, or even protein domain and expression information.
Tip! WebGestalt
is a "WEB-based GEne SeT AnaLysis Toolkit". WebGestalt incorporates
information from different public resources and provides an easy way
for biologists to make sense out of large sets of genes. It
enables biologists to manipulate integrated information and find
patterns
that are not detectable otherwise. WebGestalt is designed for
functional genomic, proteomic, and large scale genetic studies from
which high-throughput data are continuously produced. It currently works
for human and mouse. WebGestalt is free for academic
use after registration. NOTE: If you have already
registered for GOTM,
you
can use this login ! In general, save and download options are
more versatile than in e.g.
DAVID. WebGestalt incorporates practically ALL FIELDS of functional
annotation: GO, Pathways, Co-citation Networks, and even
protein domain data and expression data !!! Taken together,
WebGestalt is an excellent resource for functional
annotation of gene datasets. Please refer to
the WebGestalt main section for
details !
DAVID
- The
Database for Annotation, Visualization and Integrated
Discovery integrates functional genomic
annotations with intuitive graphical summaries. DAVID provides a
comprehensive set of tools for investigators to visually summarize annotation
from large lists of genes, including those derived from
microarray and proteomic studies. DAVID is provided at NCI-Frederick and was developed to
support the bioinformatic needs at the . DAVID is
composed of several tools for the functional annotation and
classification of large gene sets. NOTE: There are no individual
URLs for the individual
applications. All have to be started from the DAVID main page. There
are 2 different approaches for this question in DAVID:
Tip! Functional
Annotation Tool: The scope of this tool is twofold, first to
generate
an annotation table of a gene set of interest, and second, to determine
"over-represented" terms like pathways or GO terms in order to predict
the biological processes affected in a specific dataset. For the latter
purpose, the option "Export Selected Annotation as
Chart" should be chosen. Please refer to the DAVID main section for details !
Tip! Functional
Classification Tool: The Functional Classification
Tool provides a rapid
means to organize large lists of genes into functionally related
groups to help unravel the biological content captured by high
throughput technologies. The Functional Classification
Tool generates a gene-to-gene similarity matrix based on shared
functional annotation using over 75,000 terms from 14 functional
annotation sources. Tools are provided to further
explore each functional gene cluster including listing of the “consensus
terms” shared by the genes in the cluster, display of enriched
terms, and heat map visualization of gene-to-term relationships. Please refer to
the DAVID main section for details !
PATH2...get all proteins from endothelial cells involved in
inflammation (SRS and GO approaches)? -> see RET2
!
Note that this question includes a full
description of the Gene Ontology (GO) system of functional
assignments; therefore a link to this FAQ is indicated here in the
section "Pathways, Interactions, Functions". Also note,
that approaches to address this question based on expression
analysis are discussed in FAQ EXP6.
PATH3...predict potential
protein-protein interactions of my protein of interest ? (last update May 15, 2006)
The prediction of protein-protein interactions
is in general a non-trivial matter. There are several sources of
information which can be considered in this respect, like of course
experimental data of verified interactions, literature mining to look
for "over-represented" co-citations of genes / proteins, conserved
coexpression, or other data from high-throughput experiments, and more.
Tip!
STRING is an
EMBL-database of known and predicted protein-protein interactions.
The interactions include direct (physical) and indirect (functional)
associations; they are derived from four sources: genomic context,
high-throughput experiments,
coexpression, and previous knowledge (PubMed). STRING quantitatively
integrates
interaction data from these sources for a large number of organisms,
and
transfers information between these organisms where applicable. You may
query
using an identifier of your gene / protein of interest, or you may
paste
a protein sequence. You can choose the prediction methods, the level of
confidence,
and the number of interactors shown. The output displays a list
of
potential functional associations (a list of proteins) and the
prediction
method used in each case. These data can also be displayed in several
views,
e.g. the "Summary Network" which shows a graph containing all the
proteins
listed and a color-code of their "relationships". Note that STRING
is also included and automatically performed in the "data
super-integration"
tool Bioinformatic
Harvester
at EMBL !
Note that STRING is especially suitable for
a
first "quick-view" which is in fact strongly based on text
mining
and homology. If you want to "dig deeper" including protein structures,
you
may try other sources like BIND.
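Note: If a list of predicted associations has been saved from the output, the two query settings "level of confidence" and "number of interactors shown" can be re-applied offline. The Python sketch below assumes each association is a (protein, score) pair with scores between 0 and 1; the protein names, scores and default values are hypothetical, and the function is not part of STRING.

```python
def filter_interactors(associations, min_confidence=0.7, limit=10):
    """Keep associations at or above min_confidence, best first,
    and return at most `limit` of them."""
    kept = [a for a in associations if a[1] >= min_confidence]
    kept.sort(key=lambda a: a[1], reverse=True)
    return kept[:limit]

# Hypothetical associations of one query protein.
associations = [
    ("partner_1", 0.95),
    ("partner_2", 0.42),
    ("partner_3", 0.81),
    ("partner_4", 0.70),
]

for name, score in filter_interactors(associations, min_confidence=0.7, limit=2):
    print(name, score)
```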
Tip!
Entrez
Gene entries of
individual genes now also contain a section called "Interactions"
listing in a concise manner the interacting proteins known from
literature together with the source database, like BIND. This is a very
convenient "quick-view" to get known interactants of your protein of
interest.
Tip! BIND (Biomolecular Interaction
Network Database) is a database designed to store full descriptions
of interactions, molecular complexes and pathways.
Interactions between any two molecules composed of proteins, nucleic
acids and small molecules are described. The database can be used to
study networks of interactions, to map pathways across
taxonomic branches and
to generate information for kinetic simulations. There are several
ways to query the BIND database. BINDBlast lets you BLAST the BIND
database using a protein sequence. You will see if your sequence or
homologous sequences are contained in the DB. You will get links
concerning the interacting protein(s), method (like 2-hybrid screen),
abstracts etc. Note that each interaction is described by a unique
BIND-interaction ID. Interactions may be visually navigated
using a Java applet called "BIND Interaction Viewer". Simply
select the viewer from the menu associated with each interaction
report. Alternatively, you may also display protein 3D structures
associated
with interactions, via launching the NCBI Cn3D Viewer. You may
also
Search or Browse the BIND database.
You can perform a simple text query (like a gene name), or search via
diverse accession numbers. You may also browse the whole BIND database
for described interactions.
PreBIND
is a data mining tool that helps researchers locate biomolecular
interaction information in the scientific literature. You can
enter the name or accession number (RefSeq) or a PubMed
ID (PMID) of a protein and PreBIND will return a list of papers
that talk about the other molecules that interact with that protein.
These papers are found using a list of synonyms that the
protein is known by so you can find papers that talk about the protein
by names other than the one that you entered. The fact that a paper
describes interaction information is determined by a supervised
learning algorithm called a support vector machine (SVM); for this
reason PreBIND may return papers that could not be easily retrieved
simply by doing literature searches for keywords such as "interaction".
BUT on the other
hand, PreBIND also returns hits which are definitely not
protein-protein interactions, but are solely based on over-represented
"words" in the abstracts. So, always be careful when going through the
hit lists !
Tip! IntAct, developed at the
EBI, provides a freely available, open source database system and
analysis tools for protein interaction data. All interactions
are derived from literature curation or direct user submissions. IntAct
can be queried using diverse types of identifiers, like
gene name ("Ptgs2"), IntAct accession number ("EBI-298933"), UniProt
acc. ("Q05769"), UniProt ID ("PGH2_MOUSE"), InterPro acc., GO acc., and
PubMed ID. The output displays lists of interaction partners, links to
PubMed
references, experimental techniques to verify interactions, graphical
displays of interaction networks, and more. Please refer also
to the section IntAct IDs for
specific examples and further hints.
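Note: When a mixed list of identifiers has to be sorted before querying, the identifier types named above can be told apart by their format. The Python sketch below uses simple regular expressions for this; the patterns are approximations chosen for illustration (in particular the one for UniProt IDs), and a purely numeric string is simply assumed to be a PubMed ID.

```python
import re

# Ordered (label, pattern) pairs; the first full match wins.
PATTERNS = [
    ("GO accession", r"GO:\d{7}"),
    ("IntAct accession", r"EBI-\d+"),
    ("InterPro accession", r"IPR\d{6}"),
    ("UniProt accession",
     r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"),
    ("UniProt ID", r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}"),  # approximation
    ("PubMed ID", r"\d+"),  # assumption: bare numbers are PMIDs
]

def identifier_type(identifier):
    """Guess the identifier type; anything unmatched is treated as a gene name."""
    for label, pattern in PATTERNS:
        if re.fullmatch(pattern, identifier):
            return label
    return "gene name"

for example in ["Ptgs2", "EBI-298933", "Q05769", "PGH2_MOUSE"]:
    print(example, "->", identifier_type(example))
```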
Tip! You
may also first query for known protein domains in a protein of
interest and then investigate known protein domain - domain
interactions. For this purpose, iPfam, a
sister database of Pfam, can be used. iPfam
is a
resource that describes domain-domain interactions that are observed in
PDB entries. The domains are defined by Pfam. When two or more domains
occur in a single structure, the domains are analysed to see if they
form an interaction. If the domains are close enough to form an
interaction, the bonds forming the interaction are calculated. More
information on how the bonds are calculated can be found in the help
section. The interaction information is re-calculated at each Pfam
release, so as Pfam changes, the information within iPfam is kept up
to date. You can access the information in iPfam from each domain family page, or
you can browse
by domain interaction. The browse page also allows a search by domain
name or accession.
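Note: The notion of two domains being "close enough" to interact can be illustrated by a simple distance check between atom coordinates. The Python sketch below is a generic example; the cutoff value and the coordinates are arbitrary assumptions and do not reproduce the criteria or bond calculations used by iPfam (see its help section for those).

```python
from math import dist

def contacts(domain_a, domain_b, cutoff=4.0):
    """Return index pairs (i, j) of atoms from the two domains that lie
    within `cutoff` of each other (same length unit as the coordinates)."""
    return [
        (i, j)
        for i, atom_a in enumerate(domain_a)
        for j, atom_b in enumerate(domain_b)
        if dist(atom_a, atom_b) <= cutoff
    ]

# Arbitrary example coordinates (x, y, z) for two small "domains".
domain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
domain_b = [(4.5, 0.0, 0.0), (20.0, 0.0, 0.0)]

pairs = contacts(domain_a, domain_b)
print("domains in contact:", bool(pairs), pairs)
```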
Tip! If
you are mainly interested in human proteins, another very
interesting resource is HPRD
-
Human Protein Reference
Database. HPRD represents a centralized platform
to visually depict and integrate information pertaining to domain
architecture, post-translational modifications, interaction
networks
and disease association for each protein in the human proteome.
All the
information in HPRD has been manually extracted from the
literature by
expert biologists who read, interpret and analyze the published data.
HPRD is a joint project between Pandey Lab,
Johns Hopkins School of Medicine, Baltimore, and the IOB, Institute of
Bioinformatics, Bangalore, India. At first sight, HPRD may seem like
"yet another protein database", but there are some features which are
really worth mentioning. Concerning this specific question, each
protein entry has a tab called "Interactions" which displays
all known interaction partners of a specific
protein. The related PubMed abstracts are
linked with each interaction and the type (in vitro, in vivo) is
indicated. This is an example
of the interaction partners of the protein IKK alpha. NOTE:
These HPRD-based interaction data are also available
when performing a query at the STRING
database, and choosing the view "Experiments" at the output overview;
see also STRING section. Please
refer
to the HPRD section at the Pathways page for
a detailed description !
Tip! HiMAP
is a
dynamic browser for the human protein-protein interaction map,
provided by the University of Michigan. Because of this definition, the
main section of HiMAP is located
under "Pathways
and Interactions
Databases". BUT: HiMAP actually includes a wide
range of
"Interaction types"
beyond real protein-protein interaction, like relation based on
co-expression (based on co-citations), enriched domain pairs
(based on
InterPro protein domains), or shared biological process (based
on GO terms).
A SINGLE gene search
is performed using gene symbol, gene name, Locus Link ID or Unigene ID.
In order to
draw protein interaction networks, the user may select between
different methods of protein-protein interaction determination /
prediction, like specific Yeast 2 Hybrid datasets, the HPRD dataset,
literature-confirmed interactions, or "pure" predictions. Note:
If more than one option is chosen, it is possible to highlight
interactions derived from a specific method by selecting the "Highlight
edges" checkbox. Note: It is possible to determine the node
colors in the graph based on molecular function or cellular
localization. The link "Legend"
in the graph window explains the different colors, like "kinase",
"transcription factor", "receptor" or "nucleus" and "cytoplasm". The
interaction network is drawn based on the selected
methods. Each "connecting line" (edge) between 2 genes is clickable in
order to display an information box displaying the 2 gene names, the
evidence type, and
associated PubMed references where this "interaction" is described.
Please
refer to the main section of HiMAP for details !
iHOP
-
information Hyperlinked Over Proteins generates a network built of
co-citations
of genes and proteins in public literature. iHOP is a public service
provided by the Protein Design Group (PDG), National Center of
Biotechnology (CNB), Madrid, Spain. By
employing genes and proteins as hyperlinks between sentences and
abstracts, iHOP converts the information in PubMed into one navigable
resource. Note that protein-protein interactions which
are experimentally verified are specifically highlighted. The
search for literature information about a particular gene (GENE X) is the starting point in
iHOP. You may limit the search to specific fields or to individual
organisms. NOTE: iHOP is also included and automatically
performed
in the "data super-integration" tool Bioinformatic Harvester
at EMBL ! Please refer
to the iHOP section at the Pathways page for
a detailed description !
PATH4...get all human transcription
factors involved in inflammation ? (last update Dec. 21, 2005)
This question is related to FAQ PATH2, which combines tissue specific expression
(endothelial cells) with biological process annotation (inflammation)
as well as to FAQ RET4 which combines
tissue specific expression with molecular function annotation
(transcription factor). In FAQ PATH4, we analyze the combined filtering
for molecular function (transcription factors), pathway information
(inflammation) together with species selection (human).
Tip! The PANTHER (Protein ANalysis THrough Evolutionary Relationships)
Classification System was designed to classify proteins (and their genes) in
order to facilitate high-throughput analysis. Proteins have been
classified according to families and subfamilies, molecular functions,
biological processes, and pathways. The high-throughput analysis tools
suitable for gene datasets are based on a PANTHER-specific gene
ontology as well as on pathway maps. The PANTHER portal consists of several
tools. Please refer to the PANTHER main section for detailed information!
The integrated Prowler tool is a very efficient way to filter the PANTHER
databases according to specific criteria. Example:
Molecular Function: Transcription Factor + Pathway: Inflammation
mediated by cytokine signaling + Species: NCBI:H.sapiens. This
selection retrieves a list of 17 genes which can be displayed and
downloaded (see details under "Batch ID Search" within the PANTHER
main section). NOTE: As PANTHER uses its own "unique" system
of ontology terms, you may get different results when using tools based
on the GO (Gene Ontology) vocabularies.
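Conceptually, such a combined selection is a logical AND across three annotation fields. The following Python sketch illustrates this on a handful of made-up records; the record layout, the gene names, and the function filter_records are assumptions for illustration and do not reflect how Prowler stores or queries its data.

```python
# Hypothetical annotation records; real PANTHER entries carry more fields.
records = [
    {"gene": "GENE_1", "function": "Transcription factor",
     "pathways": {"Inflammation mediated by cytokine signaling"},
     "species": "H.sapiens"},
    {"gene": "GENE_2", "function": "Receptor",
     "pathways": {"Inflammation mediated by cytokine signaling"},
     "species": "H.sapiens"},
    {"gene": "GENE_3", "function": "Transcription factor",
     "pathways": {"Apoptosis"},
     "species": "H.sapiens"},
    {"gene": "GENE_4", "function": "Transcription factor",
     "pathways": {"Inflammation mediated by cytokine signaling"},
     "species": "M.musculus"},
]

def filter_records(records, function, pathway, species):
    """Keep only genes whose record satisfies all three criteria (logical AND)."""
    return [r["gene"] for r in records
            if r["function"] == function
            and pathway in r["pathways"]
            and r["species"] == species]

hits = filter_records(records, "Transcription factor",
                      "Inflammation mediated by cytokine signaling",
                      "H.sapiens")
```

Only the record that matches the molecular function, the pathway, and the species at the same time ends up in hits; relaxing any one criterion enlarges the list.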
PATH5...know which protein domains
are present / overrepresented in my gene set of interest ? -> see PROT7
!
This question is related to FAQ PATH1, as it involves similar resources.
However, as this FAQ is in a sense also the "batch version" of FAQ PROT1,
it is contained in the "Proteins" section.
It refers to larger datasets, like a cluster of genes from a microarray
experiment, from which the user wants to extract the protein domains that
are present. In addition, this FAQ lists programs which test for an
overrepresentation of protein domains as compared to a reference
gene/protein set.
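One common way to quantify such an overrepresentation is a one-sided hypergeometric test. The sketch below is a generic illustration of that statistic, not the method of any particular program listed under PROT7, which may use other tests or multiple-testing corrections. The function name hypergeom_pvalue and all counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of finding at least k domain-containing
    proteins in a gene set of size n, when K of the N proteins in the
    reference set contain the domain."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total

# Hypothetical numbers: 8 of 50 genes in the cluster carry the domain,
# versus 200 of 20000 proteins in the reference set.
p = hypergeom_pvalue(k=8, n=50, K=200, N=20000)
```

A small p suggests that the domain occurs more often in the gene set than expected by chance. When many domains are tested at once, the resulting p-values should additionally be corrected for multiple testing.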
PATH6...get detailed information on
metabolic pathways, reactions, and compounds ? (last update Feb. 1, 2006)
In general, there are resources which deal with both
metabolic and regulatory (signal transduction) pathways,
like KEGG, and there are databases which focus on only one of
these fields. This FAQ describes and compares resources which
store information on metabolic pathways.
Tip! BioCyc is a collection of 205 Pathway/Genome
Databases plus the BioCyc
Open Chemical Database.
Each Pathway/Genome Database in the BioCyc collection describes the
genome and metabolic pathways (NO signaling pathways !)
of a single organism, with the exception
of the MetaCyc database, which is a reference
source on metabolic pathways from many organisms. The BioCyc
databases are divided into three tiers, based on their quality
(intensive, moderate, or no curation). ALL databases (including
MetaCyc and HumanCyc) can be queried from the BioCyc Query page. The user
may query for pathways, reactions, compounds, genes, and proteins, browse
ontologies, or screen through lists of database entries.
Note that BioCyc contains many more comments describing individual
metabolic pathways than KEGG does. KEGG maps are much larger than BioCyc
maps, and are mosaics that combine reactions from many organisms, whereas
BioCyc maps describe single pathways elucidated in single organisms. Taken
together, BioCyc is at least equal in quality to KEGG in the field of
metabolic pathways. Please refer also to the BioCyc main section for details.
KEGG (Kyoto Encyclopedia of Genes and Genomes) is probably the best-known
database for pathway information. KEGG turns sequence information from a
number of organisms into metabolic or regulatory
pathways. This site makes it easy to place genes into a functional
context, and to look for as yet unknown genes that might exist in
an organism. A good site to start from is the KEGG table of contents. All
components of the maps are clickable, leading to detailed information.
Tip! The PATHWAY database of KEGG is a graphical catalogue of metabolic
(like glycolysis or ATP synthesis) and regulatory
pathways (like apoptosis or cell cycle), which can simply be browsed by
topic. Where available, different organisms are compared. All components
of the maps are clickable, leading to detailed information. Please refer
also to the KEGG section at the main page.
PATH7...download the gene lists of
individual pathways for further data analysis ? (last update Apr. 24, 2006)
Tip! CGAP - Cancer Genome Anatomy Project is an NCBI resource which offers a
comprehensive molecular characterization of normal, precancerous, and
malignant cells. It contains genomic data for human and mouse,
including transcript sequences, gene expression patterns, SNPs,
clone resources, and cytogenetic information. Informatics tools are
provided to query and analyze the data.
One of the modules of CGAP is CGAP Pathways.
Pathways on the CGAP web site have been obtained directly from BioCarta
(to create BioCarta Pathways on CGAP) and KEGG (to create KEGG
Pathways on CGAP). In addition, CGAP has linked each human gene
in BioCarta and each human enzyme in KEGG to its CGAP Gene Info page,
and each intermediary metabolite in KEGG to a CGAP Compound Info page.
Example: The NF-kB signaling pathway from BioCarta. NOTE: The "Genes" link
produces a Gene List of all genes seen in this pathway diagram,
including all the options for further data analysis as
described in the section "CGAP Batch Gene Finder"! These options include
"Common View", which lets you create a table displaying all GO
terms, pathways (KEGG and BioCarta), motifs, SNPs, and cytogenetic locations
for the complete input gene set. Common aspects of the listed
genes are highlighted. This is a very convenient and
quick way to create an annotation table for a gene set of
interest containing the most important function-related data, which is per
se independent of the expression in cancer situations! This table
can also be saved as tab-delimited text. In addition, the expression
of the whole gene set can be viewed as a colored graph within the NCI60
panel of cancer cell lines (please refer to the NCI60 section for
background). The link "SAGE Summary" displays the SAGE counts
of the input gene set in a series of normal and cancer tissues.
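Once an annotation table has been saved as tab-delimited text, it can be processed further with a few lines of Python. The sketch below is only an illustration: the column names ("Symbol", "GO", "Pathway"), the "; " separator inside cells, and the function genes_by_annotation are assumptions and have to be adapted to the header and layout of the file actually saved. An in-memory string stands in for the file.

```python
import csv
import io

# Stand-in for a saved tab-delimited file; the column names are assumed.
saved_table = io.StringIO(
    "Symbol\tGO\tPathway\n"
    "GENE_A\tterm 1; term 2\tpathway 1\n"
    "GENE_B\tterm 2\tpathway 1; pathway 2\n"
)

def genes_by_annotation(handle, column, sep="; "):
    """Group gene symbols by each value found in one annotation column."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        for value in row[column].split(sep):
            if value:
                groups.setdefault(value, []).append(row["Symbol"])
    return groups

by_go = genes_by_annotation(saved_table, "GO")
```

For a real file, replace saved_table with an open file handle, for example open("table.txt", newline=""), where the file name is again a placeholder.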
ID Mapping is a tool for ID conversion provided by PIR,
a member of the UniProt consortium, which is capable of converting
lists of IDs (like GenBank AC, gi,
RefSeq, TIGR, Pfam, Prints, PROSITE, KEGG pathway ID, GO, BIND, Gene
name, Entrez Gene ID, OMIM, PubMed, and more) to and from UniProtKB ID
or AC. This tool can therefore also be used for questions like how to
get all proteins belonging to a certain pathway. NOTE:
The conversion only works to and from UniProtKB ID
or AC; it is not possible, for example, to convert KEGG pathway IDs
directly into Entrez Gene IDs!
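Because every conversion passes through UniProtKB, a conversion between two other ID types can be done in two steps: first to UniProtKB accessions, then from those to the target ID type. The following Python sketch shows how the results of two such runs could be combined. The function chain_mappings and all identifiers are placeholders, and the dictionaries stand in for result tables downloaded from the tool.

```python
def chain_mappings(source_to_uniprot, uniprot_to_target):
    """Compose two one-to-many mappings via UniProtKB accessions."""
    result = {}
    for source_id, accessions in source_to_uniprot.items():
        targets = set()
        for acc in accessions:
            # Accessions without a target ID are silently skipped.
            targets.update(uniprot_to_target.get(acc, ()))
        result[source_id] = sorted(targets)
    return result

# Placeholder identifiers, not real database entries.
pathway_to_uniprot = {"PATHWAY_X": ["ACC_1", "ACC_2", "ACC_3"]}
uniprot_to_gene = {"ACC_1": ["GENE_ID_10"],
                   "ACC_2": ["GENE_ID_10", "GENE_ID_20"]}

pathway_to_gene = chain_mappings(pathway_to_uniprot, uniprot_to_gene)
```

Note that the mappings are not necessarily one-to-one: several accessions may point to the same gene, and some accessions may have no counterpart in the target ID type, so the final list can be shorter than the intermediate one.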