Bioinformatics World    
         
Cheminformatics Linkpages
NOTE: There are several definitions of the term "Cheminformatics", for example: "cheminformatics is the use of computer techniques applied to a range of problems in the field of chemistry. Also known as chemoinformatics and chemical informatics, these techniques are used in pharmaceutical companies in the process of Drug Discovery". A broader definition reads: "In fact, Cheminformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information".
Cheminformatics.org
(Cambridge University, UK)
Cheminformatics.org is a non-commercial web site that compiles information on cheminformatics web resources. It is available at the Unilever Centre for Molecular Science Informatics, University of Cambridge, and contains links to cheminformatics programs and QSAR datasets (with structures!).
                              
                                      
Small Molecules Databases
NOTE: This section covers databases that store information on "small molecules" such as drugs. In general, only molecules not directly encoded by the genome are included; nucleic acids, proteins, and peptides derived from proteins by cleavage are therefore usually not found in these databases.
ChEBI
(EBI)


ChEBI, Chemical Entities of Biological Interest, is a freely available dictionary of "small molecular entities". The term ‘molecular entity’ encompasses any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. Molecules directly encoded by the genome (e.g. nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. ChEBI was created by incorporating and then merging data from a number of different sources, such as IntEnz, the Integrated relational Enzyme database of the EBI, and KEGG LIGAND, part of the Kyoto Encyclopedia of Genes and Genomes.

Query:
The Advanced Search page of ChEBI lets you search either all categories or a single category, such as ChEBI ID, ChEBI name, synonym, IUPAC name and more.

Test:
- None of the 3 test gene names (PTGS2, TP53, SELE) is found, as ChEBI evidently holds no gene-drug correlations.
- Of the 3 test drug names (Aspirin, Diclofenac, Celebrex), only one (Aspirin) is found.
- None of the 3 test disease names (Atherosclerosis, Alzheimer, Inflammation) is found, as ChEBI evidently holds no disease-drug correlations.

Instead, one of the main features of ChEBI is its Chemical Ontology, a structured classification of chemical compounds of biological relevance. Its structure is that of a directed acyclic graph (DAG), which differs from a simple taxonomy in that a child term can have many parent terms. Data sources include the chemical names as currently used in the Gene Ontology and external sources such as BioCyc, KEGG LIGAND, and ENZYME. All terms for compounds are classified according to either chemical nature ('grouped_by_chemistry') or biological function ('grouped_by_functions'), or are yet to be classified ('unclassifieds'). A classified compound may belong to more than one structural and more than one functional class. Each entry in the Chemical Ontology has been assigned a ChEBI identifier. However, only entries that have been checked by a ChEBI curator are released, so not all entries are visible.
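The difference between a DAG and a simple taxonomy can be illustrated with a minimal Python sketch. The class name Ontology, its methods, and the terms "example_compound", "structural_class" and "functional_class" are hypothetical placeholders, not actual ChEBI entries or ChEBI software; only the two root names are taken from the description above.

```python
from collections import defaultdict


class Ontology:
    """Toy directed acyclic graph: each term may have several parent terms."""

    def __init__(self):
        # child term -> set of direct parent terms
        self.parents = defaultdict(set)

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_is_a(self, child, parent):
        # Reject any edge that would make a term its own ancestor,
        # so the graph stays acyclic.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"edge {child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


onto = Ontology()
onto.add_is_a("structural_class", "grouped_by_chemistry")
onto.add_is_a("functional_class", "grouped_by_functions")
# One compound with two parents: impossible in a strict tree, fine in a DAG.
onto.add_is_a("example_compound", "structural_class")
onto.add_is_a("example_compound", "functional_class")

print(sorted(onto.parents["example_compound"]))
print(sorted(onto.ancestors("example_compound")))
```

In a plain taxonomy each term would store a single parent; storing a set of parents is what allows a compound to sit in a structural and a functional class at the same time.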

Typical ChEBI accessions: refer to section ChEBI IDs.
KEGG LIGAND
(KEGG)

The KEGG LIGAND database is part of the KEGG portal which turns sequence information from a number of organisms into metabolic or regulatory pathways. The KEGG LIGAND database processes requests related to drug names very efficiently, whereas KEGG GENES is used to query by gene names. Tests using disease names were not successful.
Please refer to the main section of KEGG LIGAND for details.
MMDB
(NCBI)
MMDB is the NCBI database of macromolecular 3D structures; it also provides tools for their visualization and comparative analysis. To search the NCBI structure database by keyword or by protein or nucleotide sequence, go to Entrez Structure.
MMDB is also very useful for retrieving structures of drugs in complex with their targets.
Please refer to the main section of MMDB for details.
PharmGKB
(Stanford)

including:

PharmGKB Pathways

PharmGKB, the "Pharmacogenetics and Pharmacogenomics Knowledgebase", is a research tool developed by Stanford University with funding from the National Institutes of Health (NIH). Its aim is to aid researchers in understanding how genetic variation among individuals contributes to differences in reactions to drugs. The PharmGKB database is a central repository for genetic and clinical information about people who have participated in research studies at various medical centers in the Pharmacogenetics Research Network (PGRN). In addition, genomic data, molecular and cellular phenotype data, and clinical phenotype data are accepted from the scientific community at large.

This means that the scope of PharmGKB is not only to store information on "small molecule entities" but also to cross-reference these data with genes and diseases. Thus, you can search PharmGKB using a gene name, a drug name, or a disease name. You may also browse all genes, drugs, diseases and even pathways, which are listed alphabetically. Taken together, the interface is very user-friendly.

1. Gene entries, drug entries, and disease entries are marked with symbols indicating whether genotype data, phenotype data, and literature references are available.
Test:
- All 3 test gene names (PTGS2, TP53, SELE) are found.
- All 3 test drug names (Aspirin, Diclofenac, Celebrex) are found.
- All 3 test disease names (Atherosclerosis, Alzheimer, Inflammation) are found.

2. PharmGKB Pathways are drug-centered, gene-based, interactive pathways that aim to highlight candidate genes and gene groups, together with associated genotype and phenotype data, of relevance for pharmacogenetic and pharmacogenomic studies. Examples are the Glucocorticoid pathway, the VEGF pathway, and the Statin pathway. All components of the pathway images are clickable and link to the corresponding genes. Related pathways, drugs, and diseases are also listed. Please note that, although the pathways are graphically appealing, the total number listed is still quite limited. Also note that there is no option to scan this pathway database using a whole set of genes.

Typical PharmGKB accessions: refer to section PharmGKB IDs.
PubChem
(NCBI)

including:

PubChem Substance

PubChem Compound

PubChem BioAssay
PubChem is a collection of databases provided by the NCBI, which contains the chemical structures of small organic molecules and information on their biological activities. PubChem is organized as three linked databases within the Entrez/PubMed information retrieval system, meaning that all 3 databases can be searched "in one run". These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides PubChem Structure Search, a fast chemical structure similarity search tool that links to the PubChem Compound and PubChem Substance databases.
PubChem is an excellent resource if you want to know more about substances like "aspirin", "gleevec", "simvastatin", or "celebrex".

1. PubChem Substance contains descriptions of chemical samples, from a variety of sources, and links to PubMed citations, protein 3D structures, and biological screening results that are available in PubChem BioAssay. If the contents of a chemical sample are known, the description includes links to PubChem Compound. PubChem Substances have a "SID" identifier.

2. PubChem Compound contains validated chemical depiction information provided to describe substances in PubChem Substance. Structures stored within PubChem Compounds are pre-clustered and cross-referenced by identity and similarity groups. Additionally, calculated properties and descriptors are available for searching and filtering of chemical structures. PubChem Compounds have a "CID" identifier.

3. PubChem BioAssay contains bioactivity screens of chemical substances described in PubChem Substance. It provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to that screening procedure. PubChem BioAssay entries have an "AID" identifier. 
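As a rough illustration of how the three linked databases and their identifier types relate, the following Python sketch builds one Entrez ESearch URL per database for the same query term. The database names "pcsubstance", "pccompound" and "pcassay" and the ESearch parameters are assumptions based on the NCBI E-utilities conventions and should be checked against the current NCBI documentation; the function name esearch_urls is hypothetical. The sketch only constructs URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed Entrez database names, paired with the identifier type of each database.
PUBCHEM_DATABASES = {
    "PubChem Substance": ("pcsubstance", "SID"),
    "PubChem Compound": ("pccompound", "CID"),
    "PubChem BioAssay": ("pcassay", "AID"),
}

# Assumed base URL of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_urls(term, retmax=5):
    """Build one ESearch URL per PubChem database for the same query term."""
    urls = {}
    for name, (db, _id_label) in PUBCHEM_DATABASES.items():
        query = urlencode({"db": db, "term": term, "retmax": retmax})
        urls[name] = f"{ESEARCH}?{query}"
    return urls


for name, url in esearch_urls("aspirin").items():
    id_label = PUBCHEM_DATABASES[name][1]
    print(f"{name} (returns {id_label}s): {url}")
```

Fetching each URL would be expected to return a list of SIDs, CIDs, or AIDs respectively; the web interface described above runs the equivalent search across all three databases in one step.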

Test:
- None of the 3 test gene names (PTGS2, TP53, SELE) is found, as PubChem evidently holds no gene-drug correlations.
- All 3 test drug names (Aspirin, Diclofenac, Celebrex) are found.
- All 3 test disease names (Atherosclerosis, Alzheimer, Inflammation) are found. Note: A drug-disease correlation is only found in cases where the disease name is mentioned in the section "Medical Subject Annotations" of a PubChem entry. This produces only a few hits with "atherosclerosis" and "alzheimer" but a long list of hits with "inflammation".

Typical PubChem accessions: refer to section PubChem IDs.
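The Test results reported above for ChEBI, PharmGKB and PubChem can be collected into a small lookup table, which makes it easier to see which database to try for a given kind of query. This is only a sketch: the hit counts are transcribed from the Test sections on this page, and the names RESULTS and databases_covering are hypothetical.

```python
# Hits out of 3 test terms per category, transcribed from the Test sections above.
RESULTS = {
    "ChEBI": {"genes": 0, "drugs": 1, "diseases": 0},
    "PharmGKB": {"genes": 3, "drugs": 3, "diseases": 3},
    "PubChem": {"genes": 0, "drugs": 3, "diseases": 3},
}


def databases_covering(category, minimum=3):
    """List the databases that found at least `minimum` test terms of a category."""
    return [db for db, hits in RESULTS.items() if hits[category] >= minimum]


for category in ("genes", "drugs", "diseases"):
    print(category, "->", ", ".join(databases_covering(category)))
```

KEGG LIGAND and MMDB are left out because this page does not report per-term test counts for them.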