Bioinformatics World FAQ Center
 FAQ Index -> 3D STRUCTURES
                -> STRUC1...know if there are proteins with known 3D structures homologous to my query sequence and if I want to visualize these structures ? (last update May 17, 2006)
                -> STRUC2...distinguish ordered (globular) and disordered (unstructured) regions in my protein of interest ? (last update Nov. 17, 2003)
             
                          

STRUC1...know if there are proteins with known 3D structures homologous to my query sequence and if I want to visualize these structures ? (last update May 17, 2006)

    There are several ways to access and manipulate 3D protein structures. Some of the programs take considerable time to learn. The easiest one to handle is probably NCBI's Cn3D viewer. But we should start at the beginning.

1. Direct retrieval of protein-specific structures:
  
    Note that, instead of performing a "whole-genome" sequence similarity search (see 2.) to screen for homologs, you may of course directly query structure databases for availability of structures derived from the protein of interest itself.

1.1. NCBI-based resources:

    The NCBI Structure group maintains MMDB, the Molecular Modeling Database, as well as tools for the visualization and comparative analysis of macromolecular 3D structures. MMDB contains the experimentally determined biopolymer structures obtained from the Protein Data Bank (PDB); theoretical models are excluded.
    The MMDB structure database may be queried directly, using specific fields such as author names, or text terms occurring anywhere in the structure description. Entry points for queries are the Search Bar at the top of all Structure Group WWW pages and the WWW-Entrez interface to the 3D structure database. Alternatively, you can use a 4-character PDB code or a numerical MMDB-Id to retrieve structure summary pages directly. Example: 1j46 (PDB); 17220 (MMDB). Please refer to the main section of MMDB for detailed information!
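    If you handle many identifiers, it can help to check which kind of ID you have before querying. The following is a minimal Python sketch, not part of any NCBI tool. It assumes the classic PDB code format (four characters, a digit 1-9 followed by three letters or digits) and assumes that an MMDB-Id is a plain positive integer; the function name is made up for illustration.

```python
import re

# Classic PDB code: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")
# Assumption: an MMDB-Id is a plain positive integer without leading zeros.
MMDB_ID = re.compile(r"^[1-9][0-9]*$")


def classify_structure_id(text):
    """Return 'PDB', 'MMDB', 'ambiguous', or 'unknown' for an identifier."""
    token = text.strip()
    is_pdb = bool(PDB_ID.match(token))
    is_mmdb = bool(MMDB_ID.match(token))
    if is_pdb and is_mmdb:
        # A four-digit number fits both patterns; the user has to decide.
        return "ambiguous"
    if is_pdb:
        return "PDB"
    if is_mmdb:
        return "MMDB"
    return "unknown"
```

    With these assumptions, "1j46" is classified as a PDB code and "17220" as an MMDB-Id. A four-digit number is reported as ambiguous, because it could be either.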

1.2. EBI-based resources:

    MSD, which was started in 1996, is the EBI Macromolecular Structure Database - the European project for the collection, management and distribution of data about macromolecular structures, derived in part from the Protein Data Bank (PDB). MSD is one of the three member organizations that participate in the Worldwide Protein Data Bank (wwPDB), a collaborative effort to provide a single, consistent PDB archive, which is publicly available and provides easy-to-use data retrieval and analysis. For this purpose, EBI performs "cleaning procedures" on the PDB source data, to ensure data uniformity across the whole archive. In general, MSD is extensively linked to other EBI databases like InterPro, GO, and Swiss-Prot, together with links to SCOP, CATH, Pfam, and PROSITE.
    In order to search MSD, the MSDlite page is a good entry point. Several ID types are supported as input: PDB, PubMed, SCOP, CATH, UniProt, EC-number, Pfam, InterPro, GO. You may also limit by Experiment type, like X-ray, NMR, theoretical models, and more. It is also possible to perform Text Search, Keyword Search, and Sequence search. NOTE: The MSD Site Index lists many more Search Options for MSD data.

1.3. RCSB PDB:

    The "mother" PDB website is RCSB PDB. The PDB (Protein Data Bank) is the single worldwide archive of structural data of biological macromolecules. Thus, the PDB is the most comprehensive place to look for structures of proteins, DNA, RNA, and polysaccharides. Since 2003, RCSB PDB has been one of the three members of the Worldwide Protein Data Bank (wwPDB) consortium, whose mission is to ensure that the PDB archive remains an international resource with uniform data.
    In order to search the RCSB PDB, several options are available, like Simple Search (simply enter a keyword, PDB ID or author name), Advanced Search (allows searches of all types - database fields, browsable ontologies, and text searches), Sequence Search (to search using a sequence), or Search Ligands (to search based on ligand or ligand substructure). Please refer to the RCSB PDB main section and the wwPDB main section for detailed information.

2. Retrieve sequence homologs which include structural data:

2.1. NCBI-based resources:

    If you have a protein query sequence, you can do a regular BLASTP search at NCBI, which automatically includes a so-called CD-search against NCBI's Conserved Domain Database (CDD), or you can go directly to CDD and paste your sequence there (or use a GenBank accession number). You will receive a graphical output showing the position of individual domains within your sequence (e.g. Ras family domain, or ABC transporter domain). These hits are taken from the Pfam and SMART databases of protein domains. Note that the E-value of a reliable hit should be less than 0.01. You can also see pairwise alignments with each of the hits. Note that if you run CD-search not from the main page but from the query page, you are able to specify parameters like E-value or search mode, which may significantly alter your output! Entries which include 3D structures are specifically marked.
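    The E-value rule of thumb above (a reliable hit should be below 0.01) is easy to apply once you have copied the hits into a list. The Python sketch below only illustrates the filtering step; the record layout, the function name, and the example hits are invented for illustration and are not real CD-search output.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One domain hit; this layout is a hypothetical simplification."""
    domain: str
    start: int
    end: int
    evalue: float


def reliable_hits(hits, max_evalue=0.01):
    """Keep hits with an E-value below the cutoff, best (lowest) first."""
    kept = [hit for hit in hits if hit.evalue < max_evalue]
    return sorted(kept, key=lambda hit: hit.evalue)


# Invented placeholder hits, for illustration only.
example_hits = [
    DomainHit("domain_A", 5, 170, 2e-40),
    DomainHit("domain_B", 200, 260, 0.5),
    DomainHit("domain_C", 300, 380, 0.004),
]

for hit in reliable_hits(example_hits):
    print(hit.domain, hit.start, hit.end, hit.evalue)
```

    The cutoff is a parameter because, as noted above, a different E-value setting may significantly alter what you keep.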

    Tip! Alternatively, you may start at NCBI Entrez Protein, fetch the protein entry of interest, and then follow the "BLink" link. BLink is not a stand-alone program; it is available as a link in each protein record stored in NCBI Entrez (see example report). BLink entries are based on pre-computed sequence alignments, generated from routine all-against-all BLAST comparisons performed at NCBI. The best 200 of these alignments can be displayed. BLink reports are highly customizable. Conserved protein domains are shown on top of the alignment, with links to the NCBI CDD database. The "3D structures" button limits the output to those sequences derived from structure records (linked via the colored dots in the "3D structures" display). Alternatively, you may select "Keep only: PDB" from the dropdown menu, which filters for structure entries from the PDB database. The "CDD search" button links to a pre-computed conserved domain display for the query sequence, again linking to 3D structures. Note that BLink is also integrated in the "data super-integration tool" Bioinformatic Harvester of the EBI.

3. Visualization and analysis of structural data:

3.1. NCBI-based resources:

    In order to display the 3D structure, you have to download and install the program Cn3D, which is available (for free) for PC, Mac, and UNIX systems. The executable is named "cn3d.exe". Now return to your alignment. When you launch Cn3D via a button like "View 3D Structure" for the first time, your browser will ask which application should be used to view the selected file. Enter the path where you saved the "cn3d.exe" file. From then on, your browser should start Cn3D automatically whenever you open a suitable file.
    In Cn3D, two windows are opened, a sequence window and the image window.

    In the Image window, you can rotate the molecule simply by dragging with your left mouse button. Using the commands "View", "Zoom in/out" you can redisplay the image. This also works with Ctrl+left mouse button+move mouse left/right. If you want to re-center the structure (move the object), which is necessary to zoom in on a particular part, use Shift+left mouse button. Using "View", "Animation", "Spin" it is possible to let the structure auto-rotate.
    You can easily differentiate between conserved and non-conserved residues in the alignment via the "Coloring" options in the "Style" menu. The "Style", "Edit Global Style" menu allows the selection of many different types of rendering and color schemes. The tab "Labels" provides options to label, for example, the termini of protein sequences and the individual amino acid and nucleotide residues, which is especially useful to identify interacting residues.
    Please note that if several molecules are part of one structure file, it is sometimes not easy to quickly identify each molecule in the structure. For this purpose, you may select individual molecules via the commands "Show/Hide" and "Pick Structures". If you want to know which molecule an individual accession (like "1O4X_B") refers to, open the link to the MMDB database available on one of the previous pages in your browser (in this example: MMDB: 26017), or the corresponding PDB entry (in this example: PDB: 1O4X). In these entries, you can see the identity of the different molecules ("Chains") and you can correlate these IDs with the molecules you see in Cn3D (in this example: molecule 1O4X_B corresponds to transcription factor Sox-2).
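    An accession like "1O4X_B" combines the PDB code and the chain identifier. A small Python sketch of splitting such an accession is shown below; the function name is made up, and the sketch assumes the "PDBcode_chain" layout seen in the example above.

```python
def split_chain_accession(accession):
    """Split an accession such as '1O4X_B' into (PDB code, chain ID)."""
    pdb_code, separator, chain = accession.strip().partition("_")
    # Assumption: a 4-character PDB code, an underscore, then a chain ID.
    if not separator or len(pdb_code) != 4 or not chain:
        raise ValueError(f"not a PDBcode_chain accession: {accession!r}")
    return pdb_code.upper(), chain


pdb_code, chain = split_chain_accession("1O4X_B")
print(pdb_code, chain)
```

    The PDB code part ("1O4X") is what you look up in the structure database; the chain part ("B") is what you match against the "Chains" listed in that entry.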

    One of the most useful features of Cn3D is the excellent integration of sequence information and structure displays. This makes it possible to highlight single residues or stretches of sequence / structure (single mouse click in the sequence window; double-click in the image window) and immediately see the corresponding regions in the two windows. This allows a very quick localization of, for example, active centers of enzymes, or single residues in mutational studies.

    The Sequence window also offers several options. You can add as many sequence files as you want into the alignment. For this purpose, you first have to save your sequence as a FASTA file in plain text format (for example, in a word processor, save it as a *.txt file). In the Cn3D sequence window, you then choose "Imports", "Show Imports", which brings up the "Import Viewer", where you can choose "Edit", "Import Sequences", and select the txt-sequence file. The sequence is imported into the alignment and you can investigate where stretches of conserved residues are located in the 3D structure. Both the sequence alignment and the 3D image can be exported in various file formats. The option "View", "Find Pattern" lets you search for any kind of (ProSite type) pattern within the sequences.
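    If you prefer not to use a word processor, the FASTA text file for the import step can also be written with a few lines of Python. This is only a sketch: the function names, the file name, and the short sequence are placeholders, and the line width of 60 is a common convention rather than a requirement stated by Cn3D.

```python
def format_fasta(header, sequence, width=60):
    """Return FASTA text: a '>' header line, then wrapped sequence lines."""
    # Remove spaces and line breaks that often come along when copying.
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header.strip() + "\n" + "\n".join(lines) + "\n"


def save_fasta(path, header, sequence):
    """Write the FASTA text to a plain text file."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(format_fasta(header, sequence))


# Placeholder name and sequence, for illustration only.
print(format_fasta("my_protein", "mkt ayia\nkqr", width=5))
```

    Calling save_fasta("my_protein.txt", ...) with your own header and sequence produces a file that can be selected in the "Import Sequences" dialog described above.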
    If you already know a 3D structure accession number (like MMDB Id: 3389 or PDB Id: 821P), you can directly open the corresponding file by entering the number (either 3389 or 821P) in the search field of the NCBI ENTREZ start page, like "Search" "Structure" "for" "821P".

3.2. EBI-based resources:

    MSD entries have several data pages, and can be visualized using several structure viewers. AstexViewer and Jmol (available via the link "Jena image library") are Java programs which do not need any local installation. Rasmol, the "mother" of structure visualization tools, needs local installation.

3.3. RCSB PDB-based resources:

    RCSB PDB offers the widest range of structure visualization programs among the major structure databases. KiNG Viewer, Jmol Viewer, WebMol Viewer, and Protein Workshop are Java applets that do not need local installation. Rasmol Viewer and Swiss-PDB Viewer need local installation.

In summary, all these structure viewers have their "pros" and "cons", based on different functionality and graphical design. Several test runs using series of structures led to the conclusion that, for the novice user, the NCBI Cn3D tool is possibly the best choice, providing both an excellent graphical display of the structures and a very user-friendly interface.
            


STRUC2...distinguish ordered (globular) and disordered (unstructured) regions in my protein of interest ? (last update Nov. 17, 2003) 

    Protein disorder can be described as the lack of regular secondary structure and a high degree of flexibility in the polypeptide chain. Disordered regions are of growing interest, as reflected by the increasing number of known IUPs (intrinsically unstructured / disordered proteins), like Tau, Bcl-2, and prions. IUPs contain unfolded regions in the native state. Ordered regions are often termed globular, and typically contain regular secondary structures packed into a compact globule. Avoiding potentially disordered segments in protein expression constructs can increase expression, foldability and stability of the expressed protein. Therefore, it can be highly useful to predict disordered segments in protein sequences. The following servers may be used for this purpose.

    DisEMBL is easy to query: you enter either a valid SwissProt ID or AC, or a protein sequence. Although no clear definition of disorder exists, the program uses three definitions of disorder, and predictions are shown according to each of them. The predicted probabilities are shown as curves along the sequence, and scores should always be compared to the corresponding random expectation value (dotted lines). The three definitions are:
    (1) Loops / coils: residues assigned as alpha-helix or beta-strand are considered ordered, and all other states are considered loops (also known as coils). Loops / coils are not necessarily disordered; however, protein disorder is only found within loops.
    (2) Hot loops: a subset of the above, namely those loops with a high degree of mobility, i.e. coils with high temperature factors.
    (3) REMARK-465: missing coordinates in X-ray structures, as defined by REMARK-465 entries in PDB. Non-assigned electron densities most often reflect intrinsic disorder, and have been used early on in disorder prediction.
    Note that disordered regions in proteins often contain short linear peptide motifs (e.g. SH3-ligands and targeting signals) that are important for protein function. Linear peptide sites are catalogued by ELM; see also the ELM chapter and FAQ PROT1.
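    The REMARK-465 entries mentioned above can also be inspected directly in a PDB file. The Python sketch below lists the residues reported as missing. It is not how DisEMBL itself works; it assumes the usual layout of REMARK 465 rows (optional model number, residue name, chain identifier, sequence number), and the short excerpt is invented for illustration rather than taken from a real entry.

```python
import re

# Assumed row layout: "REMARK 465", optional model number, residue name,
# chain ID, sequence number with an optional insertion code.
ROW = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Z]?)\s*$"
)


def missing_residues(pdb_lines):
    """Return (chain, residue name, sequence number) per REMARK 465 row."""
    found = []
    for line in pdb_lines:
        match = ROW.match(line.rstrip())
        if match:  # header and explanation lines do not match the pattern
            found.append((match.group(3), match.group(2), int(match.group(4))))
    return found


# Invented excerpt in the assumed layout, for illustration only.
EXAMPLE = """\
REMARK 465 MISSING RESIDUES
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     GLY A     2
REMARK 465     SER B    -1
""".splitlines()

for chain, name, number in missing_residues(EXAMPLE):
    print(chain, name, number)
```

    Stretches of consecutive missing residues found this way are candidates for disordered segments, in the sense of the third definition.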

    GlobPlot is another computational tool that allows the user to plot the tendency within the query protein for order/globularity and disorder. Like DisEMBL, GlobPlot is easy to query: you enter either a valid SwissProt ID or AC, or a protein sequence. Note that the two methods (DisEMBL and GlobPlot) complement each other, as they offer different approaches/features.
                      