
-> 3D STRUCTURES
-> STRUC1...know if there are proteins with known 3D structures
homologous to my query sequence and if I want to visualize these
structures ? (last update May 17,
2006)
-> STRUC2...distinguish ordered (globular) and disordered
(unstructured) regions in my protein of interest ? (last update Nov.
17, 2003)
STRUC1...know if there are proteins with known 3D structures
homologous to my query sequence and if I want to visualize these
structures ? (last update May 17, 2006)
There are several ways to access and manipulate 3D protein structures.
Some of the programs take considerable time to learn. The program that
is probably the easiest to handle is NCBI's Cn3D viewer. But we should
start at the beginning.
1. Direct retrieval of protein-specific structures:
Note that, instead of performing a
"whole-genome" sequence similarity search (see 2.) to screen for
homologs, you may of course directly query structure databases for
availability of structures derived from the protein of interest itself.
1.1. NCBI-based resources:
The NCBI Structure group maintains MMDB, a database of macromolecular
3D structures, as well as tools for their visualization and comparative
analysis. MMDB, the Molecular Modeling Database, contains
experimentally determined biopolymer structures obtained from the
Protein Data Bank (PDB). It is thus a subset of the three-dimensional
structures in the PDB that excludes theoretical models.
The MMDB structure database may be queried directly, using specific
fields such as author names, or text terms occurring anywhere in the
structure description. Entry points for queries are the Search Bar at
the top of all Structure Group WWW pages or the WWW-Entrez interface to
the 3-D structure database. Alternatively, you can use a 4-character
PDB code or a numerical MMDB-Id to retrieve structure summary pages
directly. Example: 1j46 (PDB); 17220 (MMDB). Please refer to the main
section of MMDB for detailed information!
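The two identifier types mentioned above can be told apart
programmatically. The following is a minimal sketch, not part of any
NCBI tool: the function name classify_structure_id is hypothetical, and
it assumes that a purely numerical input is an MMDB-Id and that a PDB
code is one digit followed by three letters or digits.

```python
import re

# Assumption: a classic PDB code is one digit followed by three
# alphanumeric characters (e.g. 1j46, 821P).
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")
# Assumption: any purely numerical input is treated as an MMDB-Id.
MMDB_ID = re.compile(r"[0-9]+")


def classify_structure_id(text):
    """Return 'mmdb', 'pdb' or 'unknown' for a structure identifier."""
    token = text.strip()
    if MMDB_ID.fullmatch(token):
        return "mmdb"
    if PDB_CODE.fullmatch(token):
        return "pdb"
    return "unknown"


if __name__ == "__main__":
    for example in ("1j46", "17220"):
        print(example, classify_structure_id(example))
```

Checking the numerical case first means that a four-digit number is
always read as an MMDB-Id; this is a design choice of the sketch.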
1.2. EBI-based resources:
MSD, which was started in 1996, is the EBI Macromolecular Structure
Database: the European project for the collection, management and
distribution of data about macromolecular structures, derived in part
from the Protein Data Bank (PDB). MSD is one of the three member
organizations that participate in the Worldwide Protein Data Bank
(wwPDB), a collaborative effort to provide a single, consistent PDB
archive that is publicly available and offers easy-to-use data
retrieval and analysis. For this purpose, EBI performs "cleaning
procedures" on the source database PDB to ensure data uniformity
across the whole archive. In general, MSD is extensively linked to
other EBI databases like InterPro, GO, and Swiss-Prot, and also
provides links to SCOP, CATH, Pfam, and PROSITE.
In order to search MSD, the MSDlite page is a good
entry point. Several ID types are supported as input: PDB, PubMed,
SCOP, CATH,
UniProt, EC-number, Pfam, InterPro, GO. You may also limit by
Experiment type, like X-ray, NMR, theoretical models, and more. It is
also possible to perform Text Search, Keyword Search, and Sequence
search. NOTE: The MSD Site Index
lists many more Search Options for MSD data.
1.3. RCSB PDB:
The "mother" PDB website is RCSB PDB. The PDB (Protein Data Bank) is
the single worldwide archive of structural data of biological
macromolecules. Thus, the PDB is the most comprehensive place to look
for structures of proteins, DNA, RNA, and polysaccharides. Since 2003,
RCSB PDB has been one of the three members of the Worldwide Protein
Data Bank (wwPDB) consortium, whose mission is to ensure that the PDB
archive remains an international resource with uniform data.
Several options are available to search the RCSB PDB: Simple Search
(simply enter a keyword, PDB ID or author name), Advanced Search
(allows searches of all types: database fields, browsable ontologies,
and text searches), Sequence Search (to search using a sequence), and
Search Ligands (to search based on ligand or ligand substructure).
Please refer to the RCSB PDB main section and the wwPDB main section
for detailed information.
2. Retrieve sequence homologs which include structural data:
2.1. NCBI-based resources:
If you have a protein query sequence, you can do a regular BLASTP
search at NCBI, which automatically includes a so-called CD-search in
NCBI's CDD (Conserved Domain Database), or you can go directly to CDD
and paste your sequence there (or use a GenBank accession number). You
will receive a graphical output showing the position of individual
domains within your sequence (e.g. Ras family domain, or ABC
transporter domain). These hits are taken from the Pfam and SMART
databases of protein domains. Note that the E-value of a reliable hit
should be less than 0.01. You can also see pairwise alignments with
each of the hits. Note that if you run CD-search from the query page
rather than from the main page, you can specify parameters like
E-value or search mode, which may significantly alter your output!
Entries which include 3D structures are specifically marked.
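The E-value rule of thumb above can be applied to a list of hits once
they have been copied out of a report. This is only a sketch: the
DomainHit type, the reliable_hits function and the example E-values are
made up for illustration and are not CD-search output.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One conserved-domain hit (hypothetical container for this sketch)."""
    domain: str
    evalue: float


def reliable_hits(hits, max_evalue=0.01):
    """Keep hits with an E-value below the cutoff, best hit first."""
    kept = [hit for hit in hits if hit.evalue < max_evalue]
    return sorted(kept, key=lambda hit: hit.evalue)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real search.
    example_hits = [
        DomainHit("ABC transporter", 0.5),
        DomainHit("Ras family", 3e-45),
    ]
    for hit in reliable_hits(example_hits):
        print(hit.domain, hit.evalue)
```

The cutoff is a parameter because, as noted above, the E-value setting
of the search itself can also be changed on the query page.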
Tip!
Alternatively, you may start at NCBI Entrez Protein, fetch the protein
entry of interest, and then follow the "BLink" link. BLink is not
available as a stand-alone program but as a link in each protein record
stored in NCBI Entrez, see example report. BLink entries are based on
pre-computed sequence alignments, generated from routine
all-against-all BLAST comparisons performed at NCBI. The best 200 of
these alignments can be displayed. BLink reports are highly
customizable. Conserved protein domains are shown on top of the
alignment, with links to the NCBI CDD database.
The "3D structures" button limits the output to those sequences
derived from structure records (linked via the colored dots in the "3D
structures" display). Alternatively, you may select "Keep only: PDB"
from the dropdown menu, which filters for structure entries from the
PDB database. The "CDD search" button links to a pre-computed
conserved domain display for the query sequence, again linking to 3D
structures. Note that BLink is also integrated in the "data
super-integration tool" Bioinformatic Harvester of the EBI.
3. Visualization and analysis of structural data:
3.1. NCBI-based resources:
In order to display the 3D structure, you first have to download and
install the program Cn3D, which is available (for free) for PC, Mac,
and UNIX systems. On a PC, the executable is named "cn3d.exe".
Back to your alignment. If you launch Cn3D via buttons like "View 3D
Structure" for the first time, your browser will ask which application
should be used to view the selected file. You should enter the path
where you saved the "cn3d.exe" file. In all future sessions, your
browser should automatically start Cn3D when you open a suitable file.
In Cn3D, two windows are opened, a sequence window and the image
window.
In the Image window, you can rotate the molecule simply by dragging
with your left mouse button. Using the commands "View", "Zoom in/out"
you can zoom the image in or out. This also works with Ctrl+left mouse
button+move mouse left/right. If you want to re-center the structure
(move the object), which is necessary to zoom in on a particular part,
use Shift+left mouse button. Using "View", "Animation", "Spin" it is
possible to let the structure auto-rotate.
You can easily differentiate between conserved and non-conserved
residues in the alignment via the "Coloring" options in the "Style"
menu. The "Style", "Edit Global Style" menu allows the selection of
many different types of rendering and color schemes. The tab "Labels"
provides options to label e.g. the termini of protein sequences and the
individual amino acids and nucleotide residues, which is especially
useful to identify interacting residues.
Please note that when several molecules are part of one structure
file, it is sometimes not easy to quickly identify each molecule in the
structure. For this purpose, you may select individual molecules via
the commands "Show/Hide", and "Pick Structures". If you want to know
which molecule a chain accession (like "1O4X_B") refers to, you have to
open the link to the MMDB database available at one of the previous
pages in your browser (in this example: MMDB: 26017), or the
corresponding PDB entry (in this example: PDB: 1O4X). In these entries,
you can see the identity of the different molecules ("Chains") and you
can correlate these IDs with the molecules you see in Cn3D (in this
example: molecule 1O4X_B corresponds to transcription factor Sox-2).
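An accession like "1O4X_B" combines the PDB code and the chain
identifier. Splitting it gives the PDB entry to look up and the chain
to look for in that entry. A minimal sketch follows; the function name
split_chain_accession is hypothetical, and it assumes the
"code, underscore, chain" layout shown in the example above.

```python
def split_chain_accession(accession):
    """Split e.g. '1O4X_B' into the PDB code '1O4X' and the chain 'B'."""
    pdb_code, separator, chain = accession.strip().partition("_")
    # Assumption: a 4-character PDB code, an underscore, then the chain ID.
    if not separator or len(pdb_code) != 4 or not chain:
        raise ValueError(f"not a PDB chain accession: {accession!r}")
    return pdb_code.upper(), chain


if __name__ == "__main__":
    code, chain = split_chain_accession("1O4X_B")
    print("PDB entry:", code, "chain:", chain)
```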
One of the most useful features of Cn3D is the excellent integration of
sequence information and structure displays. This makes it possible to
highlight single residues or stretches of sequence / structure (single
mouse click in the sequence window; double-click in the image window)
and immediately see the corresponding regions in the two windows. This
allows a very quick localization of e.g. active centers of enzymes, or
single residues in mutational studies.
The Sequence window also offers several options. You can add as many
sequence files as you want to the alignment. For this purpose, you
first have to save your sequence in FASTA format as a plain text
(*.txt) file, for example with "Save as" in a word processor. In the
Cn3D sequence window, you then choose "Imports", "Show Imports", which
brings up the "Import Viewer", where you can choose "Edit", "Import
Sequences", and select the txt-sequence file. The sequence is imported
into the alignment and you can investigate where stretches of conserved
residues are located in the 3D structure. Both the sequence alignment
and the 3D image can be exported in various file formats. The option
"View", "Find Pattern" lets you search for any kind of (ProSite type)
pattern within the sequences.
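To illustrate what a ProSite type pattern search does, the following
sketch translates a simple pattern into a regular expression and
reports where it matches. It is not Cn3D's implementation: the
function names are hypothetical, and only the basic syntax is handled
(x for any residue, [..] for allowed residues, {..} for excluded
residues, (n) or (n,m) for repeats, < and > for the sequence ends).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple ProSite type pattern into a regex string."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue part from an optional repeat,
        # e.g. "x(2,4)" gives core "x" and repeat "2,4".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, repeat = match.groups()
        if core.lower() == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        elif not (core.startswith("[") and core.endswith("]")):
            if len(core) != 1:
                raise ValueError(f"unsupported element: {element!r}")
        if repeat:
            core += "{" + repeat + "}"
        regex.append(prefix + core + suffix)
    return "".join(regex)


def find_pattern(sequence, pattern):
    """Return 1-based start positions of non-overlapping matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() + 1 for m in re.finditer(regex, sequence.upper())]


if __name__ == "__main__":
    # N-{P}-[ST]-{P} is the N-glycosylation site pattern;
    # the sequence is an arbitrary example.
    print(find_pattern("MKNGSAANPSA", "N-{P}-[ST]-{P}"))
```

Positions are reported 1-based so that they can be compared directly
with residue numbering in a sequence window.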
If you already know a 3D structure accession number (like MMDB Id:
3389 or PDB Id: 821P), you can directly open the corresponding file by
entering the number (either 3389 or 821P) in the search field of the
NCBI ENTREZ start page, like "Search" "Structure" "for" "821P".
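The same kind of lookup can also be prepared as a URL. The sketch
below is an assumption on top of the text: it uses NCBI's E-utilities
"esearch" service with the "structure" database rather than the Entrez
start page described above, and the function name
structure_search_url is hypothetical. It only builds the URL and does
not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (programmatic Entrez access).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def structure_search_url(term):
    """Build an esearch URL for a term in the Entrez Structure database."""
    query = urlencode({"db": "structure", "term": term.strip()})
    return f"{ESEARCH_URL}?{query}"


if __name__ == "__main__":
    print(structure_search_url("821P"))
```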
3.2. EBI-based resources:
MSD entries have several data pages, and can be visualized using
several structure viewers. AstexViewer and Jmol (available via the link
"Jena image library") are Java programs which do not need any local
installation. Rasmol, the "mother" of structure visualization tools,
needs local installation.
3.3. RCSB PDB-based resources:
RCSB PDB offers the widest range of structure visualization programs
of the major structure databases. These programs include KiNG Viewer,
Jmol Viewer, WebMol Viewer, and Protein Workshop, which are Java
applets and do not need local installation. Rasmol Viewer and
Swiss-PDB Viewer need local installation.
In summary, all these structure viewers have their "pros" and "cons",
based on different functionality and graphical design. Several test
runs using series of structures led to the conclusion that, for the
novice user, the NCBI Cn3D tool is possibly the best choice, providing
both an excellent graphical display of the structures and a very
user-friendly interface.
STRUC2...distinguish ordered (globular) and disordered
(unstructured) regions in my protein of interest ? (last update Nov.
17, 2003)
Protein disorder can be described as the lack of regular secondary
structure and a high degree of flexibility in the polypeptide chain.
Disordered regions are of growing interest, as reflected by the
increasing number of known IUPs (intrinsically unstructured /
disordered proteins), like Tau, Bcl-2, and prions. IUPs contain
unfolded regions in the native state. Ordered regions are often termed
globular, and typically contain regular secondary structures packed
into a compact globule. Avoiding potentially disordered segments in
protein expression constructs can increase expression, foldability and
stability of the expressed protein. Therefore, it can be highly useful
to predict disordered segments in protein sequences. The following
servers may be used for this purpose.
DisEMBL is easy to query: enter either a valid SwissProt ID or AC, or
a protein sequence. Although no clear definition of disorder exists,
the program uses three definitions of disorder, and predictions are
shown for each of them. The predicted probabilities are shown as curves
along the sequence, and scores should always be compared to the
corresponding random expectation value (dotted lines). The three
definitions are:
Loops / coils: Residues assigned as alpha-helix or beta-strand are
considered ordered, and all other states are considered loops (also
known as coils). Loops / coils are not necessarily disordered; however,
protein disorder is only found within loops.
Hot loops: A subset of the above, namely those loops with a high degree
of mobility, i.e. coils with high temperature factors.
Missing coordinates: Residues with missing coordinates in X-ray
structures, as defined by REMARK 465 entries in PDB. Non-assigned
electron densities most often reflect intrinsic disorder, and were used
early on in disorder prediction.
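Comparing per-residue scores with the random expectation value amounts
to finding the stretches where the curve lies above the dotted line.
The following sketch shows that step only; it is not DisEMBL's code,
and the function name, the min_length parameter and the example scores
and threshold are made up for illustration.

```python
def regions_above(scores, threshold, min_length=1):
    """Return (start, end) residue ranges whose scores exceed threshold.

    Positions are 1-based and inclusive. Runs shorter than min_length
    are dropped.
    """
    regions = []
    start = None
    for index, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = index              # a new run begins here
        elif start is not None:
            if index - start >= min_length:
                regions.append((start + 1, index))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


if __name__ == "__main__":
    # Illustrative scores and threshold, not real predictor output.
    example_scores = [0.1, 0.6, 0.7, 0.2, 0.9, 0.9, 0.9]
    print(regions_above(example_scores, threshold=0.5, min_length=3))
```

A minimum length is included because very short runs above the
expectation value are usually less informative than longer stretches
when choosing construct boundaries.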
Note that disordered regions in
proteins often contain short linear peptide motifs (e.g.
SH3-ligands and targeting signals) that are important for protein
function. Linear peptide sites are catalogued by ELM, also refer to the ELM chapter and to FAQ PROT1.
GlobPlot is another computational tool that allows the user to plot
the tendency within the query protein for order/globularity and
disorder. GlobPlot is easy to query: enter either a valid SwissProt ID
or AC, or a protein sequence. Note that the two methods (DisEMBL and
GlobPlot) complement each other as they offer different
approaches/features.