Bioinformatics World FAQ Center
  FAQ Index -> RNA

                -> RNA1...detect regulatory elements in UTRs (UnTranslated Regions) in a whole-genome approach ? (last update Sep. 7, 2005)
                -> RNA2...get a structural prediction for the 3'-UTR sequence of my RNA of interest ? (last update Sep. 6, 2005)
                -> RNA3...get detailed information about a regulatory microRNA called miR-16 ? (last update Oct. 31, 2005)       
                -> RNA4...predict the potential targets of a microRNA like miR-16 ? (last update Nov. 2, 2005)    
                -> RNA5...predict if a specific mRNA of interest may be the target of microRNAs ? (last update Nov. 2, 2005)          
  
                     
                        

           
RNA1...detect regulatory elements in UTRs (UnTranslated Regions) in a whole-genome approach ? (last update Sep. 7, 2005)              

    In general, the untranslated regions (UTRs) of genes (mRNAs) have not been investigated with the same enthusiasm as other regions such as promoter sequences. Likewise, the number and size of databases storing information on regulatory elements in UTRs are still quite low, and there is no direct analysis tool that allows batch submission of thousands of sequences. Nevertheless, a specific strategy can be developed to address this question. The different approaches are described here using the example of identifying so-called AU-rich elements (AREs) in the 3'-UTRs of human genes. AREs are involved in accelerated mRNA decay, which is mediated by interaction with specific ARE-recognizing proteins.

    The UTResource is a collection of internet resources for sequence analysis of 5' and 3' untranslated regions of eukaryotic mRNAs. Note that a registration form has to be completed to use these programs. The site links to the following databases and programs. UTRdb is a specialized, non-redundant sequence collection of 5' and 3' UTR sequences from eukaryotic mRNAs. UTRSite is a collection of functional sequence patterns located in 5' or 3' UTR sequences. UTRScan looks for UTR functional elements by searching user-submitted query sequences for the patterns defined in the UTRSite collection. UTRScan is designed to analyze single sequences and does not allow batch submission, so it is not the appropriate tool to screen whole genomes for certain elements. A second point is also worth noting: although the mRNA of the well-characterized gene TNF-alpha is known to contain functional ARE sites in its 3'-UTR, UTRScan does not detect these sites. Apparently, the settings are quite stringent, producing a hit only for a direct tandem repeat of the sequence ATTTA and not allowing for short "nucleotide spacers".

    REPFIND is a program to find clustered, exact repeats in nucleotide sequences. For each repeat cluster that it finds, it calculates a P-value, which indicates the probability of finding such a concentration of that particular repeat just by chance. Although REPFIND detects any kind of clustered repeats, it is especially useful for detecting regulatory signals in 3'-UTR sequences of mRNAs, which often consist of repeat clusters. REPFIND nicely extracts the ARE sites within the TNF-alpha 3'-UTR as the best-scoring hit, but only when the "Low Complexity Filter" is turned off. Therefore, you should carefully select or deselect this option when looking for motifs like AU-rich elements, which resemble a "low complexity" sequence and would otherwise be masked out (hidden) prior to the analysis. Like UTRScan, REPFIND only accepts single sequences as input (no batch submission), meaning that it is not suitable for a whole-genome approach. Please also refer to the REPFIND description at the main page.

    Tip! A suitable strategy for a whole-genome approach is already described in FAQ GEN4, part C), and it is also applicable to this question. Briefly, as a first step you can use BioMart to extract the complete set of human 3'-UTR sequences. At the start page, choose the human genome and select "Ensembl Genes". At the filter page, you may either deselect all boxes, which retrieves all genes, or limit the output to genes that are at least somewhat characterized by choosing "Genes with LocusLink IDs" or "with RefSeq IDs". At the output page, choose the "Sequences Page", where you can select 3'-UTR regions only. Select "Text, fasta" as the output format. After a few minutes, you will retrieve a (long) txt file of FASTA sequences, which can be saved directly from your browser window and used for further analyses.
    The second step is performed using the RSAT program DNA-Pattern (Strings). When looking for ARE elements, we may define 3 patterns with increasing stringency. ATTTAn{0,6}ATTTAn{0,6}ATTTA allows spacers of up to 6 nucleotides composed of any base. ATTTAw{0,6}ATTTAw{0,6}ATTTA allows spacers of up to 6 nucleotides composed of only A or T. ATTTAw{0,2}ATTTAw{0,2}ATTTA allows spacers of at most 2 nucleotides composed of A or T. Using "Browse", you select the whole-genome 3'-UTR txt file that you created in the first step for screening with your defined patterns. Note that when scanning large sequence sets, one might be interested in counting the number of matches rather than returning their precise positions. This can be done by deselecting the checkbox "match positions", selecting "match counts" instead, and specifying a threshold. The threshold means that only those sequences are returned whose number of matches reaches the specified value (e.g. in a promoter analysis, promoters showing at least 2 or 3 copies of a TF site). As expected, the 3 patterns produce a decreasing number of hits in the genome. Please note that additional programs for motif matching are described in detail in FAQ GEN4, part C).
    Finally, you can feed your RSAT list of potential target genes back into BioMart (at the "Filter" page, "Limit to Genes with these IDs") in order to obtain an annotation table that keeps all hyperlinks active in an Excel sheet (at the "Output" page, choose the "Features" you would like to see). In this way, you can select target genes which are also relevant in the specific biological context.
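
    Example: The pattern-counting step can also be reproduced offline. The following Python sketch imitates (but is not) the RSAT DNA-Pattern program: it translates the three ARE patterns into regular expressions and counts non-overlapping matches per FASTA record. The sequence names utr_A to utr_D and their sequences are made-up placeholders, and the function names are illustrative assumptions. A real BioMart export would carry Ensembl IDs in the header lines.

```python
import re

# IUPAC codes used in the three patterns: n = any base, w = A or T.
IUPAC = {"n": "[ACGT]", "w": "[AT]"}


def to_regex(pattern):
    """Translate e.g. 'ATTTAw{0,2}ATTTA' into a compiled regular expression."""
    body = "".join(IUPAC.get(ch, ch) for ch in pattern)
    return re.compile(body, re.IGNORECASE)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_matches(lines, pattern, threshold=1):
    """Return {header: count} for records with at least `threshold` matches."""
    regex = to_regex(pattern)
    hits = {}
    for header, seq in read_fasta(lines):
        count = len(regex.findall(seq))  # non-overlapping matches only
        if count >= threshold:
            hits[header] = count
    return hits


# Placeholder 3'-UTR records (hypothetical, for illustration only).
SAMPLE = """\
>utr_A
CCATTTAATTTATATTTAGG
>utr_B
CCATTTAGCGATTTACGCATTTAGG
>utr_C
GGATTTATATATATTTAATTTAGG
>utr_D
ACGTACGTACGT
""".splitlines()

PATTERNS = [
    "ATTTAn{0,6}ATTTAn{0,6}ATTTA",
    "ATTTAw{0,6}ATTTAw{0,6}ATTTA",
    "ATTTAw{0,2}ATTTAw{0,2}ATTTA",
]

if __name__ == "__main__":
    for pattern in PATTERNS:
        # The sorted header list is what would be pasted back into BioMart.
        print(pattern, sorted(count_matches(SAMPLE, pattern)))
```

    With the placeholder records, the three patterns return 3, 2 and 1 sequences, which mirrors the decreasing number of hits described for the genome-wide screen. For a real analysis, the lines of the saved BioMart file would be passed in place of SAMPLE.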

    Tip! If the search for regulatory elements is limited to so-called AU-rich elements (AREs) in 3'-UTRs of human genes, a specific database called ARED, the AU-Rich Element Database, may be used. ARED is maintained at the King Faisal Specialist Hospital & Research Centre (KFSH&RC) in Riyadh, Saudi Arabia. ARED contains GenBank entries where the 3'-UTR matches the ARE motif, a 13-bp pattern WWWUAUUUAUWWW (W=A/U), which was computationally derived from a list of functionally labile ARE-containing mRNAs. ARED suggests that ARE-mRNAs represent as much as 5-8% of human genes. However, because ARED contains computationally predicted ARE-mRNAs, there is no evidence how many of them are actually regulated by this mechanism. AREs are known to be recognized by specific proteins and/or small regulatory RNAs which dramatically influence the stability of the mRNA. Most of them are negative regulators (like ZFP36) which promote mRNA decay, but positive regulators which stabilize the target mRNA (like HuR) also exist. Known examples of ARE-dependent regulation are the mRNAs of TNFalpha, PTGS2 (COX2), CSF2 (GMCSF), and IL3. Consequently, several diseases, such as chronic inflammatory conditions, are known to be caused by stabilized ARE-mRNAs.
    There are already 3 different versions of the ARED database. While v1 and v2 only support single queries (gene names, IDs, mRNA accession numbers, RefSeq, UniGene, etc.), v3 also supports batch queries using e.g. a list of gene names from a microarray experiment. NOTE: The list has to be pasted in column format (as copied from Excel), not as space-delimited text!
    ARED will produce a table presenting all queried genes with predicted AREs in their 3'-UTRs that are stored in the ARED database. Note that the actual sequences are NOT shown, only the "Class" and the "Cluster" of the respective AREs. Note: The result table may be saved as a tab-delimited txt file, which can easily be opened in Excel.
    Note: When using the "Advanced search" option, you may also browse the (long) lists of ARE-mRNAs by selecting an ARE cluster and leaving all other fields empty.
    ARE-mRNAs are clustered according to the length of the individual AREs, i.e. the number of contiguous AUUUA pentamers: Cluster 1 mRNAs contain 5 contiguous pentamers, Cluster 2 mRNAs contain 4, and so on down to Cluster 5 mRNAs, which contain 1 pentamer within the 13-bp ARE context. So, in order to answer the question of a "whole-genome" human approach, you may simply select "ALL" ARE Clusters and download the table of 2476 (ARED v3) human mRNAs containing AREs.
    The authors of ARED state that the database is available as single GenBank flat file (i.e. nucleotide sequence with annotation) upon request.
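
    Example: The cluster idea can be illustrated with a few lines of Python. This is only a simplified sketch and not the ARED classification procedure: it counts the longest run of overlapping AUUUA pentamers in a sequence and maps 5 pentamers to Cluster 1 down to 1 pentamer to Cluster 5. The mapping for Clusters 3 and 4 is an assumption interpolated from the description above, and the flanking A/U context of the 13-bp motif is ignored. The function names are illustrative.

```python
import re

# A run of overlapping pentamers: AUUUA, AUUUAUUUA, AUUUAUUUAUUUA, ...
PENTAMER_RUN = re.compile(r"A(?:UUUA)+")


def max_pentamer_run(utr):
    """Return the largest number of contiguous, overlapping AUUUA pentamers."""
    rna = utr.upper().replace("T", "U")  # accept DNA or RNA notation
    best = 0
    for match in PENTAMER_RUN.finditer(rna):
        # A run of k pentamers has length 4 * k + 1.
        best = max(best, (len(match.group()) - 1) // 4)
    return best


def are_cluster(utr):
    """Map the pentamer run length to a cluster number (assumed scheme)."""
    run = max_pentamer_run(utr)
    if run == 0:
        return None  # no AUUUA pentamer at all
    return 6 - min(run, 5)  # 5 or more pentamers -> 1, ..., 1 pentamer -> 5


if __name__ == "__main__":
    for utr in ("GGAUUUAUUUAUUUAUUUAUUUAGG", "CCATTTATTTACC", "ACGU"):
        print(utr, max_pentamer_run(utr), are_cluster(utr))
```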
         


                    
RNA2...get a structural prediction for the 3'-UTR sequence of my RNA of interest ? (last update Sep. 6, 2005)         

    It is known that a growing list of genes is regulated at the level of mRNA stability via so-called AU-rich elements (AREs) in 3'-UTRs (see also FAQ RNA1). The consensus ARE was computationally derived as the 13-bp pattern WWWUAUUUAUWWW. It is also known that specific regulators which recognize AREs, like the positive regulator HuR, bind to the ARE only when the pattern is in a single-stranded conformation within the RNA secondary structure, meaning that it is located in one of the "loops"; see Meisner et al., 2004 for details. It might therefore be interesting to investigate other RNAs for the presence and structure of AREs within their 3'-UTRs. For this purpose, a resource which predicts the secondary structure of RNAs is needed. NOTE: Structure prediction of RNAs in general is NOT a trivial task. In many cases, there will be many "sub-optimal" structures which are only slightly less favorable than the "best" structure. Results have to be interpreted with caution.

    The Vienna RNA Package was developed for the prediction and comparison of RNA secondary structures at the Theoretical Biochemistry Group (TBI) of the University of Vienna, Austria. The package is free software and can be downloaded as C source code that should be easy to compile on almost any flavor of Unix and Linux. Note: This package was developed for UNIX command-line use; there are no graphical user interfaces. Nevertheless, the Vienna RNA secondary structure server offers access to the most popular features of the Vienna RNA Package via easy-to-use web interfaces. RNAfold is the web interface to the RNAfold program. This server predicts secondary structures of single-stranded RNA or DNA sequences. Thus, RNAfold provides both the most basic and the most widely used function. The output presents the predicted mfe (minimum free energy) structure both as a string in bracket notation and as links to the plots generated for visualization. Plots are produced in PostScript format. A suitable alternative is the new standard for Scalable Vector Graphics, SVG. For this purpose, the browser has to be equipped with an SVG plugin (typically from Adobe).
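
    Example: The "string in bracket notation" uses "(" and ")" for the two partners of a base pair and "." for an unpaired nucleotide. The following Python sketch shows how such a string can be converted into a table of base pairs with a simple stack. It does not use the Vienna RNA Package itself, the function name is an illustrative assumption, and the structure strings are toy examples, not RNAfold output.

```python
def parse_dot_bracket(structure):
    """Return a dict mapping each paired position (0-based) to its partner."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)  # remember the opening partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()  # most recent unmatched '('
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


if __name__ == "__main__":
    toy = "((((.....))))"  # a small hairpin with a 5-nucleotide loop
    pairs = parse_dot_bracket(toy)
    unpaired = [i for i in range(len(toy)) if i not in pairs]
    print(pairs)
    print("unpaired positions:", unpaired)
```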

    Tip! A very easy-to-use and straightforward approach would be the following: The UCSC Genome Bioinformatics site provides pre-computed structures of the 5'- and 3'-UTR regions of all RNAs, which were produced using RNAfold. The estimated folding energy is given in kcal/mol. The more negative the energy, the more secondary structure the RNA is likely to have. As there are no stable URLs for individual gene entries in UCSC, you may follow this example: Open the UCSC Gene Sorter and search for the human gene PTGS2. You will retrieve a table where you can access the specific gene entry via the link "Description" (last column). There are several sections in this PTGS2-specific file; one of them is "mRNA Secondary Structure of 3' and 5' UTRs". There are several display formats for the predicted structures: "Picture" produces a PDF file of the structure. You need to have a program installed that is capable of displaying PDF files, like Adobe Acrobat. "PostScript" produces a PS file of the structure. You need to have a program installed that is capable of displaying PS files, like GSview. "Text" produces the structure as a "string in bracket notation". Interestingly, you can see that the "core motif" AUUUA is indeed present in some of the predicted loop structures of this mRNA.
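
    Example: Whether the core motif AUUUA lies in a single-stranded region can also be checked automatically from the "Text" output. The following Python sketch takes a sequence and its structure in bracket notation and reports all motif positions where every nucleotide is unpaired ("."). Note that "unpaired" here covers any single-stranded stretch, not only hairpin loops. The function name and the two short hairpins are illustrative assumptions, not real UTR structures.

```python
def unpaired_motif_sites(sequence, structure, motif="AUUUA"):
    """Return 0-based start positions where `motif` is completely unpaired."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA notation
    sites = []
    start = rna.find(motif)
    while start != -1:
        window = structure[start:start + len(motif)]
        if set(window) == {"."}:  # every motif position is unpaired
            sites.append(start)
        start = rna.find(motif, start + 1)  # also find overlapping motifs
    return sites


if __name__ == "__main__":
    # Toy hairpin with AUUUA in the loop.
    print(unpaired_motif_sites("GGGCAUUUAGCCC", "((((.....))))"))
    # Toy hairpin with AUUUA buried in the stem.
    print(unpaired_motif_sites("AUUUAGGGCUAAAU", "(((((....)))))"))
```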
                     


                    
RNA3...get detailed information about a regulatory microRNA called miR-16 ? (last update Oct. 31, 2005)         

    This is an example of how to retrieve data concerning a non-protein-coding microRNA which is known from the literature to be involved in regulatory processes like the destabilization of the mRNA of the inflammatory mediator COX2.

    Rfam is a joint project involving researchers based at the Wellcome Trust Sanger Institute and at Washington University, St. Louis (which also provides an Rfam mirror site). Rfam is a large collection of multiple sequence alignments and covariance models covering many common non-coding RNA families. For each family in Rfam you can view and download multiple sequence alignments, read family annotation, examine the species distribution of family members, and follow links to other databases.
    To address this specific question, Rfam provides a simple keyword search which accepts any keyword, like "miR-16". You will retrieve not only the sequences from different species, but also the consensus secondary structure for the family mir-16. In addition, you may produce multiple sequence alignments and view literature references.

    Tip! miRBase is the new home for microRNA data, incorporating the database and gene naming roles previously provided by the miRNA Registry, and including the new miRBase Target database. miRBase contains 3 main sections. One of them is miRBase Sequences, which contains all published miRNA sequences, genomic locations and associated annotation. Each entry in the miRBase Sequence database represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence of the mature miRNA sequence (termed miR). Both hairpin and mature sequences are available for searching using BLAST and SSEARCH, and entries can also be retrieved by name, keyword, references and annotation. All sequence and annotation data are also available for download.
    Note that when searching for "miR-16" here, a longer list of molecules from various species is presented, allowing a batch retrieval of either mature or precursor sequences, in multi-FASTA format or in ClustalW format. A single entry like the one for miR-16 also provides both the stem-loop sequence and the mature sequence, each with its own accession number, on one common page.
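
    Example: Once a multi-FASTA file of microRNA sequences has been downloaded, the entries for one microRNA across species can be selected by name. The following Python sketch assumes that each header starts with an identifier of the form species-name (like hsa-miR-16), optionally followed by a copy number (like -1). This header layout, the function name, and the records with their sequences are illustrative placeholders, not real database entries.

```python
import re


def select_mirna(records, name):
    """Return (identifier, sequence) pairs for `name` in any species."""
    # <species>-<name>, optionally followed by a copy number such as -1.
    wanted = re.compile(
        r"^[a-z]+-" + re.escape(name) + r"(-\d+)?$", re.IGNORECASE
    )
    selected = []
    for header, sequence in records:
        parts = header.split()
        if parts and wanted.match(parts[0]):
            selected.append((parts[0], sequence))
    return selected


if __name__ == "__main__":
    # Placeholder records: (header, sequence). Sequences are dummies.
    records = [
        ("hsa-miR-16 human mature", "AAAA"),
        ("mmu-miR-16 mouse mature", "CCCC"),
        ("hsa-miR-161 unrelated name", "GGGG"),
        ("hsa-mir-16-1 human hairpin", "UUUU"),
        ("hsa-miR-15a neighbouring miRNA", "ACGU"),
    ]
    for identifier, sequence in select_mirna(records, "miR-16"):
        print(identifier, sequence)
```

    The pattern is anchored at both ends so that a query for "miR-16" does not accidentally pick up longer names that merely start with the same characters.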

    Tip! microRNAs are also displayed within the UCSC Genome Browser. You may try the example "miR-16" as input in the query field. You can see that miR-16-1 is located on chromosome 13, within an intron of a protein-coding gene and in close proximity to another microRNA called miR-15a. NOTE that it is essential not to use the name miR16 (without "-"), as this query will retrieve a totally different (protein-coding) RNA located on chromosome 16!
                     


                  
RNA4...predict the potential targets of a microRNA like miR-16 ? (last update Nov. 2, 2005)         

    Tip! miRBase contains all published miRNA sequences, genomic locations and associated annotation. In addition, miRBase provides links to databases which predict the potential targets of microRNAs. Thus, the easiest way to address this question is to look for the database entry of miR-16, and then jump to the referenced URLs of these target databases:

    miRBase Targets, a part of miRBase, is a web resource developed by the Enright Lab at the Wellcome Trust Sanger Institute containing computationally predicted targets for microRNAs across many species. The miRNA sequences are obtained from the miRBase Sequence database and most genomic sequence from EnsEMBL. This resource aims to provide the most up-to-date and accurate predictions of miRNA targets and hence this resource will be updated regularly to incorporate new miRNAs or EnsEMBL sequences.
    The predicted targets of miR-16 in miRBase comprise a list of more than 300 genes. Note that it is possible to rank this list according to different values: the best P-value from all target sites in a transcript (default), the total number of conserved target sites, the number of conserved species for which a target site is found, or the number of different miRNAs predicted to hit this transcript. Each target gene entry provides a very nice viewer (in HTML or in Java) which displays the miRNAs along the potential target sequences and which shows a multiple sequence alignment, making it possible to estimate very quickly the evolutionary conservation of a specific target region. Note that miR-16 was shown to be involved in the mRNA destabilization of inflammatory mediators like TNFalpha and PTGS2 (COX2), as shown in Jing et al., Cell 2005, but neither of these targets is listed in this miRBase Target entry.

    PicTar is an algorithm for the identification of microRNA targets. This searchable website provides details (3' UTR alignments with predicted sites, links to various public databases etc) regarding microRNA target predictions in vertebrates and microRNA target predictions across seven Drosophila species. PicTar can be used BOTH for predicting the targets of a certain microRNA OR for predicting the microRNAs which may target a specific mRNA of interest.
    The predicted targets for miR-16 comprise a list of over 700 human genes. A list of potential target genes is ranked by a specific PicTar score, with links to RefSeq and to the custom view of UCSC Genome Browser, displaying the PicTar miRNA prediction sites.

   
    TargetScan is a portal at MIT storing several datasets of predictions of microRNA targets, either targeting only the 3'-UTRs or also targeting the ORF regions. TargetScan can be used BOTH for predicting the targets of a certain microRNA OR for predicting the microRNAs which may target a specific mRNA of interest. The user may choose a microRNA family (like "miR-15/16/195") in order to predict the targets of this family. The output shows a list of potential target genes ranked by an EFDR (estimated false discovery rate) score, with links to the NCBI sequence database and to the UCSC Genome Browser. Note that this list contains quite detailed summaries of the individual genes' functions.
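
    Example: Since the three resources produce target lists of quite different size and content, it can be useful to compare the lists after downloading them. The following Python sketch counts, for each gene, how many tools predict it and keeps the genes supported by a minimum number of tools. The tool labels, gene names and the function name are placeholders chosen for illustration. They are not real prediction results.

```python
from collections import Counter


def consensus_targets(predictions, min_tools=2):
    """Return (gene, support) pairs predicted by at least `min_tools` tools.

    `predictions` maps a tool name to an iterable of gene symbols.
    The result is sorted by decreasing support, then by gene name.
    """
    support = Counter()
    for genes in predictions.values():
        # A set ensures each tool contributes at most once per gene.
        support.update({gene.upper() for gene in genes})
    kept = [(gene, n) for gene, n in support.items() if n >= min_tools]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder lists standing in for the three downloaded target lists.
    predictions = {
        "tool_1": ["GENE1", "GENE2", "gene3"],
        "tool_2": ["GENE2", "GENE3"],
        "tool_3": ["GENE2", "GENE4", "GENE4"],
    }
    for gene, n in consensus_targets(predictions):
        print(gene, n)
```

    Gene symbols are compared case-insensitively here. In practice the tools may use different identifier types (gene names, RefSeq or Ensembl IDs), which would have to be mapped onto a common identifier first.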
                     


                
RNA5...predict if a specific mRNA of interest may be the target of microRNAs ? (last update Nov. 2, 2005)         
          
    Tip! miRBase Targets, a part of miRBase, is a web resource developed by the Enright Lab at the Wellcome Trust Sanger Institute containing computationally predicted targets for microRNAs across many species. The miRNA sequences are obtained from the miRBase Sequence database and most genomic sequence from EnsEMBL. This resource aims to provide the most up-to-date and accurate predictions of miRNA targets and hence this resource will be updated regularly to incorporate new miRNAs or EnsEMBL sequences.
    In order to look for a specific target gene of interest, like PTGS2 (COX2), simply enter this term at the "Search" page, within the field "Gene name". A list of all species which contain information on the specific gene is presented. All genes are referenced via their Ensembl Transcript IDs, like ENST00000186982. Each target gene entry provides a very nice viewer (in HTML or in Java) which displays the miRNAs along the potential target sequences and which shows a multiple sequence alignment, making it possible to estimate very quickly the evolutionary conservation of a specific target region. Note that miR-16 was shown to be involved in the mRNA destabilization of inflammatory mediators like PTGS2 (COX2), as shown in Jing et al., Cell 2005. However, this miRNA is not displayed along the PTGS2 mRNA; a different miRNA (mmu-miR-350) is shown instead.

   
    PicTar is an algorithm for the identification of microRNA targets. PicTar can be used BOTH for predicting the targets of a certain microRNA OR for predicting the microRNAs which may target a specific mRNA of interest. The user may enter a certain gene ID (like PTGS2) for which the potential matching microRNAs shall be predicted. The output presents a multiple species alignment of the cDNA of the chosen gene, highlighting the positions of individual predicted miRNA sites.

   
    TargetScan is a portal at MIT storing several datasets of predictions of microRNA targets, either targeting only the 3'-UTRs or also targeting the ORF regions. TargetScan can be used BOTH for predicting the targets of a certain microRNA OR for predicting the microRNAs which may target a specific mRNA of interest. The user may enter a certain human EntrezGene ID (like 5743 for human PTGS2) for which the potential matching microRNAs shall be predicted. NOTE: In practice, a search using the gene name (PTGS2) was successful here, whereas a search using "5743" was NOT! The output is a tabular list of microRNA families matching the mRNA of interest, with links to Rfam and to the UCSC Genome Browser.
                                 