| DNA Linkpages | |
| Linkpage 1 - Tools for DNA
sequence analysis (Pasteur Institute) |
The Pasteur Institute
provides a highly recommended starting
point for DNA
sequence analysis, including numerous EMBOSS
tools, pattern search, restriction sites, transcription factors,
repeats, CpG islands, codon usage tables, primers, and much more. |
| DNA Properties | |
| GEECEE (EMBOSS, Pasteur) |
GeeCee
("GC")
calculates the fractional GC content of nucleic acid sequences (EMBOSS
tool). For example, a score of 0.65 corresponds to 65% GC content. |
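As a rough illustration of what a fractional GC score means, here is a minimal Python sketch. The function name `gc_fraction` is hypothetical, and the decision to ignore any character other than A, C, G, T, or U is an assumption of this sketch that may differ from how GEECEE treats ambiguous bases.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    # Assumption: only A, C, G, T and U are counted; other symbols are ignored.
    total = sum(seq.count(base) for base in "ACGTU")
    return gc / total if total else 0.0


# 6 of the 10 bases are G or C, so the score is 0.6 (60% GC content).
print(gc_fraction("GCGCGCATAT"))
```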
| WORDCOUNT (EMBOSS, Pasteur) |
WORDCOUNT
counts words of a specified size in a DNA sequence (EMBOSS tool). Example:
Screen a sequence for all words of size 2. Note: If the
sequence is a CpG island, the CG dinucleotide "words" will appear at or
near the top of the output. Note: The lower limit is "2", so this tool does not report single-nucleotide frequencies. To display single-nucleotide frequencies, the program FUZZNUC may be used (please refer to the FUZZNUC main section). |
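The word-counting idea can be sketched in a few lines of Python. This is only an illustration, not WORDCOUNT itself: the function name `count_words` is hypothetical, and counting overlapping words with a sliding window is an assumption of this sketch.

```python
from collections import Counter


def count_words(seq: str, size: int = 2) -> Counter:
    """Count all overlapping words of the given size in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + size] for i in range(len(seq) - size + 1))


# In this CG-rich toy sequence the CG dinucleotide ranks first.
counts = count_words("ACGCGCGT", size=2)
for word, n in counts.most_common():
    print(word, n)
```

Unlike the limit described above, this sketch also accepts `size=1`, which yields single-nucleotide counts.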
| Primers and In silico PCR | |
| CODEHOP (FHCRC) |
CODEHOP designs
PCR
primers from protein multiple-sequence alignments. The program
is
intended for cases where the protein sequences are distant from
each other and degenerate primers are needed. Please refer to the CODEHOP main description within the BLOCKS chapter. |
| Primer3 (Whitehead Institute) and EPRIMER3 (EMBOSS, Pasteur) |
1. Primer3,
provided by
the Whitehead Institute for Biomedical
Research, is a powerful
interface for picking PCR primers from a DNA sequence. Many
options are available, such as GC content, Tm
values, product size, and exclusion of parts of the query sequence.
2. EPRIMER3: An EMBOSS tool based on the Primer3 program is available at the Pasteur Institute. |
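To give a feel for the kind of primer properties such options refer to, the sketch below estimates a melting temperature with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rule of thumb for short oligos. It is not the method Primer3 uses, which relies on a more detailed thermodynamic model; the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in degrees Celsius using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    # Wallace rule: each A/T pair contributes 2 degrees, each G/C pair 4 degrees.
    return 2 * at + 4 * gc


# 8 A/T and 8 G/C bases: 2 * 8 + 4 * 8 = 48
print(wallace_tm("ATGCATGCATGCATGC"))
```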
| PrimerBank (Harvard Medical School) |
PrimerBank is a
public resource for PCR primers, developed at Harvard Medical
School.
These primers are designed for gene expression detection or
quantification (real-time PCR). PrimerBank contains about 180,000
primers covering most known human and mouse genes. There are several ways to search for primers: GenBank accession, NCBI protein accession, LocusLink ID, PrimerBank ID, or keyword (gene description).
A common problem in PCR is non-specific amplification of other gene products, because cDNA libraries of thousands of genes are often used as PCR templates. Careful design of PCR primers that specifically amplify the genes of interest is therefore needed. Most available primer design programs focus only on the chemical properties of primers, such as melting temperature, GC content, and secondary structure; little emphasis is given to mispriming to other genes. In contrast, all primers in PrimerBank were designed to ensure gene specificity.
Note: The current version of PrimerBank does not indicate whether a primer pair spans an intron, but this feature will be included in future versions. If you need this information, you can check a primer pair against the genomic sequence using programs like UCSC In silico PCR.
|
| UCSC
In silico PCR (UCSC) |
UCSC In silico PCR
searches
a sequence database with a pair of PCR primers. It uses an
indexing
strategy to do this quickly.
Input: You can choose between different genome assemblies, such as human, mouse, or dog, and enter the sequences of the primer pair you would like to check. You can set the maximum product size and the minimum quality of the match. You may also have the reverse primer reverse-complemented automatically.
Output: When successful, the search returns a sequence output file in fasta format containing all sequences in the database that lie between and include the primer pair. The fasta header describes the region in the database and the primers. The fasta body is in upper case where the primer sequence matches the database sequence and in lower case elsewhere. In addition, a link displays the region graphically in the UCSC Genome Browser, where you can easily see whether, for example, an intron lies between the two primers.
NOTE: Programs which perform similarity searches of SINGLE oligos or peptides against genomes are described in the section "Special Database Search". |
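The input and output described above can be mimicked on a small scale with the following Python sketch. It is a toy model under stated assumptions, not the UCSC implementation: it searches a single in-memory template by exact string matching on the forward strand only, uses no index, and the names `in_silico_pcr` and `max_product` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_product: int = 4000) -> list:
    """Return (start, end, product) for each exact primer-pair hit.

    The product is upper case where the primers match and lower case
    in between, mirroring the fasta body convention described above.
    """
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement on the forward strand.
    rev_site = reverse_complement(reverse)
    products = []
    start = template.find(forward)
    while start != -1:
        end = template.find(rev_site, start + len(forward))
        while end != -1:
            stop = end + len(rev_site)
            if stop - start > max_product:
                break
            inner = template[start + len(forward):end].lower()
            products.append((start, stop, forward + inner + rev_site))
            end = template.find(rev_site, end + 1)
        start = template.find(forward, start + 1)
    return products


template = "TTTTACGTACGGGGCCCCGATTCATTTT"
for hit in in_silico_pcr(template, forward="ACGTAC", reverse="TGAATC"):
    print(hit)
```

A real search tolerates mismatches and checks both strands; exact matching keeps the sketch short.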
| Restriction Sites | |
| REBASE (NEB) |
REBASE
is a collection of information
about restriction enzymes and related proteins.
It contains published and unpublished references,
recognition and cleavage sites, isoschizomers, commercial availability,
methylation sensitivity, crystal and sequence data.
DNA methyltransferases, homing endonucleases, nicking enzymes,
specificity subunits and control proteins are also included.
Putative DNA methyltransferases and restriction enzymes, as predicted
from analysis of genomic sequences, are also listed.
REBASE is updated daily and is constantly expanding. |
| REMAP (EMBOSS, Pasteur) |
REMAP
is a highly
versatile and customizable EMBOSS tool, displaying restriction sites
along with translation in all six frames of nucleotide sequences. |
| RestrictionMapper (?) |
RestrictionMapper maps sites for restriction enzymes in DNA sequences. It also includes a "virtual digest". |
| TACG -
Restriction Enzyme analysis (EMBOSS, Pasteur) |
TACG performs restriction site analysis for a given input sequence, with numerous output options: fragment lengths, enzyme selections, a pseudo-graphic gel map, integrated ORF analysis, and more. |
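Restriction site mapping and a "virtual digest", as offered by the tools in this section, can be sketched in Python as follows. The sketch assumes a linear molecule, exact recognition sites without ambiguity codes, and cuts reported on the top strand only. The three enzymes (EcoRI G^AATTC, BamHI G^GATCC, HindIII A^AGCTT) are a small hard-coded sample and not a REBASE download; the function names are hypothetical.

```python
# Recognition site and top-strand cut offset for a few common enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def find_cut_positions(seq: str, site: str, cut_offset: int) -> list:
    """Return the 0-based positions at which the top strand is cut."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    return cuts


def virtual_digest(seq: str, enzyme_names: list) -> list:
    """Return fragment lengths, in order, for a linear molecule."""
    cuts = sorted({cut
                   for name in enzyme_names
                   for cut in find_cut_positions(seq, *ENZYMES[name])})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


seq = "AAAAGAATTCTTTTGGATCCAAAA"
print(find_cut_positions(seq, *ENZYMES["EcoRI"]))
print(virtual_digest(seq, ["EcoRI", "BamHI"]))
```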
| Translation | |
| BACKTRANSEQ (EMBOSS, Pasteur) |
BACKTRANSEQ
translates
proteins back into nucleotide sequences with adapted codon usage.
BACKTRANSEQ takes a protein sequence and makes a best estimate of the
likely nucleic acid sequence it could have come from. It does this by
using a codon frequency table: for each amino acid, the
most frequently occurring codon is used to construct the
nucleic
acid sequence. BACKTRANSEQ reads in a data file containing the codon frequency tables. The default codon frequency table is 'Ehum.cut', the human codon frequency table. It is important to use a codon frequency table that is appropriate for the species that your protein comes from. |
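The "most frequent codon per amino acid" rule described above can be sketched in Python as follows. The table `TOY_CODON_USAGE` is hypothetical: it covers only a few residues and its numbers are made up for illustration, not taken from 'Ehum.cut'. The function name `back_translate` is likewise an assumption of this sketch.

```python
# Illustrative codon frequencies for a few residues only.
# The values are invented for this sketch and are NOT from Ehum.cut.
TOY_CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAG": 0.58, "AAA": 0.42},
    "F": {"TTC": 0.55, "TTT": 0.45},
    "W": {"TGG": 1.00},
}


def back_translate(protein: str, usage: dict) -> str:
    """Pick the most frequent codon for each amino acid in the protein."""
    codons = []
    for aa in protein.upper():
        if aa not in usage:
            raise ValueError(f"no codon usage data for residue {aa!r}")
        table = usage[aa]
        # Choose the codon with the highest frequency for this residue.
        codons.append(max(table, key=table.get))
    return "".join(codons)


print(back_translate("MKFW", TOY_CODON_USAGE))  # ATGAAGTTCTGG
```

Swapping in a table for another species changes the chosen codons, which is why the choice of table matters.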
| Genetic
Code Viewer (EBI) |
The Genetic
Code Viewer, provided by the EBI, displays codon usage in
different groups of organisms, such as vertebrates, yeast, and bacteria, and also shows the
differences between "standard" and "mitochondrial" genomes. |
| ORF
Finder (NCBI) |
The ORF Finder (Open
Reading Frame Finder) is a graphical analysis tool which finds all
open reading frames of a selectable minimum size in a user's sequence
or in a sequence already in the database. This tool identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the WWW BLAST server. |
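A minimal Python sketch of ORF detection with a selectable minimum size is shown below. It rests on simplifying assumptions that may differ from the NCBI tool: an ORF is taken to run from an ATG to the next in-frame stop codon of the standard code (TAA, TAG, TGA), its length includes the stop codon, and coordinates on the minus strand refer to the reverse-complemented sequence. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_strand(seq: str, min_len: int):
    """Yield (start, end, frame) for ATG...stop ORFs on one strand.

    start is 0-based, end is exclusive and includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield (start, i + 3, frame + 1)
                start = None


def find_orfs(seq: str, min_len: int = 30) -> list:
    """Return (strand, start, end, frame) for ORFs on both strands."""
    seq = seq.upper()
    hits = [("+", s, e, f) for s, e, f in orfs_in_strand(seq, min_len)]
    rc = reverse_complement(seq)
    hits += [("-", s, e, f) for s, e, f in orfs_in_strand(rc, min_len)]
    return hits


print(find_orfs("CCATGAAATTTGGGTAACC", min_len=9))
```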
| SIXPACK (EMBOSS, Pasteur) |
SIXPACK
displays a DNA sequence with its six-frame
translation and ORFs and is a highly versatile tool. It takes a nucleic acid sequence and writes out the forward and reverse senses of the sequence with the three forward and three reverse translations in a readable display format. It also writes a file containing the open reading frames that are larger than the specified minimum size (default 1 base, showing all possible open reading frames). These open reading frames are written as protein sequences in one of the many output sequence formats (Fasta, GCG, Genbank, NCBI, Clustal, MSF, Phylip, Swiss...). NOTE: You have to enter a file name in the field "Output ORFs to a file" in order to get the protein sequences as a separate file. The program also offers a long list of follow-up analyses on the resulting ORFs, using EMBOSS tools like Antigenic, Charge, Compseq, Helixturnhelix, Sigcleave, and many more. |
| Translate Tool (ExPASy) |
Translate is
a tool which allows the translation of a nucleotide
(DNA/RNA) sequence to a protein sequence. In addition, the output sequence gets a virtual Swiss-Prot accession number, allowing access to all other ExPASy tools. |
| TRANSEQ (EMBOSS, EBI) |
TRANSEQ translates nucleic acid sequences to the corresponding peptide sequences. It can translate in any one of the three forward or three reverse frames, in all three forward or all three reverse frames, or in all six frames. |
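Six-frame translation, as performed by the tools in this section, can be sketched in Python as follows. The sketch uses the standard genetic code only and marks unknown codons with "X". Its reverse frames -1 to -3 are simply offsets 0 to 2 into the reverse complement, which is an assumption of this sketch and may not match the frame numbering used by TRANSEQ or SIXPACK. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int = 1) -> str:
    """Translate seq starting at frame 1, 2 or 3; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame - 1, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Return translations keyed by frame: 1, 2, 3, -1, -2, -3."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    rc = reverse_complement(seq)
    frames = {f: translate(seq, f) for f in (1, 2, 3)}
    frames.update({-f: translate(rc, f) for f in (1, 2, 3)})
    return frames


for frame, peptide in six_frame("ATGGCCTAA").items():
    print(frame, peptide)
```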