HUMOPS: NetGene splice site prediction

virus mail server netgene at virus.fki.dth.dk
Fri Mar 20 07:30:41 EST 1992


------------------------------------------------------------------------
                               NetGene
               Neural Network Prediction of Splice Sites
                                                         
Reference: 
Brunak, S.,  Engelbrecht,  J., and  Knudsen, S.  (1991).  Prediction  of
Human mRNA donor and acceptor  sites from the DNA  sequence.  Journal of
Molecular Biology 220:49-65.
------------------------------------------------------------------------

Report ERRORS to Jacob Engelbrecht (engel at virus.fki.dth.dk).

Potential splice sites are assigned by combining the output of a local
and a global network. Predictions are reported at two cutoffs: 1) Highly
confident sites (few or no false positives; on average 50% of the true
sites are detected); 2) Nearly all true sites (more false positives: on
average, 0.1% of all positions are false positive donor sites and 0.4%
are false positive acceptor sites, at 95% detection of true sites). The
network performance on sequences from distantly related organisms has
not been quantified. Because the algorithm is non-local, sites closer
than 225 nucleotides to either end of the sequence cannot be assigned.
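
The rates quoted for the second cutoff can be turned into a rough
expected count of false positives for a sequence of a given length. The
Python sketch below is only an illustration: the function names are
hypothetical, the rates are applied to the full sequence length, and the
225-nucleotide margin is assumed to exclude exactly 225 positions at
each end. The server may handle these boundaries differently.

```python
# Rates quoted above for the "nearly all true sites" cutoff.
DONOR_FP_RATE = 0.001      # 0.1% of all positions
ACCEPTOR_FP_RATE = 0.004   # 0.4% of all positions
END_MARGIN = 225           # sites nearer than this to an end are not assigned


def expected_false_positives(n_positions: int, rate: float) -> float:
    """Average number of false positives if `rate` applies per position."""
    return n_positions * rate


def assignable_positions(seq_len: int, margin: int = END_MARGIN) -> int:
    """Positions left after dropping `margin` bases at each end (assumed reading)."""
    return max(0, seq_len - 2 * margin)


if __name__ == "__main__":
    seq_len = 6953  # length of HUMOPS as reported in this output
    print(f"donor:    {expected_false_positives(seq_len, DONOR_FP_RATE):.1f}")
    print(f"acceptor: {expected_false_positives(seq_len, ACCEPTOR_FP_RATE):.1f}")
    print(f"assignable positions: {assignable_positions(seq_len)}")
```

For a sequence of 6953 bases this gives roughly 7 expected false
positive donor sites and roughly 28 expected false positive acceptor
sites at the second cutoff.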



Column explanations, field identifiers: 

POSITION in your sequence (either first or last base in intron).
Joint CONFIDENCE level for the site (relative to the cutoff). 
EXON INTRON (INTRON EXON for acceptor sites) gives 20 bases of sequence
around the predicted site.
LOCAL is the site confidence from the local network. 
GLOBAL is the site confidence from the global network. 
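
A table row with these five fields can be read with a few lines of
Python. This is a minimal sketch, not part of NetGene: the names SiteRow
and parse_site_row are hypothetical, and it assumes that fields are
separated by whitespace and that the "^" in the sequence field marks the
predicted exon/intron boundary.

```python
from typing import NamedTuple


class SiteRow(NamedTuple):
    position: int      # POSITION column
    confidence: float  # CONFIDENCE column
    left: str          # 10 bases before the "^" marker
    right: str         # 10 bases after the "^" marker
    local: float       # LOCAL column
    global_: float     # GLOBAL column (may be negative)


def parse_site_row(line: str) -> SiteRow:
    """Split one whitespace-separated table row into typed fields."""
    pos, conf, context, local, global_ = line.split()
    left, right = context.split("^")
    return SiteRow(int(pos), float(conf), left, right,
                   float(local), float(global_))


if __name__ == "__main__":
    # A donor row: EXON is on the left of "^", INTRON on the right.
    row = parse_site_row(
        "    3979           0.38    CGTCAAGGAG^GTACGGGCCG    0.92     0.74"
    )
    print(row.position, row.left, row.right, row.local, row.global_)
```

For donor rows the intron is the right-hand half; for acceptor rows it
is the left-hand half, following the order given in each table header.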

------------------------------------------------------------------------
The sequence HUMOPS contains 6953 bases and has the following composition:
A 1524 C 2022 G 1796 T 1611
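
As a quick consistency check, the four base counts can be summed and
compared with the reported length. The Python sketch below uses only the
numbers printed above; the G+C fraction it derives is an illustration
and is not part of the NetGene output.

```python
# Base counts as reported for HUMOPS.
composition = {"A": 1524, "C": 2022, "G": 1796, "T": 1611}
reported_length = 6953

total = sum(composition.values())
gc_fraction = (composition["C"] + composition["G"]) / total

print(total == reported_length)       # counts add up to the reported length
print(f"G+C fraction: {gc_fraction:.3f}")
```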


1) HIGHLY CONFIDENT SITES:
==========================

ACCEPTOR SITES:
POSITION     CONFIDENCE        INTRON EXON         LOCAL   GLOBAL
    4094           0.27    TGTCCTGCAG^GCCGCTGCCC    0.63     0.66
    5167           0.20    TGCCTTCCAG^TTCCGGAACT    0.59     0.64
    3812           0.17    CTGTCCTCAG^GTACATCCCC    0.68     0.54
    3164           0.02    TCCTCCTCAG^TCTTGCTAGG    0.79     0.32
    2438           0.01    TGCCTTGCAG^GTGAAATTGC    0.78     0.33

DONOR SITES:
POSITION     CONFIDENCE          EXON INTRON       LOCAL   GLOBAL
    3979           0.38    CGTCAAGGAG^GTACGGGCCG    0.92     0.74
    2608           0.17    GCTGGTCCAG^GTAATGGCAC    0.85     0.54
    4335           0.06    GAACAAGCAG^GTGCCTACTG    0.83     0.41


2) NEARLY ALL TRUE SITES:
=========================

ACCEPTOR SITES:
POSITION     CONFIDENCE        INTRON EXON         LOCAL   GLOBAL
    4094           0.55    TGTCCTGCAG^GCCGCTGCCC    0.63     0.66
    3812           0.52    CTGTCCTCAG^GTACATCCCC    0.68     0.54
    3164           0.49    TCCTCCTCAG^TCTTGCTAGG    0.79     0.32
    5167           0.49    TGCCTTCCAG^TTCCGGAACT    0.59     0.64
    2438           0.48    TGCCTTGCAG^GTGAAATTGC    0.78     0.33
    4858           0.39    TCATCCATAG^AAAGGTAGAA    0.77     0.20
    3712           0.36    CCTTTTCCAG^GGAGGGAATG    0.88    -0.01
    4563           0.33    CCCTCCACAG^GTGGCTCAGA    0.81     0.05
    5421           0.33    TTTTTTTAAG^AAATAATTAA    0.75     0.13
    3783           0.29    TCCCTCACAG^GCAGGGTCTC    0.64     0.26
    3173           0.25    GTCTTGCTAG^GGTCCATTTC    0.52     0.36
    4058           0.24    CTCCCTGGAG^GAGCCATGGT    0.43     0.51
    1784           0.22    TCACTGTTAG^GAATGTCCCA    0.68     0.08
    6512           0.21    CCCTTGCCAG^ACAAGCCCAT    0.67     0.08
    2376           0.20    CCCTGTCTAG^GGGGGAGTGC    0.61     0.16
    1225           0.18    CCCCTCTCAG^CCCCTGTCCT    0.65     0.07
    1743           0.13    TTCTCTGCAG^GGTCAGTCCC    0.62     0.03
    3834           0.13    GGGCCTGCAG^TGCTCGTGTG    0.26     0.58
    4109           0.13    TGCCCAGCAG^CAGGAGTCAG    0.29     0.54
    6557           0.13    CATTCTGGAG^AATCTGCTCC    0.56     0.12
    1638           0.11    CCATTCTCAG^GGAATCTCTG    0.62     0.00
     247           0.10    GCCTTCGCAG^CATTCTTGGG    0.55     0.11
    6766           0.09    CTATCCACAG^GATAGATTGA    0.64    -0.06
     906           0.08    AATTTCACAG^CAAGAAAACT    0.61    -0.02
    6499           0.08    CAGTTTCCAG^TTTCCCTTGC    0.55     0.06
     378           0.07    GTACCCACAG^TACTACCTGG    0.24     0.52
    3130           0.07    CTGTCTCCAG^AAAATTCCCA    0.51     0.12
    4272           0.07    ACCATCCCAG^CGTTCTTTGC    0.58     0.00
    4522           0.07    TGAATCTCAG^GGTGGGCCCA    0.51     0.12
    5722           0.07    ACCCTCGCAG^CAGCAGCAAC    0.55     0.05
    2316           0.06    CTTCCCCAAG^GCCTCCTCAA    0.40     0.27
    2357           0.06    GCCTTCCTAG^CTACCCTCTC    0.39     0.28
    2908           0.06    TTTGGTCTAG^TACCCCGGGG    0.51     0.10
    4112           0.06    CCAGCAGCAG^GAGTCAGCCA    0.25     0.50
    1327           0.05    TTTGCTTTAG^AATAATGTCT    0.52     0.06
     844           0.04    GTTTGTGCAG^GGCTGGCACT    0.62    -0.11
    1045           0.04    TCCCTTGGAG^CAGCTGTGCT    0.54     0.01
    1238           0.03    CTGTCCTCAG^GTGCCCCTCC    0.50     0.06
    2976           0.03    CCTAGTGCAG^GTGGCCATAT    0.62    -0.12
    3825           0.03    CATCCCCGAG^GGCCTGCAGT    0.16     0.60
    1508           0.02    TGAGATGCAG^GAGGAGACGC    0.43     0.16
    2257           0.02    CTCTCCTCAG^CGTGTGGTCC    0.53     0.00
    5712           0.02    ATCCTCTCAG^ACCCTCGCAG    0.51     0.05
    2397           0.00    CCCTCCTTAG^GCAGTGGGGT    0.41     0.16
    4800           0.00    CATTTTCTAG^CTGTATGGCC    0.47     0.07
    5016           0.00    TGCCTAGCAG^GTTCCCACCA    0.59    -0.11

DONOR SITES:
POSITION     CONFIDENCE          EXON INTRON       LOCAL   GLOBAL
    3979           0.75    CGTCAAGGAG^GTACGGGCCG    0.92     0.74
    2608           0.51    GCTGGTCCAG^GTAATGGCAC    0.85     0.54
    4335           0.38    GAACAAGCAG^GTGCCTACTG    0.83     0.41
     656           0.32    ACCCTGGGCG^GTATGAGCCG    0.56     0.66
    5859           0.11    ACCAAAAGAG^GTGTGTGTGT    0.85     0.07
    4585           0.09    GCTCACTCAG^GTGGGAGAAG    0.86     0.03
    1708           0.06    TGGCCAGAAG^GTGGGTGTGC    0.85     0.01
    6196           0.05    CCCAATGAGG^GTGAGATTGG    0.86    -0.01
     667           0.03    TATGAGCCGG^GTGTGGGTGG    0.23     0.71

------------------------------------------------------------------------


