Combining neural estimators, NetGene

Soren Brunak brunak at cbs.dtu.dk
Wed Jun 28 11:29:01 EDT 1995


Re: Combining neural network estimators, NetGene


For people interested in earlier work on combined neural networks,
I would like to bring the following paper to their attention:

Prediction of human mRNA donor and acceptor sites from the DNA
sequence, S. Brunak, J. Engelbrecht, and S. Knudsen, J. Mol. Biol.,
220, 49-65, 1991. (Abstract below).

The paper describes a method for locating intron splice sites in human
genes by combining a search for coding regions with a search for donor
and acceptor sites. The method was implemented as a mail server in
February 1992, and is still widely used. Since 1992 it has processed
more than 50 million nucleotides of DNA for researchers from many
different countries, mainly the UK, the USA and Germany. The mail
server is reached by sending mail to: NetGene at cbs.dtu.dk

Regards,

Soren Brunak
Center for Biological Sequence Analysis
The Technical University of Denmark
DK-2800 Lyngby, Denmark 
Email: brunak at cbs.dtu.dk  

-----------------------------------------------------------------------
Abstract:

Artificial neural networks have been applied to the prediction of
splice site location in human pre-mRNA. A joint prediction scheme, in
which the prediction of transition regions between introns and exons
regulates a cutoff level for splice site assignment, was able to
predict splice site locations with confidence levels far better than
previously reported in the literature. The problem of predicting donor
and acceptor sites in human genes is hampered by the presence of a
large number of false positives; in the paper the distribution of these
false splice sites is examined and linked to a possible scenario for
the splicing mechanism in vivo. When the presented method detects
95% of the true donor and acceptor sites, it makes less than 0.1%
false donor site assignments and less than 0.4% false acceptor site
assignments. For the large data set used in this study this means that
on average there are one and a half false donor sites per true
donor site and six false acceptor sites per true acceptor site. With
the joint assignment method more than a fifth of the true donor sites
and around one fourth of the true acceptor sites could be detected
without any accompanying false positive predictions. Highly
confident splice sites could not be isolated with a widely used weight
matrix method or by separate splice site networks. A complementary
relation between the confidence levels of the coding/non-coding and
the separate splice site networks was observed, with many weak splice
sites having sharp transitions in the coding/non-coding signal and
many stronger splice sites having more ill-defined transitions between
coding and non-coding.
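
To make the idea of a joint cutoff concrete, here is a minimal sketch in
Python. It is not the NetGene implementation: the abstract does not give
the regulating function, so the linear rule below, the windowed
transition measure, and the names transition_strength,
joint_splice_sites, base_cutoff, relief and window are all assumptions
made for illustration. The per-position scores stand in for the outputs
of a splice site network and a coding/non-coding network.

```python
from typing import List, Sequence


def transition_strength(coding: Sequence[float], i: int, window: int = 3) -> float:
    """Absolute change in mean coding score across position i.

    Hypothetical measure of how sharp the coding/non-coding
    transition is around i; the paper may use something different.
    """
    left = coding[max(0, i - window):i]
    right = coding[i:i + window]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return abs(sum(right) / len(right) - sum(left) / len(left))


def joint_splice_sites(
    site_scores: Sequence[float],
    coding: Sequence[float],
    base_cutoff: float = 0.9,  # assumed cutoff when no transition is seen
    relief: float = 0.5,       # assumed amount by which a transition lowers it
    window: int = 3,
) -> List[int]:
    """Return positions whose splice site score exceeds a local cutoff.

    The cutoff is lowered where the coding signal changes sharply, so a
    weak splice site next to a clear transition can still be assigned.
    """
    if len(site_scores) != len(coding):
        raise ValueError("site_scores and coding must have the same length")
    hits = []
    for i, score in enumerate(site_scores):
        cutoff = base_cutoff - relief * transition_strength(coding, i, window)
        if score > cutoff:
            hits.append(i)
    return hits
```

With a coding signal that drops from 0.9 to 0.1 at position 3, a splice
site score of 0.6 at that position passes the lowered cutoff, whereas
the same score against a flat coding signal does not. This mirrors the
complementary relation described in the abstract, where weak splice
sites tend to sit at sharp coding/non-coding transitions.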


