Combining neural estimators, NetGene
Soren Brunak
brunak at cbs.dtu.dk
Wed Jun 28 11:29:01 EDT 1995
Re: Combining neural network estimators, NetGene
For people interested in earlier work on combined neural networks,
I would like to bring the following paper to their attention:
Prediction of human mRNA donor and acceptor sites from the DNA
sequence, S. Brunak, J. Engelbrecht, and S. Knudsen, J. Mol. Biol.,
220, 49-65, 1991. (Abstract below).
The paper describes a method for locating intron splice sites in human
genes by combining a search for coding regions and a search for donor
and acceptor sites. The method was implemented as a mail server in
February 1992, and is still widely used. Since 1992 it has processed
more than 50 million nucleotides of DNA for researchers from many
different countries, mainly the UK, the USA, and Germany. The mail server
is reached by sending mail to: NetGene at cbs.dtu.dk.
Regards,
Soren Brunak
Center for Biological Sequence Analysis
The Technical University of Denmark
DK-2800 Lyngby, Denmark
Email: brunak at cbs.dtu.dk
-----------------------------------------------------------------------
Abstract:
Artificial neural networks have been applied to the prediction of
splice site location in human pre-mRNA. A joint prediction scheme
where prediction of transition regions between introns and exons
regulates a cutoff level for splice site assignment was able to predict
splice site locations with confidence levels far better than previously
reported in the literature. The problem of predicting donor and
acceptor sites in human genes is hampered by the presence of large
numbers of false positives; in the paper the distribution of these
false splice sites is examined and linked to a possible scenario for
the splicing mechanism in vivo. When the presented method detects
95% of the true donor and acceptor sites it makes less than 0.1%
false donor site assignments and less than 0.4% false acceptor site
assignments. For the large data set used in this study this means that
on average there are one and a half false donor sites per true
donor site and six false acceptor sites per true acceptor site. With
the joint assignment method more than a fifth of the true donor sites
and around one fourth of the true acceptor sites could be detected
without any accompanying false positive predictions. Highly
confident splice sites could not be isolated with a widely used weight
matrix method or by separate splice site networks. A complementary
relation between the confidence levels of the coding/non-coding and
the separate splice site networks was observed, with many weak splice
sites having sharp transitions in the coding/non-coding signal and
many stronger splice sites having more ill-defined transitions between
coding and non-coding.
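
To make the joint prediction idea concrete, here is a minimal sketch in
Python of a cutoff for donor site assignment that is regulated by the
coding/non-coding transition. It is an illustration only, not the
NetGene implementation: the function name, the parameters base_cutoff,
relief and window, and the linear way the cutoff is lowered are all
assumptions, since the abstract does not specify the functional form.
Both score lists stand in for hypothetical network outputs in [0, 1].

```python
from typing import List, Sequence


def joint_donor_calls(
    site_scores: Sequence[float],
    coding_scores: Sequence[float],
    base_cutoff: float = 0.9,
    relief: float = 0.5,
    window: int = 3,
) -> List[int]:
    """Return the positions called as donor sites.

    site_scores[i]   -- hypothetical splice site network output at position i
    coding_scores[i] -- hypothetical coding/non-coding network output at i
    All parameter names and defaults are illustrative assumptions.
    """
    if len(site_scores) != len(coding_scores):
        raise ValueError("score sequences must have the same length")

    calls = []
    for i in range(len(site_scores)):
        upstream = coding_scores[max(0, i - window):i]
        downstream = coding_scores[i:i + window]
        if not upstream or not downstream:
            continue  # no context on one side, so no transition estimate

        # A donor site marks an exon -> intron boundary, so the coding
        # signal is expected to drop when moving downstream.
        drop = sum(upstream) / len(upstream) - sum(downstream) / len(downstream)
        transition = max(0.0, drop)

        # Assumed linear rule: a sharper transition lowers the cutoff.
        cutoff = base_cutoff - relief * transition
        if site_scores[i] >= cutoff:
            calls.append(i)
    return calls


# Toy data: a sharp coding -> non-coding transition at position 4, and two
# equally weak splice site scores (0.6) at positions 4 and 7.
coding = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
site = [0.1, 0.1, 0.1, 0.1, 0.6, 0.1, 0.1, 0.6]

joint = joint_donor_calls(site, coding)                       # [4]
fixed_strict = joint_donor_calls(site, coding, relief=0.0)    # []
fixed_loose = joint_donor_calls(site, coding, base_cutoff=0.6,
                                relief=0.0)                   # [4, 7]
```

In this toy case a fixed cutoff either misses the weak site at position 4
or also accepts the equally weak score at position 7, while the
transition-regulated cutoff accepts only position 4. This mirrors, in a
very simplified way, the complementary relation described in the abstract.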