NetGene
BRUNAK@nbivax.nbi.dk
Sun May 17 07:31:00 EDT 1992
******** Announcement of the NetGene Mail-server: *********
DESCRIPTION:
The NetGene mail server is a service producing neural network
predictions of splice sites in vertebrate genes as described in:
Brunak, S., Engelbrecht, J., and Knudsen, S. (1991) Prediction of
Human mRNA Donor and Acceptor Sites from the DNA Sequence. Journal
of Molecular Biology, 220, 49-65.
ABSTRACT OF JMB ARTICLE:
Artificial neural networks have been applied to the prediction of
splice site location in human pre-mRNA. A joint prediction scheme
where prediction of transition regions between introns and exons
regulates a cutoff level for splice site assignment was able to
predict splice site locations with confidence levels far better than
previously reported in the literature. The problem of predicting
donor and acceptor sites in human genes is hampered by the presence
of numerous false positives - in the paper the
distribution of these false splice sites is examined and linked to a
possible scenario for the splicing mechanism in vivo. When the
presented method detects 95% of the true donor and acceptor sites it
makes less than 0.1% false donor site assignments and less than 0.4%
false acceptor site assignments. For the large data set used in this
study this means that on the average there are one and a half false
donor sites per true donor site and six false acceptor sites per true
acceptor site. With the joint assignment method more than a fifth of
the true donor sites and around one fourth of the true acceptor sites
could be detected without accompaniment of any false positive
predictions. Highly confident splice sites could not be isolated
with a widely used weight matrix method or by separate splice site
networks. A complementary relation between the confidence levels of
the coding/non-coding and the separate splice site networks was
observed, with many weak splice sites having sharp transitions in the
coding/non-coding signal and many stronger splice sites having more
ill-defined transitions between coding and non-coding.
INSTRUCTIONS:
In order to use the NetGene mail-server:
1) Prepare a file with the sequence in a format similar to the FASTA
format: the first line must start with the symbol '>', and the next
word on that line is used as the sequence identifier. The
following lines should contain the actual sequence, consisting of
the symbols A, T, U, G, C and N. U is converted to T, and letters not
mentioned are converted to N. All letters are converted to upper
case. Numbers, blanks and other non-letter symbols are skipped.
The lines should not be longer than 80 characters. The minimum
length analyzed is 451 nucleotides, and the maximum is 100000
nucleotides (your mail system may have a lower limit for the
maximum size of a message). Due to the non-local nature of the
algorithm, sites closer than 225 nucleotides to the ends of the
sequence will not be assigned.
2) Mail the file to netgene@virus.fki.dth.dk. The response time will
depend on system load. If nothing else is running on the machine,
the speed is about 1000 nucleotides/min. It may take several
hours before you get the answer, so please do not resubmit a job
if you get no answer within a short while.
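The conversion rules in step 1 can be checked locally before mailing. The
following Python sketch is not part of NetGene; the function names are
hypothetical, and the reading of "closer than 225 nucleotides to the ends"
as 1-based positions 226 through length-225 is an assumption (it is
consistent with the stated minimum length of 451). The time estimate
assumes the idle-machine speed quoted in step 2.

```python
import re

MIN_LENGTH = 451      # minimum length analyzed, per the instructions
MAX_LENGTH = 100000   # maximum length analyzed, per the instructions
FLANK = 225           # sites closer than this to either end are not assigned


def normalize_submission(text):
    """Return (identifier, sequence) after applying the stated conversion rules."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must start with '>'")
    words = lines[0][1:].split()
    identifier = words[0] if words else ""
    # Numbers, blanks and other non-letter symbols are skipped (ASCII letters kept).
    letters = re.sub(r"[^A-Za-z]", "", "".join(lines[1:])).upper()
    # U is converted to T; letters other than A, T, G, C and N become N.
    sequence = re.sub(r"[^ATGCN]", "N", letters.replace("U", "T"))
    return identifier, sequence


def assignable_range(length):
    """1-based inclusive (first, last) positions that can be assigned, or None.

    Assumption: a position is assignable when at least FLANK nucleotides
    lie on each side of it.
    """
    if length < MIN_LENGTH or length > MAX_LENGTH:
        return None
    return FLANK + 1, length - FLANK


def best_case_minutes(length, nucleotides_per_minute=1000):
    """Lower bound on processing time, assuming an otherwise idle machine."""
    return length / nucleotides_per_minute
```

For example, a sequence of exactly 451 nucleotides leaves only position 226
assignable under this reading, and a 100000-nucleotide sequence needs at
least about 100 minutes.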
REFERENCING AND FURTHER INFORMATION:
Publication of output from NetGene must be referenced as follows:
Brunak, S., Engelbrecht, J., and Knudsen, S. (1991) Prediction of
Human mRNA Donor and Acceptor Sites from the DNA Sequence. Journal
of Molecular Biology, 220, 49-65.
CONFIDENTIALITY:
Your submitted sequence will be deleted automatically as soon as
NetGene has processed it.
PROBLEMS AND SUGGESTIONS:
Should be addressed to:
Jacob Engelbrecht
e-mail: engel@virus.fki.dth.dk
Department of Physical Chemistry
The Technical University of Denmark
Building 206
DK-2800 Lyngby
Denmark
phone: +45 4288 2222 ext. 2478 (operator)
phone: +45 4593 1222 ext. 2478 (tone)
fax: +45 4288 0977
EXAMPLE:
A file test.seq is prepared in an editor, with the following contents:
>HUMOPS
GGATCCTGAGTACCTCTCCTCCCTGACCTCAGGCTTCCTCCTAGTGTCACCTTGGCCCCTCTTAGAAGC
CAATTAGGCCCTCAGTTTCTGCAGCGGGGATTAATATGATTATGAACACCCCCAATCTCCCAGATGCTG
. Here come more lines with sequence.
.
.
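Instead of an editor, a file of this shape could also be produced by a short
script. The Python sketch below is only an illustration: the function name
format_submission and the default width of 70 are assumptions, and any width
up to the 80-character limit from the instructions would do.

```python
def format_submission(identifier, sequence, width=70):
    """Build the text of a submission file: a '>' header line, then the
    sequence split into lines of at most `width` characters."""
    if not 1 <= width <= 80:
        raise ValueError("width must be between 1 and 80")
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + identifier] + body) + "\n"
```

The returned text can then be written out, for instance with
open("test.seq", "w").write(format_submission("HUMOPS", sequence)), where
sequence holds the nucleotide string.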
This is sent to the NetGene mail-server. On a Unix system, for example:
mail netgene@virus.fki.dth.dk < test.seq
In return an answer similar to this is produced: