HMMs and Molecular Biology
Yves Chauvin
yves at netid.com
Fri Jan 28 14:43:02 EST 1994
**DO NOT FORWARD TO OTHER GROUPS**
The following papers have been placed on an ftp site:
"Hidden Markov Models of Biological Primary Sequence Information",
to be published in the Proceedings of the National Academy of
Sciences (USA), vol. 91, February 1994,
and
"Hidden Markov Models for Human Genes",
to be published in the Proceedings of the 1993 NIPS conference, vol. 6.
Abstracts and retrieval instructions are given below.
Yves Chauvin
yves at netid.com
___________________________________________________________________________
Hidden Markov Models of
Biological Primary Sequence Information
Pierre Baldi
Jet Propulsion Laboratory
and Division of Biology,
California Institute of Technology
Pasadena, CA 91109
Yves Chauvin
Net-ID, Inc.
Tim Hunkapiller
University of Washington
Marcella A. McClure
University of Nevada
Hidden Markov Model (HMM) techniques are used to model families of
biological sequences. A smooth and convergent algorithm is introduced
to iteratively adapt the transition and emission parameters of the models
from the examples in a given family. The HMM approach is applied
to three protein families: globins, immunoglobulins and kinases.
In all cases, the models derived capture the important statistical
characteristics of the family and can be used for a number of tasks
including: multiple alignments, motif detection and classification.
For $K$ sequences of average length $N$, this approach yields an
effective multiple alignment algorithm which requires $O(KN^2)$
operations, linear in the number of sequences.
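
As a rough illustration of where a cost like $O(KN^2)$ can come from, here
is a minimal Viterbi sketch in Python for a generic discrete HMM. It is not
the algorithm or the architecture of the paper: the two states "A" and "B",
the symbols "x" and "y", and all probabilities are made up for the example,
and silent (delete) states are not handled. This dense version costs
O(L * S^2) for a sequence of length L and S states. Assuming a left-to-right
model with about N states and a bounded number of transitions into each
state, aligning one sequence of length N would cost on the order of N^2,
and K sequences K times that.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best state path, its log-probability) for a discrete HMM."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of s at step t + 1
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][symbol]
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Toy model (all numbers are illustrative assumptions, not from the paper).
states = ["A", "B"]
log_start = {"A": math.log(0.5), "B": math.log(0.5)}
log_trans = {
    "A": {"A": math.log(0.8), "B": math.log(0.2)},
    "B": {"A": math.log(0.2), "B": math.log(0.8)},
}
log_emit = {
    "A": {"x": math.log(0.9), "y": math.log(0.1)},
    "B": {"x": math.log(0.1), "y": math.log(0.9)},
}

path, logp = viterbi("xxyy", states, log_start, log_trans, log_emit)
print(path)  # ['A', 'A', 'B', 'B']
```

Aligning each of the K sequences to one shared model in this way, rather
than comparing sequences pairwise, is what would keep the total cost linear
in K.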
___________________________________________________________________________
Hidden Markov Models for Human Genes
Pierre Baldi
Jet Propulsion Laboratory
and Division of Biology,
California Institute of Technology
Pasadena, CA 91109
Soren Brunak
The Technical University of Denmark
Yves Chauvin
Net-ID, Inc.
Jacob Engelbrecht
The Technical University of Denmark
Anders Krogh
The Technical University of Denmark
Human genes are not continuous but rather consist of short coding
regions (exons) interspersed with highly variable non-coding regions
(introns). We apply HMMs to the problem of modeling exons and introns
and detecting splice sites in the human genome. Our most interesting
result so far is the detection of particular oscillatory patterns,
with a minimal period of roughly 10 nucleotides, that seem to be
characteristic of exon regions and may have significant biological
implications.
___________________________________________________________________________
Retrieval instructions:
The papers are "baldi.bioprimseq.ps.z" and "baldi.humgenes.ps.z".
To retrieve these files:
% ftp netcom.com
Connected to netcom.com.
220 netcom FTP server (Version 2.0WU(10) [...] ready.
Name (netcom.com:yourname): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
..
ftp> cd pub/netid/papers
ftp> ls
ftp> binary
ftp> get <filename>
ftp> close
..
% gunzip <filename>
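
For anyone who prefers a script to an interactive session, the same steps
can be sketched with Python's standard ftplib module. The host, directory
and file names are taken from the instructions above; the function names
fetch and fetch_all and the dest_dir parameter are made up for this
example, and whether the server still answers is not something this sketch
can promise.

```python
import os
from ftplib import FTP

HOST = "netcom.com"
DIRECTORY = "pub/netid/papers"
FILES = ["baldi.bioprimseq.ps.z", "baldi.humgenes.ps.z"]

def fetch(ftp, filename, dest_dir="."):
    """Download one file in binary mode over an already-open connection.

    `ftp` can be any object with a retrbinary(cmd, callback) method,
    which makes the function easy to try out without a network.
    """
    target = os.path.join(dest_dir, filename)
    with open(target, "wb") as out:
        # Equivalent of "binary" followed by "get <filename>".
        ftp.retrbinary("RETR " + filename, out.write)
    return target

def fetch_all(email, host=HOST, directory=DIRECTORY, files=FILES, dest_dir="."):
    """Anonymous login, cd to the papers directory, get each file, close."""
    ftp = FTP(host)
    try:
        ftp.login("anonymous", email)  # e-mail address as password
        ftp.cwd(directory)
        return [fetch(ftp, name, dest_dir) for name in files]
    finally:
        ftp.close()
```

Calling fetch_all("you@your.site") would attempt both downloads into the
current directory. The files still need to be uncompressed afterwards, as
in the last step of the session above.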