HMMs and Molecular Biology

Yves Chauvin yves at netid.com
Fri Jan 28 14:43:02 EST 1994



                  **DO NOT FORWARD TO OTHER GROUPS**


The following papers,

"Hidden Markov Models of Biological Primary Sequence Information",
to be published in the Proceedings of the National Academy of 
Sciences (USA), vol. 91, February 94.

and 

"Hidden Markov Models for Human Genes",
to be published in the Proceedings of the 1993 NIPS conference, vol. 6.

have been placed on an anonymous FTP site.

Further information and retrieval instructions are given below.

Yves Chauvin
yves at netid.com

___________________________________________________________________________


                         Hidden Markov Models of 
                 Biological Primary Sequence Information

                              Pierre Baldi 
                        Jet Propulsion Laboratory
                         and Division of Biology, 
                   California Institute of Technology
                           Pasadena, CA 91109

                              Yves Chauvin 
                              Net-ID, Inc.

                             Tim Hunkapiller
                         University of Washington

                           Marcella A. McClure
                          University of Nevada


Hidden Markov Model (HMM) techniques are used to model families of 
biological sequences.  A smooth and convergent algorithm is introduced
to iteratively adapt the transition and emission parameters of the models
from the examples in a given family. The HMM approach is applied
to three protein families: globins, immunoglobulins and kinases. 
In all cases, the models derived capture the important statistical
characteristics of the family and can be used for a number of tasks,
including multiple alignment, motif detection and classification.
For $K$ sequences of average length $N$, this approach yields an
effective multiple alignment algorithm that requires $O(KN^2)$
operations, i.e. a cost linear in the number of sequences.
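The abstract does not spell out the update rule, so what follows is only a minimal sketch of one smooth scheme of this kind, not the authors' code. It performs gradient ascent on the log-likelihood of the training sequences, with every transition and emission probability written as a normalized exponential (softmax) of a free weight, so each row stays a valid distribution after every step. For a weight $w_{ij}$ with $t_{ij} = e^{w_{ij}} / \sum_k e^{w_{ik}}$, the gradient is $n_{ij} - t_{ij} \sum_k n_{ik}$, where $n_{ij}$ are expected counts from the forward-backward algorithm. The function names, the fixed uniform initial distribution, the learning rate `eta` and the toy integer-coded sequences are all assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Normalized exponential: maps free weights to a probability vector."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def forward_backward(seq, pi, trans, emit):
    """Scaled forward-backward pass.

    Returns log P(seq) and the expected transition and emission counts.
    """
    S, T = len(pi), len(seq)
    alpha, scale = [], []
    for t, x in enumerate(seq):
        if t == 0:
            a = [pi[i] * emit[i][x] for i in range(S)]
        else:
            prev = alpha[-1]
            a = [sum(prev[i] * trans[i][j] for i in range(S)) * emit[j][x]
                 for j in range(S)]
        c = sum(a)                      # scaling factor; log P = sum of log c
        alpha.append([v / c for v in a])
        scale.append(c)
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        nxt = seq[t + 1]
        beta[t] = [sum(trans[i][j] * emit[j][nxt] * beta[t + 1][j] for j in range(S))
                   / scale[t + 1] for i in range(S)]
    n_trans = [[0.0] * S for _ in range(S)]
    n_emit = [[0.0] * len(emit[0]) for _ in range(S)]
    for t, x in enumerate(seq):
        for i in range(S):
            n_emit[i][x] += alpha[t][i] * beta[t][i]        # P(state i at t | seq)
            if t + 1 < T:
                for j in range(S):
                    n_trans[i][j] += (alpha[t][i] * trans[i][j] * emit[j][seq[t + 1]]
                                      * beta[t + 1][j] / scale[t + 1])
    return sum(math.log(c) for c in scale), n_trans, n_emit


def train_step(seqs, pi, w_trans, w_emit, eta):
    """One gradient-ascent step on the total log-likelihood, in softmax weights."""
    trans = [softmax(r) for r in w_trans]
    emit = [softmax(r) for r in w_emit]
    g_trans = [[0.0] * len(r) for r in w_trans]
    g_emit = [[0.0] * len(r) for r in w_emit]
    for seq in seqs:
        _, n_t, n_e = forward_backward(seq, pi, trans, emit)
        for probs, counts, grad in ((trans, n_t, g_trans), (emit, n_e, g_emit)):
            for i, row in enumerate(counts):
                n_i = sum(row)
                for j, n_ij in enumerate(row):
                    grad[i][j] += n_ij - probs[i][j] * n_i   # d log P / d w_ij
    for w, g in ((w_trans, g_trans), (w_emit, g_emit)):
        for i, row in enumerate(g):
            for j, gij in enumerate(row):
                w[i][j] += eta * gij                         # in-place update


def log_likelihood(seqs, pi, w_trans, w_emit):
    trans = [softmax(r) for r in w_trans]
    emit = [softmax(r) for r in w_emit]
    return sum(forward_backward(s, pi, trans, emit)[0] for s in seqs)


# Toy demo: 2 hidden states, 4 symbols (A, C, G, T coded as 0..3); data is invented.
rng = random.Random(0)
S, M = 2, 4
pi = [1.0 / S] * S
w_trans = [[rng.uniform(-0.1, 0.1) for _ in range(S)] for _ in range(S)]
w_emit = [[rng.uniform(-0.1, 0.1) for _ in range(M)] for _ in range(S)]
seqs = [[0, 0, 1, 2, 3, 3, 2, 1], [0, 1, 1, 2, 3, 2, 2, 3]]
before = log_likelihood(seqs, pi, w_trans, w_emit)
for _ in range(100):
    train_step(seqs, pi, w_trans, w_emit, eta=0.02)
after = log_likelihood(seqs, pi, w_trans, w_emit)
print(f"log-likelihood: {before:.3f} -> {after:.3f}")
```

With a fully connected transition matrix, `forward_backward` costs $O(TS^2)$ for a sequence of length $T$ and $S$ states. In a left-to-right model whose length grows like $N$ and in which each state has a bounded number of successors, the per-sequence cost is $O(N^2)$, which is consistent with the $O(KN^2)$ total stated above for $K$ sequences.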


___________________________________________________________________________


                  Hidden Markov Models for Human Genes


                              Pierre Baldi 
                        Jet Propulsion Laboratory
                         and Division of Biology, 
                   California Institute of Technology
                           Pasadena, CA 91109

                              Soren Brunak
                  The Technical University of Denmark

                              Yves Chauvin 
                              Net-ID, Inc.

                           Jacob Engelbrecht
                  The Technical University of Denmark

                              Anders Krogh
                  The Technical University of Denmark


Human genes are not continuous but rather consist of short coding 
regions (exons) interspersed with highly variable non-coding regions 
(introns).  We apply HMMs to modeling exons and introns and to
detecting splice sites in the human genome.  Our most interesting
result so far is the detection of particular oscillatory patterns, 
with a minimal period of roughly 10 nucleotides, that seem to be 
characteristic of exon regions and may have significant biological 
implications.
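As a toy illustration of how an HMM can label a DNA sequence and thereby propose exon/intron boundaries, the sketch below runs Viterbi decoding on a hypothetical two-state model. The states `E` (exon) and `I` (intron) and all probabilities are invented: the GC-rich versus AT-rich emission contrast only makes the example decodable, and it is neither a biological estimate nor the architecture used in the paper. A change of decoded label is read as a candidate splice site.

```python
import math

# Hypothetical two-state model; every number here is invented for illustration.
STATES = ("E", "I")                       # E = exon, I = intron
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},   # toy GC-rich "exon"
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},   # toy AT-rich "intron"
}


def viterbi(seq):
    """Most probable state path, computed in log space."""
    score = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for s at position t.
            prev, best = max(((p, score[t - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                             key=lambda item: item[1])
            score[t][s] = best + math.log(EMIT[s][seq[t]])
            back[t][s] = prev
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for t in range(len(seq) - 1, 0, -1):      # trace back from the last position
        state = back[t][state]
        path.append(state)
    return "".join(reversed(path))


def boundaries(labels):
    """Positions where the decoded label changes: candidate splice sites."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


labels = viterbi("GCGCGCGCATATATAT")
print(labels)              # EEEEEEEEIIIIIIII
print(boundaries(labels))  # [8]
```

Working in log space avoids numerical underflow on long sequences, which matters for genomic data far longer than this 16-base example.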

___________________________________________________________________________


Retrieval instructions:

The papers are the gzip-compressed PostScript files "baldi.bioprimseq.ps.z" and "baldi.humgenes.ps.z".
To retrieve these files:

% ftp netcom.com
Connected to netcom.com.
220 netcom FTP server (Version 2.0WU(10) [...] ready.
Name (netcom.com:yourname): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
..
ftp> cd pub/netid/papers
ftp> ls
ftp> binary
ftp> get <filename>
ftp> close
..

% gunzip <filename>
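The same retrieval can be scripted with Python's standard `ftplib`, `gzip` and `shutil` modules. This is a sketch only: the 1994 netcom.com server is unlikely to still serve these files, and the script assumes the `.ps.z` files are gzip-compressed, as the `gunzip` step implies. `fetch_paper` and its `ftp_factory` parameter are hypothetical names; the factory lets you substitute a stand-in object for testing.

```python
import ftplib
import gzip
import shutil
from pathlib import Path

HOST = "netcom.com"                 # host named in the 1994 announcement
REMOTE_DIR = "pub/netid/papers"
PAPERS = ("baldi.bioprimseq.ps.z", "baldi.humgenes.ps.z")


def fetch_paper(filename, dest_dir=".", email="anonymous@", ftp_factory=ftplib.FTP):
    """Anonymous-FTP download, then gunzip; returns the path of the .ps file."""
    if not filename.endswith(".z"):
        raise ValueError("expected a .z-compressed file name")
    dest = Path(dest_dir)
    compressed = dest / filename
    with ftp_factory(HOST) as ftp:
        ftp.login("anonymous", email)            # password: your e-mail address
        ftp.cwd(REMOTE_DIR)
        with open(compressed, "wb") as out:
            # retrbinary transfers in binary (image) mode, like "ftp> binary".
            ftp.retrbinary("RETR " + filename, out.write)
    plain = dest / filename[:-2]                 # drop ".z", as gunzip does
    with gzip.open(compressed, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    compressed.unlink()                          # gunzip also removes its input
    return plain


if __name__ == "__main__":
    for name in PAPERS:
        print(fetch_paper(name))
```

`retrbinary` switches the connection to binary mode itself, so no separate `binary` command is needed; an ASCII-mode transfer could corrupt the compressed files.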

