Paper on HMMs in Bioinformatics

Dirk Husmeier dirk at bioss.ac.uk
Thu May 10 08:10:56 EDT 2001


Dear Connectionists

The following paper has just been accepted for publication in
JOURNAL OF COMPUTATIONAL BIOLOGY
and might be of interest to researchers who apply machine learning
techniques to problems in BIOINFORMATICS.

TITLE:
Detection of Recombination in DNA Multiple Alignments
with Hidden Markov Models

AUTHORS:
Dirk Husmeier and Frank Wright

PAGES: 56

DOWNLOAD FROM:
http://www.bioss.sari.ac.uk/~dirk/My_publications.html

FORMAT: PDF

SYNOPSIS
The recent advent of multiply resistant pathogens has led to an
increased interest in interspecies recombination as an important, and
previously underestimated, source of genetic diversification in
bacteria and viruses.  The discovery of a surprisingly high frequency
of mosaic RNA sequences in HIV-1 suggests that a substantial
proportion of AIDS patients have been coinfected with HIV-1 strains
belonging to different subtypes, and that recombination between these
genomes can occur in vivo to generate new biologically active viruses.
A phylogenetic analysis of the bacterial genera Neisseria and
Streptococcus has revealed that the introduction of blocks of DNA from
penicillin-resistant non-pathogenic strains into sensitive pathogenic
strains has led to new strains that are both pathogenic and resistant.
Thus interspecies recombination raises the possibility that bacteria
and viruses can acquire biologically important traits through the
exchange and transfer of genetic material.

In the present article, a hidden Markov model (HMM) is employed to
detect recombination events in multiple alignments of DNA sequences.
The emission probabilities in a given state are determined by the
branching order (topology) and the branch lengths of the respective
phylogenetic tree, while the transition probabilities depend on the
global recombination probability.  The present study improves on an
earlier heuristic parameter optimization scheme and shows how the
branch lengths and the recombination probability can be optimized in a
maximum likelihood sense by applying the expectation-maximization (EM)
algorithm.  The novel algorithm is tested on a synthetic benchmark
problem and is found to clearly outperform the earlier heuristic
approach.  The paper concludes with an application of this scheme to a
DNA sequence alignment of the argF gene from four Neisseria strains,
where a likely recombination event is clearly detected.
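
To make the model structure above concrete, here is a minimal Python sketch,
not the code from the paper. It assumes that each hidden state is one
candidate tree topology, that the per-site likelihoods under each topology
have already been computed elsewhere (the phylogenetic likelihood and the
branch-length optimization are left out), and that the chain stays in the
same topology with probability nu and otherwise jumps uniformly to one of
the others. The names transition_matrix, forward_backward and em_update_nu,
the uniform initial distribution, and this transition parameterization are
illustrative assumptions.

```python
import math

def transition_matrix(num_topologies, nu):
    """Assumed form: stay with probability nu, else jump uniformly (needs >= 2 states)."""
    off = (1.0 - nu) / (num_topologies - 1)
    return [[nu if i == j else off for j in range(num_topologies)]
            for i in range(num_topologies)]

def forward_backward(emissions, nu):
    """Scaled forward-backward pass.

    emissions[t][k] is the (precomputed, assumed given) likelihood of
    alignment column t under topology k.
    Returns (posteriors, expected_stays, log_likelihood).
    """
    n, k = len(emissions), len(emissions[0])
    A = transition_matrix(k, nu)
    # Forward pass with a uniform initial distribution (an assumption).
    alpha, scales = [], []
    cur = [emissions[0][j] / k for j in range(k)]
    for t in range(n):
        if t > 0:
            cur = [emissions[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(k))
                   for j in range(k)]
        s = sum(cur)
        alpha.append([c / s for c in cur])
        scales.append(s)
    # Backward pass, rescaled with the same factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(A[i][j] * emissions[t + 1][j] * beta[t + 1][j] for j in range(k))
                   / scales[t + 1] for i in range(k)]
    posteriors = [[alpha[t][j] * beta[t][j] for j in range(k)] for t in range(n)]
    # Expected number of adjacent site pairs that share the same topology.
    expected_stays = sum(
        alpha[t][i] * A[i][i] * emissions[t + 1][i] * beta[t + 1][i] / scales[t + 1]
        for t in range(n - 1) for i in range(k))
    log_likelihood = sum(math.log(s) for s in scales)
    return posteriors, expected_stays, log_likelihood

def em_update_nu(emissions, nu, iterations=20):
    """EM for nu alone, with emissions held fixed.

    For the assumed transition form the M-step is closed-form:
    nu = expected stays / number of adjacent site pairs.
    """
    for _ in range(iterations):
        _, stays, _ = forward_backward(emissions, nu)
        nu = stays / (len(emissions) - 1)
    return nu

# Toy data (made up): 3 topologies, 40 sites, topology 0 favoured on the
# first half of the alignment and topology 1 on the second half.
toy = [[0.8, 0.1, 0.1]] * 20 + [[0.1, 0.8, 0.1]] * 20
nu_hat = em_update_nu(toy, nu=0.5)
post, _, loglik = forward_backward(toy, nu_hat)
```

On the toy input, post[t] gives the posterior probability of each topology
at site t, so a change in the most probable topology along the alignment
marks a candidate recombination breakpoint. Optimizing the branch lengths,
as the paper does, would additionally require re-estimating the emissions
inside the EM loop, which this sketch does not attempt.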

Best Wishes

Dirk




--

----------------------------------------------
Dirk Husmeier
Biomathematics and Statistics Scotland (BioSS)
SCRI, Invergowrie, Dundee DD2 5DA, United Kingdom
http://www.bioss.ac.uk/~dirk/



