Thesis/TR: Sensorimotor foundations of phonology -- a model.

Kevin Markey markey at dendrite.cs.colorado.edu
Tue May 2 01:40:03 EDT 1995


FTP-host: archive.cis.ohio-state.edu
FTP-filename: /pub/neuroprose/Thesis/markey.thesis.ps.Z

             Ph.D. Thesis available by anonymous ftp (128 pages)

                  The Sensorimotor Foundations of Phonology:
                   A Computational Model of Early Childhood
                    Articulatory and Phonetic Development

                               Kevin L. Markey
                        Department of Computer Science
                       University of Colorado at Boulder

                                   ABSTRACT

This thesis describes HABLAR, a computational model of the sensorimotor
foundations of early childhood phonological development.  HABLAR is intended
to replicate the major milestones of emerging speech and demonstrate key
characteristics of normal development, including the phonetic characteristics
of babble, systematic and context-sensitive patterns of sound substitutions
and deletions, overgeneralization errors, and the emergence of adult phonemic
organization.

HABLAR simulates a complete sensorimotor system consisting of an auditory
system that detects and categorizes speech sounds using only acoustic cues
drawn from its linguistic environment, an articulatory system that generates
synthetic speech based on a realistic computer model of the vocal tract, and a
hierarchical cognitive architecture that bridges the two.  The environment in
which the model resides is also simulated.  The model is an autonomous agent
which actively experiments within this environment.

The principal hypothesis guiding the model is that phonological development
emerges from the interaction of auditory perception and hierarchical motor
control.  The model's auditory perception is specialized to segment and
categorize acoustic signals into discrete phonetic events which closely
correspond to discrete sets of functionally coordinated gestures learned by
the model's articulatory control apparatus.  HABLAR learns the correspondence
between discrete phonetic and articulatory events, not between continuous
speech and continuous vocal tract motion.

HABLAR's perceptual and motor organization is initially syllabic.  Phonemes
are not built into the model but emerge (along with an adult-like phonological
organization) as early syllable-sized motor patterns differentiate into
phoneme-sized patterns while the model learns a large lexicon.

Learning occurs in two phases.  In the first phase, HABLAR's auditory
perception employs soft competitive learning to acquire phonetic features
which categorize the spectral properties of utterances in the linguistic
environment.  In the second phase, reinforcement based on the phonetic
proximity of target and actual utterances guides learning by the model's two
levels of motor control.  The phonological control level uses Q-learning to
learn an optimal policy linking phonetic and articulatory events.  The
articulatory control level employs a parallel Q-learning architecture to learn
a policy that controls the vocal tract's twelve degrees of freedom.
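
As a rough illustration of the first learning phase, the sketch below shows one
common form of soft competitive learning: every prototype moves toward each
input in proportion to a softmax "responsibility" computed from its distance
to that input.  The abstract does not specify HABLAR's actual update rule or
input features, so the function names, the two-dimensional toy inputs, and the
parameters lr and beta are all assumptions made for illustration.

```python
import math

def soft_assignments(x, prototypes, beta=4.0):
    """Softmax responsibilities of each prototype for input x (they sum to 1)."""
    dists = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]
    d_min = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def soft_competitive_step(x, prototypes, lr=0.1, beta=4.0):
    """Move every prototype toward x, scaled by its responsibility for x."""
    resp = soft_assignments(x, prototypes, beta)
    for p, r in zip(prototypes, resp):
        for i in range(len(p)):
            p[i] += lr * r * (x[i] - p[i])
    return resp

def train(samples, prototypes, epochs=200, lr=0.1, beta=4.0):
    """Repeatedly present all samples; prototypes are updated in place."""
    for _ in range(epochs):
        for x in samples:
            soft_competitive_step(x, prototypes, lr, beta)
    return prototypes

# Hypothetical toy data: two clusters standing in for two spectral categories.
samples = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
prototypes = train(samples, [[0.4, 0.4], [0.6, 0.6]])
```

Unlike winner-take-all learning, no prototype is ever completely excluded from
an update; beta controls how sharply the competition favors the nearest one.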

HABLAR has been fully implemented as a computational model.  Simulations of
the model's auditory perception demonstrate that it faithfully preserves and
makes explicit phonetic properties of the acoustic signal.  Auditory
simulations also mimic the categorical vowel and consonant perception that
develops in human infancy.  Other results demonstrate the feasibility of
learning multi-dimensional articulatory control with a parallel reinforcement
learning architecture, and the effectiveness of shaping motor control with
reinforcement based on the phonetic proximity of target and actual utterances.
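
One way to picture a parallel reinforcement learning architecture driven by a
proximity-based reward is sketched below: each degree of freedom has its own
tabular Q-learner, and all learners receive the same scalar reward.  This is
not the thesis implementation.  The discretized positions, the three-way
action set, the use of summed absolute distance as a stand-in for phonetic
proximity, and all function names are assumptions made for illustration, and
the example uses three degrees of freedom rather than twelve to stay small.

```python
import random

ACTIONS = (-1, 0, 1)  # lower, hold, or raise one articulator by one step

def proximity_reward(config, target):
    """Shared scalar reward: 0 when config equals target, negative otherwise."""
    return -sum(abs(c - t) for c, t in zip(config, target))

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a single learner's table."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])

def choose(q, s, epsilon, rng):
    """Epsilon-greedy action index for one learner."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[s]
    return row.index(max(row))

def train(target, n_levels=5, episodes=500, steps=20, epsilon=0.2, seed=0):
    """Train one independent Q-table per degree of freedom."""
    rng = random.Random(seed)
    n_dof = len(target)
    tables = [[[0.0] * len(ACTIONS) for _ in range(n_levels)]
              for _ in range(n_dof)]
    for _ in range(episodes):
        config = [rng.randrange(n_levels) for _ in range(n_dof)]
        for _ in range(steps):
            acts = [choose(tables[i], config[i], epsilon, rng)
                    for i in range(n_dof)]
            nxt = [min(n_levels - 1, max(0, c + ACTIONS[a]))
                   for c, a in zip(config, acts)]
            r = proximity_reward(nxt, target)  # same reward for every learner
            for i in range(n_dof):
                # Each learner sees only its own position and its own action.
                q_update(tables[i], config[i], acts[i], r, nxt[i])
            config = nxt
    return tables

tables = train(target=[3, 1, 4])  # hypothetical target configuration
```

Splitting control this way keeps each table small (positions times actions per
degree of freedom) instead of one table over the joint configuration space, at
the cost of each learner treating the others' effect on the reward as noise.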

The model provides qualitative accounts of developmental data.  It is
predicted to make pronunciation errors similar to those observed among
children because of the relative articulatory difficulty of producing
different speech sounds, its tendency to eliminate the biggest phonetic errors
first, its generalization of already mastered sounds across phonetic
similarities, and contextual effects of phonetic representations and internal
distributed representations which underlie speech production.
-----------------------------------------------------------------------------
Sorry, hard copies are not available.

Thanks to Jordan Pollack for maintaining neuroprose.

Kevin L. Markey
Department of Psychology
2155 S. Race Street
University of Denver
Denver, CO  80208
markey at cs.colorado.edu
------------------------------------------------------------------------------


