Tech report on location-based segregation

Wed Jul 17 14:38:51 EDT 2002

Dear Colleagues,

It is my pleasure to announce the availability of the following
technical report.

Thanks for your attention,

Nicoleta Roman

************************************
"Speech segregation based on sound localization", Technical Report #16,
June 2002.

Department of Computer and Information Science
The Ohio State University

Nicoleta Roman, The Ohio State University
DeLiang Wang, The Ohio State University
Guy J. Brown, University of Sheffield
*************************************

Abstract
---------
At a cocktail party, we can selectively attend to a single voice and
filter out all the other acoustical interferences. How to simulate this
perceptual ability remains a great challenge. This paper describes a
novel machine learning approach to speech segregation, in which a target
speech signal is separated from interfering sounds using spatial
location cues: interaural time differences (ITD) and interaural
intensity differences (IID). The auditory masking effect motivates the
notion of an “ideal” time-frequency binary mask, which selects the
target if it is stronger than the interference in a local time-frequency
(T-F) unit. We observe that within a narrow frequency band,
modifications to the relative strength of the target source with respect
to the interference trigger systematic deviations for ITD and IID. For a
given spatial configuration, this interaction produces characteristic
clustering in the binaural feature space. Consequently, we perform
pattern classification in order to estimate ideal binary masks. A
systematic evaluation shows that the resulting system produces masks
very close to ideal binary ones, and gives a significant improvement in
performance over an existing approach, as quantified by changes in
signal-to-noise ratio before and after segregation.
**************************************
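
For readers new to the term, the "ideal" binary mask defined in the
abstract can be illustrated with a minimal sketch. It is not the
report's implementation: the energy grids, the function names, and the
energy-based SNR below are all assumptions made for illustration, and
the report estimates the mask from ITD and IID rather than from known
target and interference energies.

```python
import math

def ideal_binary_mask(target, interference):
    """Mark each T-F unit 1 where target energy exceeds interference energy, else 0."""
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target, interference)
    ]

def snr_db(signal_energy, noise_energy):
    """Signal-to-noise ratio in decibels from two total energies."""
    if noise_energy == 0:
        return math.inf
    if signal_energy == 0:
        return -math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)

def total(energy, mask=None):
    """Sum energy over all T-F units, keeping only units the mask selects."""
    if mask is None:
        return sum(sum(row) for row in energy)
    return sum(
        e * m
        for e_row, m_row in zip(energy, mask)
        for e, m in zip(e_row, m_row)
    )

# Hypothetical energies: rows are frequency channels, columns are time frames.
target = [[4.0, 1.0, 9.0], [0.5, 6.0, 2.0]]
interference = [[1.0, 3.0, 2.0], [2.0, 1.0, 2.0]]

mask = ideal_binary_mask(target, interference)
snr_before = snr_db(total(target), total(interference))
snr_after = snr_db(total(target, mask), total(interference, mask))

print(mask)  # [[1, 0, 1], [0, 1, 0]]
print(round(snr_before, 2), round(snr_after, 2))
```

With these made-up numbers the mask keeps three of the six units, and
the SNR computed over the retained units is higher than the SNR over
the full grid, which is the kind of before/after comparison the
abstract refers to.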

The manuscript is available for download at:

    ftp://ftp.cis.ohio-state.edu/pub/tech-report/2002/TR16.pdf

Related sound demos can be found at:

    http://www.cis.ohio-state.edu/~niki/soundemo.html

A preliminary version of this work appears in the Proceedings of
ICASSP 2002.