Connectionists: Ph.D. dissertation announcement: Sound Source Segregation

Nicoleta Roman niki at cse.ohio-state.edu
Thu Dec 15 12:42:21 EST 2005


Dear list members:


I would like to bring to your attention my recently completed Ph.D. 
dissertation, entitled "Auditory-based algorithms for sound segregation 
in multisource and reverberant environments".

An electronic version of the thesis is available at:

http://www.ohiolink.edu/etd/view.cgi?osu1124370749

Please find the abstract below.


Sincerely,
Nicoleta Roman


--------
ABSTRACT
--------
At a cocktail party, we can selectively attend to a single voice and 
filter out interfering sounds. This perceptual ability has motivated a 
new field of study known as computational auditory scene analysis (CASA), 
which aims to build speech separation systems that incorporate auditory 
principles. The psychological process of figure-ground segregation 
suggests that the target signal should be segregated as foreground while 
the remaining stimuli are treated as background. Accordingly, the 
computational goal of CASA should be to estimate an ideal time-frequency 
(T-F) binary mask, which selects the target if it is stronger than the 
interference in a local T-F unit. This dissertation investigates four 
aspects of CASA processing: location-based speech segregation, binaural 
tracking of multiple moving sources, binaural sound segregation in 
reverberation, and monaural segregation of reverberant speech. For 
localization, the auditory system utilizes the interaural time 
difference (ITD) and interaural intensity difference (IID) between the 
ears. We observe that within a narrow frequency band, modifications to 
the relative strength of the target source with respect to the 
interference trigger systematic changes in ITD and IID, resulting in a 
characteristic clustering. Consequently, we propose a supervised 
learning approach to estimate the ideal binary mask. A systematic 
evaluation shows that the resulting system produces masks very close to 
the ideal binary ones and large speech intelligibility improvements. In 
realistic environments, source motion requires consideration. Binaural 
cues are strongly correlated with locations in T-F units dominated by 
one source, resulting in channel-dependent conditional probabilities. 
Consequently, we propose a multi-channel integration method of these 
probabilities in order to compute the likelihood function in a target 
space. Finally, a hidden Markov model is employed for forming continuous 
tracks and automatically detecting the number of active sources. 
Reverberation affects the ITD and IID cues. We therefore propose a 
binaural segregation system that combines target cancellation through 
adaptive filtering and a binary decision rule to estimate the ideal 
binary mask. A major advantage of the proposed system is that it imposes 
no restrictions on the interfering sources. Quantitative evaluations 
show that our system outperforms related beamforming approaches. 
Psychoacoustic evidence suggests that monaural processing plays a vital 
role in segregation. It is known that reverberation smears the 
harmonicity of speech signals. We therefore propose a two-stage 
separation system that combines inverse filtering of the target room 
impulse response with pitch-based segregation. As a result of the first 
stage, the harmonicity of a signal arriving from the target direction is partially 
restored while signals arriving from other locations are further 
smeared, and this leads to improved segregation and considerable 
signal-to-noise ratio gains.
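
For readers unfamiliar with the ideal binary mask mentioned above, the following is a minimal sketch of the idea, not the dissertation's implementation. It assumes the target and interference energies are already available per T-F unit as two arrays of equal shape (for example, from a filterbank front end applied to the premixed signals). The function name and the `threshold_db` parameter are illustrative assumptions; a threshold of 0 dB corresponds to "target stronger than interference" as stated in the abstract.

```python
import numpy as np


def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Return a 0/1 mask over time-frequency units.

    A unit is set to 1 when the target energy exceeds the interference
    energy by more than threshold_db (hypothetical parameter; 0 dB means
    "target strictly stronger than interference").
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    # Convert the dB threshold to a linear energy ratio to avoid log(0).
    ratio = 10.0 ** (threshold_db / 10.0)
    return (target_energy > ratio * interference_energy).astype(np.uint8)


# Toy example: 2 frequency channels x 3 time frames of made-up energies.
target = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.0, 2.0]])
interference = np.array([[1.0, 2.0, 0.5],
                         [0.1, 6.0, 1.0]])

mask = ideal_binary_mask(target, interference)
print(mask)
```

Applying such a mask to the T-F representation of the mixture keeps the units dominated by the target and discards the rest. The mask is "ideal" in that it requires access to the premixed signals, which is why the systems described in the abstract aim to estimate it from the mixture alone.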
