Connectionists: Postdoc position at IRISA, Rennes, France

Emmanuel Vincent emmanuel.vincent at irisa.fr
Wed Feb 7 09:24:56 EST 2007


Dear list,

We are seeking to recruit a postdoctoral researcher on the statistical 
modeling of multichannel audio, applied to speaker segmentation and 
separation (full subject below). The successful candidate will work 
under the supervision of Drs. Guillaume Gravier and Emmanuel Vincent, in 
the METISS group at IRISA, which has a newly equipped room 
dedicated to the exploration of future meeting environments.

Prospective candidates should have a background in multichannel signal 
processing or in speech processing, and should either have held a PhD 
for less than one year or be about to obtain one. Informal enquiries may 
be made to Emmanuel Vincent (emmanuel.vincent at irisa.fr) or Guillaume 
Gravier (guillaume.gravier at irisa.fr).

The appointment is for 2 years, starting in summer or fall 2007. The 
salary is 28,000 euros per annum. Applications must be submitted online 
before March 31st at
http://www.inria.fr/travailler/opportunites/postdoc/postdoc.en.html




Joint statistical modeling of spectral, temporal and spatial audio 
features, applied to speaker segmentation and separation

Most audio signals represent complex sound scenes consisting of several 
overlapping sources (speakers, natural sounds, musical instruments). 
These sources are usually located at different spatial positions and 
exhibit different spectro-temporal characteristics. Processing such 
documents involves several challenging tasks, such as the separation, 
the segmentation and, more generally, the description of each source.

Existing description algorithms are mostly designed for one-microphone 
recordings and rely on statistical modeling of spectral features. Yet, 
in many application environments, multiple microphones are available, 
thus providing valuable spatial information. Beamforming algorithms are 
then typically employed to determine at each instant the number of 
sources and their locations based on spatial features. These algorithms 
can improve the detection of overlapping sources. However, their 
robustness decreases for small microphone arrays or with moving sources.

The goal of this project is to define a unified statistical modeling 
framework for the joint exploitation of spectral, temporal and spatial 
information in multichannel audio signals. Dynamic state-based models 
offer a promising approach for the description of some extracted 
spectral and spatial features as a function of some hidden states 
associated with different sources and positions. A first stage of the 
project could consist of extending the state-of-the-art one-microphone 
segmentation model developed in our lab (based on GMMs) by incorporating 
spatial features obtained from classical source localization and 
separation techniques (e.g. ICA, DUET, beamforming).

The proposed framework will be primarily applied to speaker segmentation 
and separation, that is, the task of determining the structure of a 
speech recording according to the question "who spoke when and where" 
and extracting the signal of each speaker. The results will be evaluated 
on meeting data recorded by small microphone arrays. Data from the NIST 
meeting evaluation will be used along with data recorded at our lab in a 
room dedicated to the exploration of future meeting environments.

-- 
Emmanuel Vincent
METISS Project
IRISA-INRIA
Campus de Beaulieu, 35042 Rennes cedex, France
Phone: +332 9984 7227 - Fax: +332 9984 7171
Web: http://www.irisa.fr/metiss/members/evincent/

