Connectionists: PhD Position: Multimodal Detection of Stuttering Disfluencies
Shakeel Ahmad
shakeelzmail608 at gmail.com
Thu Oct 30 15:14:26 EDT 2025
*1. Introduction*
Stuttering, a fluency disorder affecting millions of individuals, is
characterized by stuttering-like disfluencies (blocks, prolongations,
repetitions) linked to dysfunctions in speech motor control. While its
automatic detection has already been explored using audio-based models,
current systems remain limited by low robustness, difficulty in identifying
certain disfluencies such as silent blocks, and reliance on scarce data.
This PhD project proposes a multimodal approach (audio, video, text) to
enhance the accuracy and robustness of disfluency detection, leveraging an
audiovisual corpus of French-speaking individuals who stutter. The analysis
will rely on modality-specific encoding techniques, followed by a strategic
fusion of their representations for final classification.
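To make the encode-then-fuse idea concrete, here is a minimal sketch of two of the simplest fusion strategies: plain concatenation and an attention-weighted sum. It is only an illustration under stated assumptions, not the project's implementation: the function names, the toy two-dimensional embeddings, and the fixed query vector are all hypothetical, and in a real model the embeddings would come from the audio, video, and text encoders and the query would be learned.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
MODALITIES = ("audio", "video", "text")  # assumed modality names


def concat_fusion(embeddings: Dict[str, Vector],
                  order: Sequence[str] = MODALITIES) -> Vector:
    """Early fusion: join the per-modality embeddings end to end."""
    fused: Vector = []
    for name in order:
        fused.extend(embeddings[name])
    return fused


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(embeddings: Dict[str, Vector],
                     query: Vector,
                     order: Sequence[str] = MODALITIES
                     ) -> Tuple[Vector, List[float]]:
    """Weight each modality by softmax(query . embedding), then sum.

    Assumes all embeddings share the dimension of `query`
    (in practice, after a projection layer per modality).
    """
    scores = [sum(q * x for q, x in zip(query, embeddings[name]))
              for name in order]
    weights = softmax(scores)
    fused = [0.0] * len(query)
    for weight, name in zip(weights, order):
        for i, value in enumerate(embeddings[name]):
            fused[i] += weight * value
    return fused, weights


# Toy embeddings standing in for real encoder outputs (hypothetical values).
embeddings = {"audio": [1.0, 0.0], "video": [0.0, 1.0], "text": [1.0, 1.0]}
concatenated = concat_fusion(embeddings)
fused, weights = attention_fusion(embeddings, query=[0.0, 0.0])
print(len(concatenated), weights, fused)
```

Concatenation keeps every feature but grows with the number of modalities. The attention variant keeps a fixed size and exposes per-modality weights, which may help when one modality is uninformative, for example audio during a silent block.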
*2. Aims*
The aim of this PhD is to design, develop, and evaluate a multimodal deep
learning approach for the automatic detection of stuttering-like
disfluencies in French, by combining audio, video, and textual modalities.
The work will be based on an annotated audiovisual corpus of
French-speaking people who stutter, with particular focus on disfluencies
that are difficult to detect through audio alone, such as silent blocks,
and on robustness to individual variability.
The doctoral candidate’s work will include the following tasks:
- *Audio encoding*: Implement and adapt StutterNet (Sheikh, S. A.,
Sahidullah, M., Hirsch, F., & Ouni, S., 2021, *StutterNet: Stuttering
detection using time delay neural network*, in EUSIPCO) to extract
acoustic features that capture the temporal dependencies relevant to
disfluency detection.
- *Video encoding*: Develop and train vision models (e.g., C3D or
Transformers) to analyze video sequences for visual cues of stuttering
(facial tension, blinking, atypical movements). The extraction of facial
landmarks (with OpenFace or MediaPipe) will also be explored as a
complementary or alternative source of features.
- *Text encoding*: Generate automatic transcriptions (via Whisper) and
encode them using pre-trained language models (BERT, RoBERTa) to extract
linguistic context and identify textual patterns characteristic of
disfluencies.
- *Multimodal fusion*: Implement and compare several strategies to fuse
the representations from the three modalities, such as concatenation,
adaptive attention mechanisms, or other approaches leveraging data
complementarity.
- *Classification and evaluation*: Develop a classifier operating on the
fused representation to predict the presence or absence of stuttering
within a given time window. Evaluation will rely on standard metrics
(precision, recall, F1-score, AUC), and results will be compared to manual
expert annotations. Qualitative analyses will also be conducted to
interpret model errors and refine the approach.
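As an illustration of the window-level evaluation described in the last task, the sketch below computes precision, recall, F1-score, and AUC from per-window expert labels and model scores. The labels, scores, and the 0.5 decision threshold are made-up assumptions for the example, and the function names are hypothetical. In practice a library such as scikit-learn would normally be used instead.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, with 1 = stuttering present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive window outscores a negative
    one (ties count as one half). Quadratic time, fine for a small sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expert annotations and model scores for six time windows.
y_true: List[int] = [1, 0, 1, 1, 0, 0]
scores: List[float] = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, so the two kinds of metric can reasonably be reported together.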
Beyond detection, this PhD aims to contribute methodologically to the field
of multimodal fusion applied to pathological speech, with potential
impact in clinical contexts.
The PhD will be mainly carried out at LORIA/INRIA in Nancy, France, with
occasional short stays (from one week to one month) at Praxiling in
Montpellier, France.
*3. Required Skills*
The candidate should hold a Master’s degree in computer science, have
strong skills in machine learning and deep learning, and be proficient in
Python and frameworks such as PyTorch or TensorFlow. An interest in signal
processing (audio/video) and ideally in NLP is expected. Autonomy, rigor,
critical thinking, and analytical abilities are essential, along with
strong communication skills to work in a multidisciplinary environment. An
interest in phonetics, linguistics, and speech disorders—particularly
stuttering—would be a plus.
*To apply:* please send the following to shakeelzmail608 at gmail.com and
slim.ouni at loria.fr: your CV, transcripts from your previous years of
study, a motivation letter, and your Master’s thesis manuscript.
--
Kind Regards,
Dr. Shakeel A. Sheikh
Research Scientist
Novartis AG
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PhDPosition.pdf
Type: application/pdf
Size: 185661 bytes
Desc: not available
URL: <http://mailman.srv.cs.cmu.edu/pipermail/connectionists/attachments/20251030/28e62226/attachment.pdf>