Automatic Stuttering Detection and Identification


Adanova V. (Principal Investigator), Gençtav A.

Project Supported by Higher Education Institutions, BAP Research Project, 2023 - 2024

  • Project Type: Project Supported by Higher Education Institutions
  • Support Program: BAP Research Project
  • Start Date: March 2023
  • End Date: March 2024

Project Summary

Stuttering, also known as stammering, is a complex speech disorder that negatively affects the communication ability of about 1% of the population. Persons who stutter (PWS) usually know what they want to say, but their speech is interrupted by involuntary pauses and by word or sound repetitions. Stuttering can take different forms: prolongations, where a sound is stretched out; repetitions, where sounds, words, or phrases are repeated; and blocks, which are pauses or gasps for air made by the speaker. Identifying stuttering is a challenging problem that involves multiple disciplines, including pathology, psychology, acoustics, and signal processing.

With the advances in machine and deep learning, research in the speech domain has developed dramatically. Current Automatic Speech Recognition (ASR) systems achieve high accuracy, enabling voice assistants such as Alexa, Siri, or Google Assistant. However, these systems are built on fluent speech, and they fail to recognize speech accompanied by pauses and repetitions. Considering that voice technologies are becoming ubiquitous, if developers continue to assume ideal speech scenarios, the future will be a place where people with speech disorders feel greatly deprived. For current ASR systems to understand stuttered speech, they should be trained on stuttered data, which leads to a data scarcity problem. An alternative approach is to couple ASR systems with systems that automatically correct the speech; such a correction system, however, requires good dysfluency detection and identification.

Automatic detection and identification of stuttering types will not only improve ASR systems; it will also be very useful for speech therapists. Currently, speech therapists record their patients while they speak and then manually annotate the stuttering types observed in the speech. The severity of the speech disorder is determined from the frequency of the stuttering types, and improvements in the patient's speech after therapy are assessed by the same process (see the sketch following this summary).

In this project we aim to tackle the problem of automatic detection and identification of stuttering. The literature review (see the attached Literature Review) in this context reveals the scarcity of work done in the area. Some studies have certainly been performed, but due to the lack of data many of them rely on in-house datasets that are small, manually labeled, and not shared. Research on stuttered speech has gained momentum recently, as Apple released a new stuttering dataset. The data was collected from podcasts in which PWS are interviewed, and it is the largest annotated stuttering dataset to date. We plan to conduct our work based on this dataset and to include some other smaller datasets. The latest works show poor performance in dysfluency type identification; our aim is to build a system that improves the identification of stuttering types. A further novelty we consider is to locate exactly where in the audio the stuttering occurs. This information will be of great use for speech correction systems.
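
The therapist workflow described above reduces to counting time-stamped dysfluency events by type and expressing them as a rate, which is also the kind of output an automatic detector with localization would produce. The following is a minimal Python sketch of that summary step only; the DysfluencyEvent class, the label set, the syllable-based rate, and the 3%/10% severity cut-offs are illustrative assumptions, not part of the project's method or of any clinical guideline.

```python
# Minimal sketch: summarize time-stamped dysfluency detections by type and
# derive a rough, illustrative severity grade from their overall frequency.
from dataclasses import dataclass
from collections import Counter

STUTTER_TYPES = {"prolongation", "repetition", "block"}  # assumed label set

@dataclass
class DysfluencyEvent:
    label: str      # one of STUTTER_TYPES
    start_s: float  # onset of the dysfluent region in the audio (seconds)
    end_s: float    # offset of the dysfluent region (seconds)

def summarize(events: list[DysfluencyEvent], n_spoken_syllables: int) -> dict:
    """Count each dysfluency type and express the total as a percentage of
    spoken syllables; the cut-offs below are illustrative, not clinical."""
    counts = Counter(e.label for e in events if e.label in STUTTER_TYPES)
    percent = 100.0 * sum(counts.values()) / max(n_spoken_syllables, 1)
    if percent < 3.0:
        severity = "mild"
    elif percent < 10.0:
        severity = "moderate"
    else:
        severity = "severe"
    return {"counts": dict(counts),
            "percent_stuttered": round(percent, 1),
            "severity": severity}

if __name__ == "__main__":
    # Hand-written demo events standing in for detector output.
    demo = [
        DysfluencyEvent("repetition", 1.2, 1.9),
        DysfluencyEvent("block", 4.0, 4.6),
        DysfluencyEvent("prolongation", 7.3, 8.1),
    ]
    print(summarize(demo, n_spoken_syllables=120))
```

In a complete pipeline the events would come from a classifier operating on audio features rather than being listed by hand; the per-event start and end times correspond to the localization output that the project proposes to provide.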