Multi-conditioning and data augmentation using generative noise model for speech emotion recognition in noisy conditions

U Tiwari, M Soni, R Chakraborty… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Degradation due to additive noise is a significant road block in the real-life deployment of
Speech Emotion Recognition (SER) systems. Most of the previous work in this field dealt …

Towards building an automatic transcription system for language documentation: Experiences from Muyu

A Zahrer, A Žgank, B Schuppler - Proceedings of the Twelfth …, 2020 - aclanthology.org
Since at least half of the world's 6000 plus languages will vanish during the 21st century,
language documentation has become a rapidly growing field in linguistics. A fundamental …

Improved language models for ASR using written language text

K Mukherji, M Pandharipande… - … National Conference on …, 2022 - ieeexplore.ieee.org
The performance of an Automatic Speech Recognition (ASR) engine primarily depends on
(a) the acoustic model (AM), (b) the language model (LM) and (c) the lexicon (Lx). While the …

Towards NLU model robustness to ASR errors at scale

Y Nechaev, W Ruan, I Kiss - 2021 - amazon.science
In a large-scale Spoken Language Understanding system, Natural Language
Understanding (NLU) models are typically decoupled, i.e., trained and updated …

Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

S Ranjan, R Chakraborty, SK Kopparapu - Proc. Interspeech 2024, 2024 - isca-archive.org
Speech emotion recognition (SER) is an indispensable component of any human-machine
interaction, and enables building empathetic voice user interfaces. Ability to accurately …

Generalized SpecAugment: Robust Online Augmentation Technique for End-to-End Automatic Speech Recognition

M Soni, A Panda, SK Kopparapu - 2024 Asia Pacific Signal …, 2024 - ieeexplore.ieee.org
Since its introduction, SpecAugment has become a default augmentation technique in many
End-to-End Automatic Speech Recognition systems. It is computationally efficient and …

Acoustic model adaptation in noisy and reverberated scenarios using multi-task learned embeddings

A Raikar, M Soni, A Panda… - 2024 Asia Pacific Signal …, 2024 - ieeexplore.ieee.org
Distortions such as reverberations and background noise in an acoustic environment pose
a significant challenge for large-scale Automatic Speech Recognition (ASR) systems. In this …

Enhancing Speech In Noisy Environment: A Review

NSD Ganta, VK Mittal - 2021 IEEE Region 10 Symposium …, 2021 - ieeexplore.ieee.org
Speech signal is the primary mode of communication amongst humans. It is also used for
the communication between human and machine, i.e., human-machine interaction (IoTs) …

DE-STT: DE-ENTAGLEMENT OF UNWANTED NUISANCES AND BIASES IN SPEECH TO TEXT SYSTEM USING ADVERSARIAL FORGETTING

H Yadav, J Singh, AA Singh, R Mittal, RR Shah - CoRR, 2020 - academia.edu
Training a robust Speech to Text (STT) system requires “tens of thousands” of hours
of data. Variability present in the dataset, in the form of unwanted nuisances (noise) and …

Pipeline for XAI based Automatic Audio Call Audit

U Tiwari, R Chakraborty, S Divolia, SK Kopparapu - 2020 - micc.unifi.it
In this paper, we describe an industry grade, functional, automatic explainable audit system
for service desk call conversations. Often these interactions are laced by the various …