Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Machine learning–based cyber attacks targeting on controlled information: A survey

Y Miao, C Chen, L Pan, QL Han, J Zhang… - ACM Computing Surveys …, 2021 - dl.acm.org
Stealing attack against controlled information, along with the increasing number of
information leakage incidents, has become an emerging cyber security threat in recent …

Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques

T Tuncer, S Dogan, UR Acharya - Knowledge-Based Systems, 2021 - Elsevier
Speech emotion recognition is one of the challenging research issues in the
knowledge-based system and various methods have been recommended to reach high classification …

Personal voice assistant security and privacy—a survey

P Cheng, U Roedig - Proceedings of the IEEE, 2022 - ieeexplore.ieee.org
Personal voice assistants (PVAs) are increasingly used as interfaces to digital environments.
Voice commands are used to interact with phones, smart homes, or cars. In the United …

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

Exploration of efficient end-to-end ASR using discretized input from self-supervised learning

X Chang, B Yan, Y Fujita, T Maekaku… - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) of speech has shown impressive results in speech-related
tasks, particularly in automatic speech recognition (ASR). While most methods employ the …

Optimal feature selection based speech emotion recognition using two‐stream deep convolutional neural network

Mustaqeem, S Kwon - International Journal of Intelligent …, 2021 - Wiley Online Library
Speech signal processing is an active area of research, the most dominant source of
exchanging information among human beings, and the best way for human–computer …

Introducing the VoicePrivacy initiative

N Tomashenko, BML Srivastava, X Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
The VoicePrivacy initiative aims to promote the development of privacy preservation tools for
speech technology by gathering a new community to define the tasks of interest and the …

Privacy implications of voice and speech analysis–information disclosure by inference

JL Kröger, OHM Lutz, P Raschke - … Management. Data for Better Living: AI …, 2020 - Springer
Internet-connected devices, such as smartphones, smartwatches, and laptops, have become
ubiquitous in modern life, reaching ever deeper into our private spheres. Among the sensors …

Speaker anonymisation using the McAdams coefficient

J Patino, N Tomashenko, M Todisco, A Nautsch… - arXiv preprint arXiv …, 2020 - arxiv.org
Anonymisation has the goal of manipulating speech signals in order to degrade the
reliability of automatic approaches to speaker recognition, while preserving other aspects of …