Per-channel energy normalization: Why and how

V Lostanlen, J Salamon, M Cartwright… - IEEE Signal …, 2018 - ieeexplore.ieee.org
In the context of automatic speech recognition and acoustic event detection, an adaptive
procedure named per-channel energy normalization (PCEN) has recently been shown to …

Multimodal human action recognition in assistive human-robot interaction

I Rodomagoulakis, N Kardaris… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org
Within the context of assistive robotics we develop an intelligent interface that provides
multimodal sensory processing capabilities for human action recognition. Human action is …

Localizing speakers in multiple rooms by using deep neural networks

F Vesperini, P Vecchiotti, E Principi, S Squartini… - Computer Speech & …, 2018 - Elsevier
In the field of human speech capturing systems, a fundamental role is played by the source
localization algorithms. In this paper a Speaker Localization algorithm (SLOC) based on …

Far-field audio-visual scene perception of multi-party human-robot interaction for children and adults

A Tsiami, PP Filntisis, N Efthymiou… - … , Speech and Signal …, 2018 - ieeexplore.ieee.org
Human-robot interaction (HRI) is a research area of growing interest with a multitude of
applications for both children and adult user groups, as, for example, in edutainment and …

The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones

M Matassoni, RF Astudillo, A Katsamanis… - … Annual Conference of …, 2014 - researchgate.net
Distant speech recognition in real-world environments is still a challenging problem and a
particularly interesting topic is the investigation of multi-channel processing in case of …

ATHENA: A Greek multi-sensory database for home automation control

A Tsiami, I Rodomagoulakis, P Giannoulis… - Cough, 2014 - isca-archive.org
In this paper we present a Greek speech database with real multi-modal data in a smart
home two-room environment. In total, 20 speakers were recorded in 240 one-minute long …

im4Things: an ontology-based natural language interface for controlling devices in the Internet of Things

JÁ Noguera-Arnaldos, MA Paredes-Valverde… - Current Trends on …, 2017 - Springer
The Internet of Things (IoT) offers opportunities for new applications and services
that enable users to access and control their working and home environment from local and …

Room-localized spoken command recognition in multi-room, multi-microphone environments

I Rodomagoulakis, A Katsamanis, G Potamianos… - Computer Speech & …, 2017 - Elsevier
The paper focuses on the design of a practical system pipeline for always-listening, far-field
spoken command recognition in everyday smart indoor environments that consist of multiple …

A multimedia gesture dataset for human robot communication: Acquisition, tools and recognition results

I Rodomagoulakis, N Kardaris… - … on Image Processing …, 2016 - ieeexplore.ieee.org
Motivated by the recent advances in human-robot interaction we present a new dataset, a
suite of tools to handle it and state-of-the-art work on visual gestures and audio commands …

The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home

P Giannoulis, A Tsiami… - 2014 4th Joint …, 2014 - ieeexplore.ieee.org
We present our system for speech activity detection and speaker localization inside a smart
home with multiple rooms equipped with microphone arrays of known geometry and …