ESPnet: End-to-end speech processing toolkit S Watanabe, T Hori, S Karita, T Hayashi, J Nishitoba, Y Unno, NEY Soplin, ... arXiv preprint arXiv:1804.00015, 2018 | 1561 | 2018 |
Hybrid CTC/Attention Architecture for End-to-End Speech Recognition S Watanabe, T Hori, S Kim, JR Hershey, T Hayashi IEEE Journal of Selected Topics in Signal Processing 11 (8), 1240-1253, 2017 | 883 | 2017 |
A comparative study on Transformer vs RNN in speech applications S Karita, N Chen, T Hayashi, T Hori, H Inaguma, Z Jiang, M Someki, ... arXiv preprint arXiv:1909.06317, 2019 | 815 | 2019 |
Speaker-dependent WaveNet vocoder. A Tamamori, T Hayashi, K Kobayashi, K Takeda, T Toda Interspeech 2017, 1118-1122, 2017 | 334 | 2017 |
Recent developments on ESPnet toolkit boosted by conformer P Guo, F Boyer, X Chang, T Hayashi, Y Higuchi, H Inaguma, N Kamo, C Li, ... ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 283 | 2021 |
ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit T Hayashi, R Yamamoto, K Inoue, T Yoshimura, S Watanabe, T Toda, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 220 | 2020 |
ESPnet-ST: All-in-one speech translation toolkit H Inaguma, S Kiyono, K Duh, S Karita, NEY Soplin, T Hayashi, ... arXiv preprint arXiv:2004.10234, 2020 | 164 | 2020 |
Exploring multi-channel features for denoising-autoencoder-based speech enhancement S Araki, T Hayashi, M Delcroix, M Fujimoto, K Takeda, T Nakatani 2015 IEEE International Conference on Acoustics, Speech and Signal …, 2015 | 131 | 2015 |
Back-translation-style data augmentation for end-to-end ASR T Hayashi, S Watanabe, Y Zhang, T Toda, T Hori, R Astudillo, K Takeda 2018 IEEE Spoken Language Technology Workshop (SLT), 426-433, 2018 | 122 | 2018 |
An investigation of multi-speaker training for WaveNet vocoder T Hayashi, A Tamamori, K Kobayashi, K Takeda, T Toda 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017 | 120 | 2017 |
Duration-controlled LSTM for polyphonic sound event detection T Hayashi, S Watanabe, T Toda, T Hori, J Le Roux, K Takeda IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 25 …, 2017 | 111 | 2017 |
Statistical Voice Conversion with WaveNet-Based Waveform Generation. K Kobayashi, T Hayashi, A Tamamori, T Toda Interspeech, 1138-1142, 2017 | 102 | 2017 |
Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining WC Huang, T Hayashi, YC Wu, H Kameoka, T Toda arXiv preprint arXiv:1912.06813, 2019 | 101 | 2019 |
Cycle-consistency training for end-to-end speech recognition T Hori, R Astudillo, T Hayashi, Y Zhang, S Watanabe, J Le Roux ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 96 | 2019 |
Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis T Hayashi, S Watanabe, T Toda, K Takeda, S Toshniwal, K Livescu Proc. Interspeech 2019, 4430-4434, 2019 | 89 | 2019 |
Weakly-supervised sound event detection with self-attention K Miyazaki, T Komatsu, T Hayashi, S Watanabe, T Toda, K Takeda ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 87 | 2020 |
Multi-channel speech recognition: LSTMs all the way through H Erdogan, T Hayashi, JR Hershey, T Hori, C Hori, WN Hsu, S Kim, ... CHiME-4 workshop, 1-4, 2016 | 86 | 2016 |
Non-parallel voice conversion with cyclic variational autoencoder PL Tobing, YC Wu, T Hayashi, K Kobayashi, T Toda arXiv preprint arXiv:1907.10185, 2019 | 85 | 2019 |
Convolution-augmented transformer for semi-supervised sound event detection K Miyazaki, T Komatsu, T Hayashi, S Watanabe, T Toda, K Takeda Proc. workshop detection classification Acoust. Scenes events (DCASE), 100-104, 2020 | 84 | 2020 |
ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration C Li, J Shi, W Zhang, AS Subramanian, X Chang, N Kamo, M Hira, ... 2021 IEEE Spoken Language Technology Workshop (SLT), 785-792, 2021 | 79 | 2021 |