WavChat: A Survey of Spoken Dialogue Models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Music2latent: Consistency autoencoders for latent audio compression

M Pasini, S Lattner, G Fazekas - arXiv preprint arXiv:2408.06500, 2024 - arxiv.org
Efficient audio representations in a compressed continuous latent space are critical for
generative audio modeling and Music Information Retrieval (MIR) tasks. However, some …

Learning music representations with wav2vec 2.0

A Ragano, E Benetos, A Hines - 2023 31st Irish Conference on …, 2023 - ieeexplore.ieee.org
Learning music representations that are general-purpose offers the flexibility to finetune
several downstream tasks using smaller datasets. The wav2vec 2.0 speech representation …

Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems

YM Tamm, A Aljanaki - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
Over the years, Music Information Retrieval (MIR) has proposed various models pretrained
on large amounts of music data. Transfer learning showcases the proven effectiveness of …

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

P Alonso-Jiménez, L Pepino, R Batlle-Roca… - arXiv preprint arXiv …, 2024 - arxiv.org
We present PECMAE, an interpretable model for music audio classification based on
prototype learning. Our model is based on a previous method, APNet, which jointly learns an …

The Frontier of Audio Signal Representations That Learn General Sounds

D Niizumi (仁泉大輔) - Journal of the Acoustical Society of Japan (日本音響学会誌), 2024 - jstage.jst.go.jp
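Beyond the sounds we actively engage with, such as human speech and music, our surroundings
are full of sounds essential for passively grasping our situation, such as the ambient sounds of cities and nature …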