EnCodecMAE: Leveraging neural codecs for universal audio representation learning

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 4 · Related articles · All 2 versions

Music2latent: Consistency autoencoders for latent audio compression

M Pasini, S Lattner, G Fazekas - arXiv preprint arXiv:2408.06500, 2024 - arxiv.org

Efficient audio representations in a compressed continuous latent space are critical for
generative audio modeling and Music Information Retrieval (MIR) tasks. However, some …

Cited by 2 · Related articles · All 4 versions

Learning music representations with wav2vec 2.0

A Ragano, E Benetos, A Hines - 2023 31st Irish Conference on …, 2023 - ieeexplore.ieee.org

Learning music representations that are general-purpose offers the flexibility to finetune
several downstream tasks using smaller datasets. The wav2vec 2.0 speech representation …

Cited by 12 · Related articles · All 4 versions

Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems

YM Tamm, A Aljanaki - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org

Over the years, Music Information Retrieval (MIR) has proposed various models pretrained
on large amounts of music data. Transfer learning showcases the proven effectiveness of …

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

P Alonso-Jiménez, L Pepino, R Batlle-Roca… - arXiv preprint arXiv …, 2024 - arxiv.org

We present PECMAE, an interpretable model for music audio classification based on
prototype learning. Our model is based on a previous method, APNet, which jointly learns an …

Cited by 4 · Related articles · All 3 versions

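The frontier of audio signal representations that learn general sounds

仁泉大輔 - Journal of the Acoustical Society of Japan (日本音響学会誌), 2024 - jstage.jst.go.jp

Beyond the sounds we actively engage with, such as human speech and music,
our surroundings are full of sounds, such as urban and natural environmental sounds, that are essential for passively grasping the situation around us …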
