Long-form music generation with latent diffusion

Z Evans, JD Parker, CJ Carr, Z Zukowski… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-based generative models for music have seen great strides recently, but so far have
not managed to produce full-length music tracks with coherent musical structure. We show …

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization

X Gao, C Zhang, Y Chen, H Zhang, NF Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Current emotional text-to-speech (TTS) models predominantly conduct supervised training
to learn the conversion from text and desired emotion to its emotional speech, focusing on a …

MusicScore: A Dataset for Music Score Modeling and Generation

Y Lin, Z Dai, Q Kong - arXiv preprint arXiv:2406.11462, 2024 - arxiv.org
Music scores are written representations of music and contain rich information about musical
components. The visual information on music scores includes notes, rests, staff lines, clefs …

Crafting Creative Melodies: A User-Centric Approach for Symbolic Music Generation

S Dadman, BA Bremdal - Electronics, 2024 - mdpi.com
Composing coherent and structured music is one of the main challenges in symbolic music
generation. Our research aims to propose a user-centric framework design that promotes a …

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

C Chen, Y Hu, W Wu, H Wang, ES Chng… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, text-to-speech (TTS) technology has witnessed impressive advancements,
particularly with large-scale training datasets, showcasing human-level speech quality and …

MAD Speech: Measures of Acoustic Diversity of Speech

M Futeral, A Agostinelli, M Tagliasacchi… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative spoken language models produce speech in a wide range of voices, prosody,
and recording conditions, seemingly approaching the diversity of natural speech. However …

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Y Bai, H Chen, J Chen, Z Chen, Y Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Seed-Music, a suite of music generation systems capable of producing high-
quality music with fine-grained style control. Our unified framework leverages both auto …

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation

X Di, Z Chen, Y Liang, J Zheng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale text-to-speech (TTS) models have made significant progress recently. However,
they still fall short in the generation of Chinese dialectal speech. To address this, we propose …

Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

NK Corrêa - arXiv preprint arXiv:2406.11039, 2024 - arxiv.org
The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence
across all Humanities disciplines, revolves around the intricacies of morality and normativity …

Human-Machine Co-creation in an Electroacoustic Ensemble Performance Context

RD Hoy, D Van Nort - Proceedings of the 19th International Audio Mostly …, 2024 - dl.acm.org
This paper investigates the outcomes of incorporating computational agents into a telematic
electroacoustic performance ensemble. The ensemble is directed by a gestural language …