Abstract

In recent years, deep learning has advanced symbolic (MIDI) music modeling, establishing music generation as a key application of artificial intelligence. However, most research focuses on Western music and struggles to generate traditional Chinese melodies, particularly to capture their modal characteristics and emotional expression. To address this, we propose the Dual-Feature Modeling Module, which combines the long-range sequence modeling of the Mamba Block with the global structure capture of the Transformer Block. We further introduce the Bidirectional Mamba Fusion Layer, which fuses local details and global structure through bidirectional scanning, strengthening sequence modeling. Building on this, we propose the REMI-M representation to better capture and generate modal information in melodies. To support this work, we also built FolkDB, a high-quality dataset of traditional Chinese music covering over 11 hours of recordings. Experimental results show that our architecture excels at generating melodies with the characteristics of traditional Chinese music, offering a new solution for music generation.
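To illustrate how a REMI-style event sequence can carry modal information, the sketch below encodes a short note list into tokens and prepends a mode event. This is only a minimal illustration: the exact REMI-M vocabulary is defined in the paper, so the event names and the `Mode` token used here are assumptions.

```python
# Illustrative sketch (not the paper's exact REMI-M specification):
# encode notes as REMI-like events, prepending a hypothetical Mode event
# so a model can condition on the melody's mode.

def encode_remi_m(notes, mode, positions_per_bar=16):
    """notes: list of (bar, position, pitch, duration) tuples,
    sorted by onset time."""
    events = [f"Mode_{mode}"]  # assumed modal token, not in standard REMI
    current_bar = None
    for bar, pos, pitch, dur in notes:
        if bar != current_bar:          # emit a Bar event at each new bar
            events.append("Bar")
            current_bar = bar
        events.append(f"Position_{pos}/{positions_per_bar}")
        events.append(f"Pitch_{pitch}")
        events.append(f"Duration_{dur}")
    return events

# Example: two notes of a melody, labeled with a pentatonic Gong mode.
tokens = encode_remi_m([(0, 0, 60, 4), (0, 4, 62, 4)], mode="Gong")
```

The mode token lets the sequence model attend to modal context from the very first step, rather than inferring it from the pitches alone.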

Demo

Midi Samples

To keep durations consistent, we trimmed some audio files directly, so some clips may end abruptly.

[Audio samples 001–004: for each ID, the table provides the Primer, the Ground Truth, and continuations generated by MusicTransformer, MelodyT5, and MusicMamba.]

Citation

If you find this work helpful and use our code in your research, please cite our paper:

@article{MusicMamba,
  title={MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision},
  author={Jiatao Chen and Xing Tang and Tianming Xie and Jing Wang and Wenjing Dong and Bing Shi},
  year={2024},
  eprint={2409.02421},
  archivePrefix={arXiv},
}