Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Authors :: Hsiao, Wen-Yi
Liu, Jen-Yu
Yeh, Yin-Cheng
Yang, Yi-Hsuan
Source :: Proceedings of the AAAI Conference on Artificial Intelligence. 35:178-186
Publication Year :: 2021
Publisher :: Association for the Advancement of Artificial Intelligence (AAAI), 2021.
Abstract: To apply neural sequence models such as Transformers to music generation tasks, one has to represent a piece of music as a sequence of tokens drawn from a finite, predefined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to indicate the note’s pitch, duration, velocity (dynamics), and placement (onset time) along the time grid. While different types of tokens may possess different properties, existing models usually treat them equally, in the same way as modeling words in natural languages. In this paper, we present a conceptually different approach that explicitly takes into account the type of the tokens, such as note types and metric types. We also propose a new Transformer decoder architecture that uses different feed-forward heads to model tokens of different types. With an expansion-compression trick, we convert a piece of music to a sequence of compound words by grouping neighboring tokens, greatly reducing the length of the token sequences. We show that the resulting model can be viewed as a learner over dynamic directed hypergraphs. We employ it to learn to compose expressive Pop piano music of full-song length (involving up to 10K individual tokens per song), both conditionally and unconditionally. Our experiment shows that, compared to state-of-the-art models, the proposed model converges 5 to 10 times faster at training (i.e., within a day on a single GPU with 11 GB memory), while generating music of comparable quality.
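
The grouping step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token type names (`beat`, `tempo`, `pitch`, `duration`, `velocity`), the two families, the `[ignore]` filler, and the function `to_compound_words` are all assumptions chosen for illustration. The sketch merges neighboring tokens of the same family into one compound word (compression) and fills the fields that a word does not use with a placeholder so that every word has the same set of fields (expansion).

```python
from typing import Dict, List, Tuple

Token = Tuple[str, object]  # (token type, value)

# Hypothetical families of token types; names are illustrative assumptions.
FAMILIES = {
    "metric": ("beat", "tempo"),
    "note": ("pitch", "duration", "velocity"),
}
ALL_FIELDS = FAMILIES["metric"] + FAMILIES["note"]
IGNORE = "[ignore]"  # filler for fields a compound word does not use


def family_of(token_type: str) -> str:
    """Return the family a token type belongs to."""
    for family, types in FAMILIES.items():
        if token_type in types:
            return family
    raise ValueError(f"unknown token type: {token_type}")


def to_compound_words(tokens: List[Token]) -> List[Dict[str, object]]:
    """Group neighboring tokens of one family into fixed-width compound words."""
    words: List[Dict[str, object]] = []
    current = None
    for token_type, value in tokens:
        family = family_of(token_type)
        # Start a new word when the family changes or a field would be overwritten.
        if (
            current is None
            or current["family"] != family
            or current[token_type] != IGNORE
        ):
            current = {"family": family, **{f: IGNORE for f in ALL_FIELDS}}
            words.append(current)
        current[token_type] = value
    return words


tokens = [
    ("beat", 0), ("tempo", 120),
    ("pitch", 60), ("duration", 4), ("velocity", 80),
    ("pitch", 64), ("duration", 4), ("velocity", 80),
]
words = to_compound_words(tokens)
print(len(tokens), "tokens ->", len(words), "compound words")
```

In this toy input, eight individual tokens collapse into three compound words (one metric word and two note words), which shows in miniature how grouping shortens the sequence a model has to attend over.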

Subjects :: FOS: Computer and information sciences
Sound (cs.SD)
Artificial Intelligence (cs.AI)
Computer Science - Artificial Intelligence
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
General Medicine
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing

Details

ISSN :: 2374-3468 and 2159-5399
Volume :: 35
Database :: OpenAIRE
Journal :: Proceedings of the AAAI Conference on Artificial Intelligence
Accession number :: edsair.doi.dedup.....22715b332a5c85f08834b3b95f7a7420
