
Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM)

Multimedia Tools and Applications(2024)

Abstract
The essence of music is inherently multi-modal – with audio and lyrics going hand in hand. However, very little research has studied the intricacies of the multi-modal nature of music and its relation to genres. Our work uses this multi-modality to present spectro-lyrical embeddings for music representation (SLEM), leveraging open-sourced, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings on a self-curated, hand-labeled, multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study the effect of varying the relative weight of lyrics and spectrograms in the embeddings on multi-class genre classification. The purpose of this study is to show that a simple linear combination of both modalities outperforms either modality alone. Our methods achieve an accuracy ranging between 81.08
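The "simple linear combination of both modalities" described above can be sketched as a weighted blend of two per-song embedding vectors. The function and weighting scheme below are illustrative assumptions, not the paper's actual implementation; the encoder models, embedding dimension, and normalisation choice are all hypothetical.

```python
import numpy as np

def slem_embedding(spec_emb: np.ndarray, lyric_emb: np.ndarray,
                   alpha: float = 0.5) -> np.ndarray:
    """Blend a spectrogram embedding and a lyrics embedding.

    alpha controls the relative weight of the spectrogram modality:
    alpha=1.0 is audio-only, alpha=0.0 is lyrics-only.
    Both inputs are L2-normalised first so neither modality dominates
    purely by scale (an assumption, not stated in the abstract).
    """
    spec = spec_emb / np.linalg.norm(spec_emb)
    lyric = lyric_emb / np.linalg.norm(lyric_emb)
    return alpha * spec + (1.0 - alpha) * lyric

# Toy usage with random stand-ins for real model outputs.
spec = np.random.rand(512)   # e.g. from a vision model over the spectrogram
lyric = np.random.rand(512)  # e.g. from a language model over the lyrics
emb = slem_embedding(spec, lyric, alpha=0.7)
print(emb.shape)  # (512,)
```

In the study's setup, a classifier would then be trained on such blended vectors while sweeping alpha to compare audio-heavy, lyrics-heavy, and balanced embeddings.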
Key words
Music, Machine learning, Spectrograms, Multimodal music embeddings, Representation learning