Movie Trailer Scene Classification Based on Audio VGGish Features

2022 International Conference on Machine Learning, Control, and Robotics (MLCR), 2022

Abstract
In a movie trailer, sound carries important information about the background music and sound effects. Using these data to classify the genre of a movie can therefore help post-production teams in the movie industry. Since deep learning has shown great potential in many machine learning applications, this paper studied deep learning methods and built suitable neural networks for the movie trailer genre classification task. To train and evaluate our neural networks, we used the MovieLens20M dataset. Instead of providing original audio files, AudioSet offers 128-dimensional embeddings produced by a VGG model for audio with a frame length of 960 ms, approximately 1 s. Thus, each audio clip is represented as a sequence of 128-dimensional feature vectors. We then fine-tuned the VGG model that AudioSet used as a feature extractor. Afterward, we compared Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models with the same hyperparameters.
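The representation described in the abstract can be illustrated with a minimal sketch. Only the 960 ms frame length and the 128-dimensional embedding size come from the abstract. The non-overlapping framing, the zero-padding to a fixed length, and the helper names `num_frames` and `pad_or_truncate` are assumptions made for illustration, not the authors' pipeline.

```python
FRAME_MS = 960    # frame length stated in the abstract
EMBED_DIM = 128   # embedding size stated in the abstract


def num_frames(duration_ms: int) -> int:
    """Count whole 960 ms frames in a clip (assumes non-overlapping frames)."""
    return duration_ms // FRAME_MS


def pad_or_truncate(seq, target_len):
    """Return a copy of seq with exactly target_len rows, zero-padded at the end."""
    trimmed = [list(row) for row in seq[:target_len]]
    padding = [[0.0] * EMBED_DIM for _ in range(target_len - len(trimmed))]
    return trimmed + padding


# Hypothetical 10-second clip: 10 whole frames, each a 128-dimensional vector.
# The constant 0.5 values are placeholders, not real embeddings.
clip = [[0.5] * EMBED_DIM for _ in range(num_frames(10_000))]
fixed = pad_or_truncate(clip, target_len=12)
print(len(clip), len(fixed), len(fixed[0]))  # prints: 10 12 128
```

Bringing every clip to the same number of frames is one common way to give a CNN and an LSTM identically shaped inputs, which makes a comparison under shared hyperparameters straightforward.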
Key words
movie trailer, movie genre classification, neural network, VGGish, audio features