Genre classification of movies using their synopsis
Abstract
Movies have been classified into genres since the inception of the medium. To this day, however, classifying movies into genres remains a manual, time-consuming task that requires human expertise. Some work has been done on automatically classifying movies into genres using machine learning classifiers, with some success. However, little of this work addresses Indian movies specifically. In this paper, multiple supervised learning algorithms, including Naïve Bayes, Logistic Regression, K-Nearest Neighbors, Decision Tree and Linear SVM, were used to classify a set of Indian movies, including but not limited to Bollywood and South Indian movies, into genres using their synopses. Naïve Bayes and Logistic Regression were found to be the better performers, and K-Nearest Neighbors was the worst performer. Genres with many positive examples, such as ‘Drama’, were classified correctly more often, with precision and recall scores of 0.7 each. Performance degraded drastically as the number of positive examples fell, with the ‘Musical’ genre having precision scores close to 0.1 and recall scores nearing 0.
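To make the reported metrics concrete, the following is a minimal sketch of how precision and recall can be computed for a single genre treated as a binary problem (1 = movie belongs to the genre). The one-genre-at-a-time framing, the function name `precision_recall`, and the toy labels are assumptions made for illustration; they are not the paper's data or code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks the genre."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Define the score as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, invented for illustration only.
# A genre with many positive examples:
common_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
common_pred = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
# A genre with a single positive example that the classifier misses:
rare_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
rare_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

common_scores = precision_recall(common_true, common_pred)
rare_scores = precision_recall(rare_true, rare_pred)
```

In this toy case the rare genre scores zero on both metrics because a classifier that never predicts the genre produces no true positives, which is one way recall can approach 0 when positive examples are scarce.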