Journal of Engineering Technology (ISSN 0747-9964), Volume 8, Issue 1, Jan. 2019, pp. 506-515

Multimodal Music Mood Classification Framework for Christian Kokborok Music

Sanchali Das 1*, Sambit Satpathy 2, Swapan Debbarma 3
1,2,3 Department of Computer Science and Engineering, National Institute of Technology, Agartala, India. *Corresponding Author

Abstract: This article describes an application of music information retrieval (MIR) integrated with natural language processing for a low-resourced language. The proposed work addresses music mood classification for Kokborok music, a North-eastern regional language of India. Kokborok is widely spoken in the North East (NE) Indian states and in other countries such as Nepal, Bhutan, Myanmar and Bangladesh. The song selection is specific to Christian Kokborok songs; Christianity is deeply tied to the Bible, which is written in the recognised Romanised script accepted worldwide. We developed a multimodal corpus of audio and lyrics for Kokborok songs, performed coarse-grained annotation to create a mood-annotated dataset, and then carried out the classification task on audio and lyrics separately. We proposed a mood taxonomy for Christian Kokborok songs and built a mood-annotated corpus with that taxonomy. Initially, we used 48 parameters for audio classification and six text stylistic features for lyrics-based classification. An SVM classifier with a linear kernel is used for classification. Finally, a mood classification system was developed for Kokborok songs, consisting of three subsystems based on audio, lyrics, and multimodal (audio and lyrics together) data. We also compared different classifiers on the three systems, achieving 95% accuracy for audio, 97% for lyrics, and 96% for the multimodal system.
Keywords: Kokborok Christian Song, Multimodal Mood Classification, Music Information Retrieval, Natural Language Processing, Weka.

1. Introduction

The present work concerns an application of MIR research combined with natural language processing techniques [1, 7-11, 14, 15, 17]. In this work, we chose the music mood classification task as an application of MIR. We created a dataset comprising 300 Christian Kokborok songs along with their corresponding lyrics, and then devised a suitable mood taxonomy for the database. To the best of our knowledge, no mood-annotated dataset is available for Kokborok, so annotation was done manually to create a mood-annotated dataset, which serves as the ground truth for the classification task. We then performed the mood classification task on the audio files and the lyrics separately, as well as together (multimodal classification). Most researchers have worked on audio and lyrics classification for Western music, and some have explored the differences between Hindi and English songs [10, 16, 17]; others have used Indian languages such as Hindi for the mood classification task [7-11]. Very little work has been done on regional languages, or on classical music, for mood classification [14, 15, 22]. It has been seen that Western languages and a few other specific languages dominate the MIR field,
whereas low-resourced languages and dialects remain deprived, so we have tried to do some fundamental work that can be extended and can help researchers of the Kokborok community. We chose the regional language Kokborok, which is widely spoken in the north-eastern states of India and in other countries such as Bangladesh, Myanmar, Nepal and Bhutan. Christianity spread among the Kokborok people of Tripura through the New Zealand Baptist community in the era of 1932 to 1988, and over about 50 years the Christian community grew among the Kokborok people of Tripura. As of 2015, there are 840, and the total number of Kokborok Christian members in Tripura is more than 98,000 [26, 27]. As researchers of Tripura, we have initiated research on a less-resourced language like Kokborok, integrating natural language processing techniques and music information retrieval for this kind of under-resourced language. In the next section, we describe related work in the MIR field; the third section presents the proposed work and mood taxonomy; section four covers feature selection for audio- and lyrics-based classification; section five describes classification results, evaluation and comparison; and conclusions and future work are described in section 6.

2. Related Works

2.1 Dataset and Taxonomy

A mood taxonomy is the set of adjectives by which a dataset can best be represented. Several taxonomies are available, e.g. Russell's taxonomy (Figure 1), MIREX, and Hevner's taxonomy (Figure 2) [4, 18]. For Indian songs, Hevner's and Russell's taxonomies are found to fit better. Preparing a large dataset of songs with matching lyrics and audio files is essential groundwork for the mood classification task. A mood-annotated dataset is required to establish the mood attached to every song, considering both lyrics and audio. For Indian music information retrieval, comparatively little work has been done, e.g. on Hindi [2, 3, 7-13].
The authors of [20] present an electronic user interface where music mood tagging is done automatically based on lyrics alone.

Figure 1. Russell's taxonomy
2.2 Mood Classification Using Audio Features

MIREX [4] is a mood taxonomy and a yearly evaluation task for music information retrieval systems and algorithms, in which valence and arousal scores are calculated for music using several regression models [5, 13, 16]. The papers [7-11] used Russell's mood taxonomy for an audio-based classification framework for Hindi music and showed that spectral and timbre features are promising audio features; other significant features are rhythm, pitch and intensity.

2.3 Mood Classification from Lyric Features

Several classification tasks have been carried out on Western music mood classification based on bag-of-words, sentiment lexicons (SentiWordNet) and text stylistic features [5]. For Hindi music, [3] combined three types of sentiment lexicons, stylistic features and n-gram features for the lyrics-based classification task. In our work, only text stylistic features are used as the feature set, because no SentiWordNet is yet available for Kokborok. In the future, we will have to build one manually and use it as a feature set for the classification task.

2.4 Multimodal Music Mood Classification

Figure 2. Hevner's taxonomy

Some researchers have combined audio and lyrics features to obtain an automatic multimodal mood classification model for Western music; as far as Indian music is concerned, multimodal classification has been done for only a few languages [7-11].

3. Proposed Work

3.1.1 Database creation for Christian Kokborok music

Our mood classification task targets the regional language Kokborok, and our dataset is confined to Christian Kokborok music. We gathered 300 audio songs with their corresponding lyrics, which
are from the Kokborok Christian community and relate to the Holy Bible. The songs used in this experiment are 30-second clips, since surveys observe that the first 30 seconds of a song carry the most useful information. For computational purposes, we removed all noise from the audio files.

3.1.2 Mood-Annotated Dataset

For the mood classification task, it is necessary to create a ground-truth set of data for audio as well as lyrics. To the best of our knowledge, there is no mood-annotated dataset for Kokborok, so we had to annotate the files manually. The annotation was done by two annotators who know Kokborok. The lyrics data were annotated in a coarse-grained manner by reading the lyrics only; the audio data were annotated based solely on the music, without considering the lyrics.

3.2 Taxonomy Generation

A mood taxonomy is used to express the feeling and emotion very firmly attached to a song. To the best of our knowledge, no taxonomy has yet been proposed for Christian Kokborok songs, so we adopted a subset of Hevner's adjective list. We observed initially that the adjectives in Hevner's list fall into categories that fit the database best, because songs of the same class have to be close to each other, and songs of different clusters have to be distinct from each other, in the valence-arousal (v-a) plane.

Table 1. Proposed mood taxonomy

Class    | Sub-classes
Happy    | Cheerful, Merry, Joyous
Sad      | Mournful, Tragic, Pathetic
Calm     | Sacred, Solemn, Inspiring
Excited  | Excited, Dramatic, Aroused

4. Feature Selection

4.1 Feature Selection for Audio Classification

Feature extraction and selection is, by the literature survey [1, 10, 11], an essential task in building a mood classification system. All features were extracted with the jAudio toolkit [19], which is publicly available for research purposes and has been used by many researchers [2, 5, 6, 7, 13].
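To make the audio descriptors concrete, the sketch below computes clip-level statistics of three of the simpler features in pure Python. This is illustrative only, not the paper's pipeline (which used jAudio): RMS energy (intensity), zero-crossing rate, and spectral centroid, averaged over fixed-size frames.

```python
import math

def clip_features(samples, sr, frame=256):
    """Average RMS energy, zero-crossing rate, and spectral centroid over
    fixed-size frames, mirroring the clip-level statistics jAudio reports."""
    rms_v, zcr_v, cen_v = [], [], []
    for start in range(0, len(samples) - frame + 1, frame):
        w = samples[start:start + frame]
        # Intensity: root mean square of the frame
        rms_v.append(math.sqrt(sum(x * x for x in w) / frame))
        # Zero-crossing rate: fraction of adjacent sample pairs changing sign
        zcr_v.append(sum(1 for a, b in zip(w, w[1:]) if a * b < 0) / frame)
        # Magnitude spectrum via a naive DFT (slow, but fine for a sketch)
        mags, freqs = [], []
        for k in range(frame // 2):
            re = sum(w[n] * math.cos(2 * math.pi * k * n / frame) for n in range(frame))
            im = sum(-w[n] * math.sin(2 * math.pi * k * n / frame) for n in range(frame))
            mags.append(math.hypot(re, im))
            freqs.append(k * sr / frame)
        # Spectral centroid: magnitude-weighted mean frequency
        cen_v.append(sum(f * m for f, m in zip(freqs, mags)) / (sum(mags) + 1e-12))
    n = len(rms_v)
    return sum(rms_v) / n, sum(zcr_v) / n, sum(cen_v) / n
```

For a pure 437.5 Hz tone sampled at 8 kHz, for example, the centroid comes out at the tone frequency and the zero-crossing rate at about 2 × 437.5 / 8000 ≈ 0.11 crossings per sample.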
Timbre: Distinctive timbre features have been used by several researchers for music analysis. It has been observed that the MFCC features of timbre are effective for music mood as well as genre classification tasks. The spectral flux, spectral centroid, spectral shape and spectral variability characteristics are essential for differentiating moods [2, 7, 11].

Intensity: This is an essential feature in mood detection. We consider the overall average root mean square (RMS) energy and the fraction of low-energy frames, also used by [2, 3], when calculating the values of this feature.
Rhythm: Rhythm strength, rhythm regularity and tempo are related to people's mood response. From the literature review, rhythm is steady and balanced for happy music, whereas sad music is usually slow and does not have a distinctive rhythm pattern [7, 11].

4.2 Feature Selection for Lyrics Classification

Text stylistic (TS) features have been used effectively for mood classification from the lyrics of Western music [6]. Some TS features, e.g. the total number of unique words, repeated words, etc., were used by [2, 8, 11] for Hindi music. The TS features we considered in our experiments are shown in Table 3.

5. Classification Results and Evaluation

For classification, a support vector machine (SVM) classifier is used. Support vector machines and decision tree classifiers are widely used for music classification; many researchers use these classifiers with high accuracy rates, irrespective of language, for audio-based mood classification [2, 8-11].

Table 2. Features used for audio classification

Feature class | Features used
Timbre        | Spectral rolloff, spectral variability, MFCCs, LPCs, partial-based spectral centroid, zero crossings
Intensity     | Root mean square (RMS)
Rhythm        | Beat histogram, strongest beat, beat sum, strength of strongest beat

Table 3. Features used for lyrics classification

Feature name          | Feature description
No. of words          | Total no. of words in a lyric
No. of unique words   | Total no. of unique words in a lyric
No. of repeated words | Total no. of words in a lyric whose frequency is greater than 1
No. of lines          | Total no. of lines in a lyric
No. of repeated lines | Total no. of repeated lines in a lyric
No. of unique lines   | Total no. of unique lines in a lyric

Weka is an open-source machine learning tool that can be used for classification tasks [1, 5, 6]. We used SVM for mood classification; we also tried various other algorithms, but they did not give adequate results, so we chose LibSVM.
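The features listed in Table 3 are simple counts over the lyric text, so they can be computed with a short script. A minimal sketch in Python (illustrative only — the paper's pipeline used Weka, and the whitespace tokenisation here is an assumption):

```python
def lyric_features(lyric: str) -> dict:
    """Compute the six text stylistic (TS) features of Table 3 for one lyric.
    Words are whitespace-separated tokens; blank lines are ignored."""
    words = lyric.split()
    lines = [ln.strip() for ln in lyric.splitlines() if ln.strip()]
    word_freq, line_freq = {}, {}
    for w in words:
        word_freq[w] = word_freq.get(w, 0) + 1
    for ln in lines:
        line_freq[ln] = line_freq.get(ln, 0) + 1
    return {
        "n_words": len(words),
        "n_unique_words": len(word_freq),
        # words whose frequency is greater than 1, as Table 3 defines them
        "n_repeated_words": sum(1 for c in word_freq.values() if c > 1),
        "n_lines": len(lines),
        "n_unique_lines": len(line_freq),
        "n_repeated_lines": sum(1 for c in line_freq.values() if c > 1),
    }
```

Each lyric thus maps to a six-dimensional feature vector that can be fed to the classifier.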
With SVM, the polynomial and radial basis function kernels did not perform well, so we worked with a linear-kernel SVM and developed three separate systems. We faced many difficulties while annotating the songs, because no previous surveys or resources are available for this language, and moods change between annotating the audio and reading the lyrics. Moreover, since the songs are based on the Holy Bible, many songs carry similar emotions, creating confusion between classes such as calm and happy, and between subclasses such as sacred and sad.
That is why we selectively curated our dataset to 300 songs. Lyrics-based classification of Western and Hindi music has been observed to reach a maximum of 80-90% and 50-75% accuracy, respectively [2, 7-11]. To the best of our knowledge, no work has yet been carried out on Kokborok song classification, so we present a baseline mood classification system for audio, lyrics and multimodal data. For lyrics classification, the lack of sentiment lexicons leads to a comparatively lower accuracy rate. Another factor may be that Bible-based songs show decidedly little variation (from the perspective of instruments and singers), and the majority are devotional songs dedicated to Lord Jesus Christ. It is also observed that the mood of a whole song may differ depending on the annotator's perspective. We initially classified audio using 48 parameters, but this did not work well for Kokborok music: some of the parameters have no impact on the classification result, and we obtained only a 49% accuracy rate with LibSVM. So we selected only the parameters that change significantly across classes. We observed that only MFCCs, spectral centroid, strongest beat, beat sum, peak-based spectral smoothness and zero crossings change significantly, and the classification result is affected by those parameters only.

5.1 Classification System Evaluation in Weka

It is necessary to have a large amount of mood-annotated data for a statistical model to give good results. Since this work is just starting, the number of songs is small compared to Western and other Indian languages. Mood classification was performed with the LibSVM classifier according to the feature set described above; we used the WEKA API 3.8.1 to build our classification model. In Table 4(b), we can see the actual and predicted values for the audio classification system; the bold diagonal elements represent the correctly predicted values.
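Overall accuracy follows directly from such a confusion matrix: the diagonal sum (correct predictions) over the total. A minimal sketch using the Table 4(b) counts:

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly predicted (diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Audio confusion matrix from Table 4(b); rows are actual classes
# (Calm, Excited, Happy, Sad), columns are predicted classes.
audio_cm = [[111, 2, 0, 0],
            [3, 90, 2, 0],
            [0, 1, 48, 3],
            [0, 0, 4, 37]]

print(round(accuracy_from_confusion(audio_cm), 2))  # prints 0.95
```

Dividing the diagonal sum by the matrix total gives approximately 0.95, consistent with the reported 95% accuracy rate.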
So the accuracy of the system is calculated as correctly predicted songs (111+90+48+37 = 286) / total number of songs (300) × 100 ≈ 95%. Similarly, Tables 5(b) and 6(b) show the confusion matrices of the lyrics-based and multimodal classification, respectively, and Tables 4(a), 5(a) and 6(a) show the precision, recall and F-measure of the audio, lyrics and multimodal classification systems.

Table 4(a). Audio classification performance in Weka

Class    | Precision | Recall | F-measure
Calm     | 0.97      | 0.98   | 0.97
Excited  | 0.96      | 0.94   | 0.95
Happy    | 0.90      | 0.92   | 0.91
Sad      | 0.92      | 0.92   | 0.92
Average  | 0.95      | 0.92   | 0.95

Table 4(b). Confusion matrix for the audio-based system (rows: actual, columns: predicted)

Class    | Calm | Excited | Happy | Sad
Calm     | 111  | 2       | 0     | 0
Excited  | 3    | 90      | 2     | 0
Happy    | 0    | 1       | 48    | 3
Sad      | 0    | 0       | 4     | 37

Average accuracy rate: 95%

5.2 Classification Based on Lyrics
Table 5(a). Classification system performance for lyrics

Class    | Precision | Recall | F-measure
Calm     | 0.98      | 0.99   | 0.97
Excited  | 0.95      | 0.97   | 0.96
Happy    | 0.97      | 0.90   | 0.94
Sad      | 0.95      | 0.97   | 0.96
Average  | 0.97      | 0.97   | 0.97

Table 5(b). Confusion matrix for the lyrics-based system (rows: actual, columns: predicted)

Class    | Calm | Excited | Happy | Sad
Calm     | 112  | 1       | 0     | 0
Excited  | 2    | 93      | 0     | 0
Happy    | 0    | 3       | 47    | 2
Sad      | 0    | 0       | 1     | 39

Average accuracy rate: 97%

Table 6(a). Multimodal system performance

Class    | Precision | Recall | F-measure
Calm     | 0.99      | 0.98   | 0.98
Excited  | 0.95      | 0.96   | 0.96
Happy    | 0.90      | 0.92   | 0.91
Sad      | 0.94      | 0.92   | 0.93
Average  | 0.96      | 0.96   | 0.96

Table 6(b). Confusion matrix for the multimodal system (rows: actual, columns: predicted)

Class    | Calm | Excited | Happy | Sad
Calm     | 111  | 2       | 0     | 0
Excited  | 1    | 92      | 2     | 0
Happy    | 0    | 2       | 48    | 2
Sad      | 0    | 0       | 3     | 37

Average accuracy rate: 96%

5.3 Comparison of Different Algorithms and System Performance

We used different classifiers to perform classification on the dataset for each of the three systems. From Figure 3(a) and Table 3(b), we can say that the support vector machine with a linear kernel and the decision tree classifier (the J48 algorithm) give, on average, similar and better results compared to the other algorithms.

Figure 3(a). Graphical representation of system performance with different algorithms
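A comparison of this kind can also be reproduced outside Weka. The sketch below uses scikit-learn stand-ins on synthetic features (an assumption for illustration only: LinearSVC for LibSVM's linear kernel, DecisionTreeClassifier for J48, GaussianNB for Naïve Bayes; the random data stands in for the Kokborok dataset).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class feature matrix standing in for the 300-song dataset
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

classifiers = [
    ("Linear SVM (LibSVM-like)", LinearSVC(max_iter=10000)),
    ("Decision tree (J48-like)", DecisionTreeClassifier(random_state=0)),
    ("Naive Bayes", GaussianNB()),
]
for name, clf in classifiers:
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Cross-validated mean accuracy gives a fairer per-classifier comparison than a single train/test split, which matters with only 300 songs.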
Table 3(b). System performance (%) with different algorithms

System     | LibSVM | J48 | Naïve Bayes | LibSVM (polynomial kernel) | SMO
Audio      | 95     | 95  | 86.3        | 89                         | 90
Lyrics     | 97     | 96  | 81.6        | 96                         | 86
Multimodal | 96     | 97  | 75          | 88.6                       | 84.3

6. Conclusions

In this work, a multimodal mood-annotated database was developed for research on music mood classification for Kokborok. Three classification systems were designed using the multimodal dataset. The audio-based system gives an accuracy rate of 95%, the lyrics-based system gives 97%, and for the multimodal system we achieved a maximum F-measure of 0.97 with LibSVM (linear kernel). We observed mood variations while annotating the songs separately for the audio and lyrics datasets. Decidedly little variation is found in Christian Kokborok songs; some reasons may be the use of the same instruments and the unavailability of many Kokborok singers. As the audio features of a given song depend on the instrumental variation too, the classification accuracy is affected accordingly. We compared the performance of the three systems and, even using different classifiers for each system, we can say that the LibSVM and J48 classifiers both performed better on this Christian Kokborok dataset.

7. Future Work

We will primarily pursue music mood classification applications for Kokborok Christian music. We will extend lyrics classification with further lyrical features, i.e. n-grams, bag-of-words and sentiment lexicons. To the best of our knowledge, no sentiment lexicon is available for Kokborok, so we will develop a sentiment word dictionary for Kokborok and explore all other possible features for lyrics-based classification. For the multimodal system, we will carry out a more in-depth analysis of the reader's and listener's points of view.

References

[1]. Tian, Y., Wu, Q., & Yue, P. (2018). A comparison study of classification algorithms on the dataset using WEKA tool.
Journal of Engineering Technology, 6(2), 329-341.
[2]. Patra, B. G., Das, D., & Bandyopadhyay, S. Mood classification of Hindi songs based on lyrics. In Proceedings of the 12th International Conference on Natural Language Processing, pp. 261-267, 2015.
[3]. Patra, B. G., Das, D., & Bandyopadhyay, S. Retrieving similar lyrics for music recommendation system. In 14th International Conference on Natural Language Processing (ICON), pp. 48-52, December 2017.
[4]. Hu, X., Downie, J. S., Laurier, C., Bay, M., & Ehmann, A. F. The 2007 MIREX audio mood classification task: Lessons learned. In Proc. 9th Int. Conf. Music Inf. Retrieval, pp. 462-467, 2008.
[5]. Joshi, A., Balamurali, A. R., & Bhattacharyya, P. A fall-back strategy for sentiment analysis in Hindi: a case study. In Proc. of the 8th International Conference on Natural Language Processing (ICON), 2010.
[6]. Ujlambkar, A. M., & Attar, V. Z. Mood classification of Indian popular music. In Proc. of the CUBE International Information Technology Conference, pp. 278-283, ACM, 2012.
[7]. Patra, B. G., Das, D., & Bandyopadhyay, S. Automatic music mood classification of Hindi songs. In Proc. of the 3rd Workshop on Sentiment Analysis where AI meets Psychology, pp. 24-28, IJCNLP, 2013a.
[8]. Patra, B. G., Das, D., & Bandyopadhyay, S. Multimodal mood classification framework for Hindi songs. Computación y Sistemas, 20(3), 515-526, 2016.
[9]. Patra, B. G., Das, D., & Bandyopadhyay, S. Unsupervised approach to Hindi music mood classification. In Mining Intelligence and Knowledge Exploration, pp. 62-69, Springer International Publishing, 2013b.
[10]. Patra, B. G., Das, D., & Bandyopadhyay, S. Multimodal mood classification - a case study of differences in Hindi and Western songs. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1980-1989, 2016.
[11]. Patra, B. G., Das, D., & Bandyopadhyay, S. Labeling data and developing a supervised framework for Hindi music mood analysis. Journal of Intelligent Information Systems, 48(3), 633-651, 2017.
[12]. Laurier, C., Sordo, M., Serra, J., & Herrera, P. Music mood representations from social tags. In Proc. of ISMIR, pp.
381-386, 2009.
[13]. Patra, B. G., Das, D., Maitra, P., & Bandyopadhyay, S. Feed-forward neural network based music emotion recognition. In MediaEval Workshop, September 14-15, 2015.
[14]. Banerjee, S. A survey of prospects and problems in Hindustani classical raga identification using machine learning techniques. In Proceedings of the First International Conference on Intelligent Computing and Communication, pp. 467-475, Springer, Singapore, 2017.
[15]. Velankar, M. R., & Sahasrabuddhe, H. V. A pilot study of Hindustani music sentiments. In Proc. of the 2nd Workshop on Sentiment Analysis where AI meets Psychology, pp. 91-98, IIT Bombay, Mumbai, COLING, 2012.
[16]. Malheiro, R., Panda, R., Gomes, P., & Paiva, R. P. Emotionally-relevant features for classification and regression of music lyrics. IEEE Transactions on Affective Computing, (2), 240-254, 2018.
[17]. Yang, D., & Lee, W. S. Music emotion identification from lyrics. In Multimedia, ISM '09, 11th IEEE International Symposium, pp. 624-629, IEEE, December 2009.
[18]. Russell, J. A. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161-1178, 1980.
[19]. McKay, C., Fujinaga, I., & Depalle, P. (2005). jAudio: A feature extraction library. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 600-603.
[20]. Çano, E., & Morisio, M. MoodyLyrics: A sentiment annotated lyrics dataset. In Proceedings of the 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI), pp. 118-124, ACM, Hong Kong, March 2017.
[21]. Wasim, M., Chaudary, M. H., & Iqbal, M. (2018). Towards an Internet of Things (IoT) based big data analytics. Journal of Engineering Technology, 6(2), 70-82.
[22]. Degaonkar, V. N., & Kulkarni, A. V. (2018). Automatic raga identification in Indian classical music using the convolution neural network. Journal of Engineering Technology, 6(2), 564-576.
[23]. Kumar, K. R., Santosh, D. T., Vardhan, B. V., & Chiranjeevi, P. Machine learning in the computational treatment of opinions towards better product recommendations, an ontology mining way: a survey. Journal of Engineering Technology, 6(2), 587-594.
[24]. Al-Barhamtoshy, H. M., & Abdou, S. (2018). Arabic OCR metrics-based evaluation model. Journal of Engineering Technology, 6(1), 479-495.
[25]. Collection of some Kokborok songs. Available: https://tripuraking.com/site_0.xhtml.
[26]. Details about the Kokborok language. Available: https://en.wikipedia.org/wiki/kokborok.
[27]. Details about the Christian religion in Tripura state. Available: https://en.wikipedia.org/wiki/christianityintripura.