Combining Musical and Cultural Features for Intelligent Style Detection


Combining Musical and Cultural Features for Intelligent Style Detection

Brian Whitman
MIT Media Lab
Cambridge MA U.S.A.

Paris Smaragdis
MIT Media Lab
Cambridge MA U.S.A.
paris@media.mit.edu

ABSTRACT

We present a style identification scheme based on the simultaneous classification of auditory and textual data. Style identification is a task which often involves cultural aspects not present or easily extracted through auditory processing. The scheme we propose complements any audio-driven genre or style detection system with a classifier based on web-extracted data we call "community metadata." The addition of these cultural attributes to our feature space aids in the proper classification of acoustically dissimilar music within the same style, and of similar music belonging to different styles.

1. INTRODUCTION

Musical genres aid in the listening-and-retrieval (L&R) process by giving a user or consumer a sense of reference. By organizing physical shelves in record stores by genre, shoppers can browse and discover new music by walking down an aisle. But the digitization of musical culture carries an embarrassing problem of how to organize collections: folders full of music recordings, peer-to-peer virtual terabyte lockers, and handheld devices all need the same attention to organization as rooftop music stores. As a result, recent work has approached the problem of automatic genre recognition [8] [2], creating top-level clusters of similar music (rock, pop, classical, etc.) from the acoustic content.

While the high-level separation of genres is useful, we tend to look more toward styles for discovering new music or for accurate recommendation. Styles usually define subclasses of genres (in the genre Country we can choose from No Depression, Contemporary Country, or Urban Cowboy), but sometimes join together artists across genres. Stores (real or virtual) normally do not partition their space by style, to avoid consumer confusion, but they can provide cross-reference data, as in the case of the All Music Guide (http://www.allmusic.com); and recommendation engines can utilize styles for high-confidence results.

Style is an imperative class of description for most music retrieval tasks, but it is usually considered a human concept and can be hard to model. Some styles evolved with no acoustic underpinnings: a favorite is intelligent dance music, or IDM, in which the included artists range from the abstract sine-wave noise of Pan Sonic to the calm filtered melodies of Boards of Canada. At first glance, IDM would be an intractable set to model due to its similarity being almost purely cultural. As such, we usually rely on marketing, print publications, and the recommendations of friends to understand styles on our own.
In this paper we present an automatic style detection system that operates on both the acoustic content of the audio and the very powerful cultural representation of community metadata, using descriptive textual features extracted from automated crawls of the web. The community metadata feature space has previously been shown to be effective in a music similarity task on its own [10], and here we augment it with an audio representation. This combined model performs extremely well in identifying a set of previously edited style clusters, and can be used to cluster arbitrarily large new sets of artists.

2. PRIOR WORK

2.1 Genre Classification

Automatic genre classification techniques that explicitly compute clusters from the score or audio level have reported high results in musically or acoustically separable genres such as classical vs. rock, but the hierarchical structure of popular music lends itself to a more fine-grained set of divisions. Using the score level only (MIDI files, transcribed music, or CSound scores), systems can extract style or genre using easily extractable features such as key and frequently used progressions (once the music is in a common format, which may require character recognition on a score or parsing a MIDI file). Systems normally perform genre classification by clustering similar music segments, or by performing a one-in-n (where n is the number of genres) classification using some machine learning technique. In [5], various machine learning classifiers are trained on performance characteristics of the score to learn a piece-global style, and in [2] three types of folk music were separated using a hidden Markov model.

Approaches that perform genre classification in the audio domain use a combination of spectral features and musically informed inferred features. Genre identification work undertaken in [8] aims to understand acoustic content enough to classify it into a small set of related clusters, studying the spectra along with tempo-sensitive timbregrams, with a simple beat detector in place. Similar work treating artists as complete genres (where similar clusters of artists form a "genre") is studied in [9] and then improved on in [1] with more musical knowledge.

2.2 Cultural Feature Extraction

Cultural features concerning music are not as well defined and vary with time and interpretation. Any representation that aims to express music as community description is a form of cultural feature. The most popular form of cultural features, lists of purchased music, is used in collaborative filtering to recommend music based on peers' tastes. Cultural features are important for expressing information about music that cannot be captured by the actual audio content; many music retrieval tasks cannot do well on audio alone. A more automatic and autonomous way of collecting cultural features is described in [10]. There, we define community metadata (which is used in this paper) as a vector space of descriptive textual

terms crawled from the web. For example, an artist is represented by their community description from album reviews and fan-created pages. An important asset of community metadata is its attention to time: in our implementation, community metadata vectors are crawled repeatedly, and future retrievals take the time of description into account. In a domain where long-scale time is vastly important, this representation allows recommendations and classifications to take the "buzz factor" into account. Cultural features for music retrieval are also explored in [4], where web crawls for "my favorite artists" lists are collated and used in a recommendation agent. The specifics of the community metadata feature vector are described in greater detail below.

3. STYLE CLASSIFICATION

To test our feature space and hypotheses concerning automatic style detection, we chose a small set of artists spanning five separate styles as classified by music editors. In turn, we first make classifications based solely on an audio representation, then on a community metadata representation, and lastly show that the combined feature spaces perform best in separating the styles.

3.1 Data Set

For the results in this paper we operate on a fixed data set of artists chosen from the Minnowmatch music testbed (related work analyzes this database in [9], [1], [10]). The list used contained twenty-five artists, encapsulating five artists each across five music styles; it is shown in Table 1. Each artist is represented in the Minnowmatch testbed with one or two albums' worth of audio content. The selection of artists in the testbed was defined by the output of a peer-to-peer network robot which computed the popularity of songs by watching thousands of users' collections. We have previously crawled the community metadata for each artist in the Minnowmatch testbed in January. The ground truth style classification was taken from the All Music Guide at http://www.allmusic.com (AMG), a popular edited web resource for music information. We consider AMG our best-case ground truth due to its collective edited nature. Although AMG's decisions are subjective, our intent is to show that a computational metric involving both acoustic and cultural features can approximate an actual labeling from a professional.

The size of our data set is intentionally small so as to demonstrate the issues of acoustic versus cultural similarity presented in this paper. This simulation is not meant to represent a fully functioning system due to its scope, but the approach and results propose a viable solution to the problem.

4. AUDIO-BASED STYLE CLASSIFICATION

One obvious feature space for a music style classifier is the audio domain. While we will show that it is not always the best way to discern cultural labels such as styles, it is a very good indicator of the sound of the music and is perhaps better suited to higher-level genre classification. The audio-based style classifier operates by forming each song into a representation and training a neural network to classify a new song from a test set into one of the five classes. Below, we describe the representation used and the training process.

4.1 Representation

We chose a fairly simple representation for this experiment. For each artist in our set, we chose on average 12 songs randomly from their collection.

[Figure 1: The audio feature extraction process: downsampling to 11,025 Hz, power spectral density computation over three-second windows, and PCA dimensionality reduction to twenty dimensions.]
The audio tracks were downsampled to 11,025 Hz, converted to mono, and transformed to zero mean and unit variance. We then extracted the 512-point power spectral density (PSD) of every three seconds of audio and applied principal components analysis (PCA) to the entire training data set to reduce it to twenty dimensions. The process is described in Figure 1. The series of reduced PSD features extracted from all the available audio tracks was used as the representation of each artist.

4.2 Classification and Learning

Learning for classification on the audio features was done using a feedforward time-delay neural network (TDNN) [3]. This structure allows the incorporation of a short-time memory for classification by providing samples of previous time points as inputs (Figure 2). For training this network we used the resilient backpropagation algorithm [6] and iterated in batch mode (using the entire training set as one batch). The input layer has twenty nodes (one for each dimension of the representation) with a memory of three adjacent input frames. We used one hidden layer with forty nodes, and the output layer had five nodes, each corresponding to one of the five styles we wished to recognize. The training targets were a value of 1 for the node corresponding to the style of the input, and 0 for all the other nodes. In the testing phase, the features of the test set were extracted (using the same dimensionality reduction transform derived from the training data) and fed to the classification network. Styles were assigned according to the output node with the maximum value.
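As a concrete illustration, the following is a minimal NumPy sketch of this front end. It assumes Hann-windowed 1024-point FFTs averaged within non-overlapping three-second blocks (the paper does not specify the exact PSD estimator), and it reproduces only the TDNN's windowed input layout; the network itself and the resilient backpropagation training of [6] are omitted.

```python
import numpy as np

SR = 11025           # target sample rate from the paper
BLOCK = 3 * SR       # three seconds of audio per feature
NFFT = 1024          # 1024-point FFT -> 512-point one-sided PSD
PCA_DIMS = 20        # reduced dimensionality
MEMORY = 3           # TDNN memory of three adjacent frames

def psd_features(mono):
    """512-point PSD of every non-overlapping 3-second block.

    `mono` is a zero-mean, unit-variance signal at 11,025 Hz.
    The windowing scheme here is an assumption, not the paper's.
    """
    win = np.hanning(NFFT)
    feats = []
    for start in range(0, len(mono) - BLOCK + 1, BLOCK):
        block = mono[start:start + BLOCK]
        specs = [np.abs(np.fft.rfft(block[i:i + NFFT] * win))[:512] ** 2
                 for i in range(0, BLOCK - NFFT + 1, NFFT)]
        feats.append(np.mean(specs, axis=0))  # average PSD over the block
    return np.array(feats)

def fit_pca(train_psds, k=PCA_DIMS):
    """Learn the top-k principal components on the training set only."""
    mu = train_psds.mean(axis=0)
    _, _, vt = np.linalg.svd(train_psds - mu, full_matrices=False)
    return mu, vt[:k]

def reduce_dims(psds, mu, components):
    """Project PSDs into the 20-dimensional PCA space."""
    return (psds - mu) @ components.T

def tdnn_inputs(frames):
    """Stack three adjacent reduced frames, mirroring the TDNN's
    short-time memory (input size: 20 dims x 3 frames = 60)."""
    return np.array([frames[i:i + MEMORY].ravel()
                     for i in range(len(frames) - MEMORY + 1)])
```

Note that, as in the paper, the PCA transform is fit on the training data alone and reused unchanged at test time.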

[Figure 2: Structure of the time-delay neural network, whose inputs include samples of previous time points.]

Table 1: The twenty-five artists in the data set, five from each of five styles. Each row spans the five styles and serves as one cross-validation fold.

Row  Heavy Metal     Country           Rap            IDM                R&B
1    Guns n' Roses   Billy Ray Cyrus   DMX            Boards of Canada   Lauryn Hill
2    AC/DC           Alan Jackson      Ice Cube       Aphex Twin         Aaliyah
3    Skid Row        Tim McGraw        Wu-Tang Clan   Squarepusher       Debelah Morgan
4    Led Zeppelin    Garth Brooks      Mystikal       Plone              Toni Braxton
5    Black Sabbath   Kenny Chesney     Outkast        Mouse on Mars      Mya

4.3 Results

We ran a training and testing scheme where each row (a collection of five artists across the five styles) in turn was selected for testing. The remaining four rows were used for training. This process was executed five times (once for each row as a test), and the results for each permutation are shown in Figure 3. As is clearly evident, the results are not particularly good for the IDM style. Most of those artists have been misclassified, and there is little cohesion within that style. This should not be construed as a shortcoming of the training method, as this is a music style that exhibits a huge auditory variance, ranging from aggressive rough beats to abstract and smooth textures. What ties these artists together as a style is not a common sound in their work, but rather a cultural affinity stemming from the use of electronic instruments and common roots reaching back to electronic dance music. Likewise, we see inconsistent results for Lauryn Hill, classified as a rap artist due to her rap-like production.

[Table 2: Example community metadata terms and their scores for DMX; the adjective terms include non-violent, perky, swedish, international, inner, consistent, bitter, junior, produced, and romantic.]

Such intra-style auditory inconsistencies are of course hard to overcome using any audio-based system, highlighting the need for additional descriptors that factor in cultural issues.

5. COMMUNITY METADATA-BASED STYLE CLASSIFICATION

We next describe style classification solely using the cultural features of the community metadata feature vectors described earlier. The cultural features for the 25 artists in our set were computed during earlier work on artist similarity. Each artist is associated with a set of roughly 10,000 unigram terms, 10,000 bigram terms, 5,000 noun phrases, and 100 adjectives. Each term was associated with an artist by its appearing in the same web document as the artist's name, although this alone does not prove a causal relation of description. Associated with each term is a score computed by the software (see Table 2) that considers position from the named referent along with a gaussian window applied to the term frequency of the term divided by its document frequency. (Term frequency is how often a term appears in relation to an artist, and document frequency is how often the term appears overall.) The gaussian we used is

    f(tf, df) = e^{-(tf/df - \mu)^2 / (2\sigma^2)}

where df is the document frequency of a term, tf its term frequency, and \mu and \sigma are parameters indicating the mean and deviation of the gaussian window. This method proved effective in computing artist similarities (given a known artist similarity list, this metric could predict them adequately), but here we ask the same data to arrange the artists into clusters.
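A minimal sketch of this scoring rule, following the gaussian reconstruction above; the mu and sigma defaults are illustrative placeholders, not the published settings:

```python
import math

def term_score(tf, df, mu=0.5, sigma=0.25):
    """Gaussian-windowed salience of a term for an artist.

    tf: term frequency (how often the term appears in pages
        relating to the artist)
    df: document frequency (how often the term appears overall)
    mu, sigma: mean and deviation of the gaussian window;
        placeholder values, not the paper's parameters.
    """
    if df == 0:
        return 0.0
    return math.exp(-((tf / df - mu) ** 2) / (2 * sigma ** 2))
```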
[Figure 4: The 25-by-25 artist similarity matrix computed from community metadata overlap scores.]

5.1 Clustering Overlap Scores

The community metadata system computes similarity by a simple overlap score. Each pair of artists is assigned an unnormalized overlap weight w, where w is an additive combination of every shared term's score. These scalars are unimportant on their own, but we can rank their values, using each artist in our set as the ground artist, to see which artists are most similar to each other. Using this method, we compute the similarity matrix M(25,25) over each artist in the five-style set (see Figure 4). This matrix is then used to predict the style of each given artist. For each term type, we take each artist in turn and sort their overlap weight similarities to the other 24 artists in descending order.
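As an illustration, here is a sketch of the overlap computation, assuming each artist is represented as a dict from term to score (for instance, the output of term_score above). The paper does not pin down the exact additive combination, so this version simply sums both artists' scores for each shared term:

```python
def overlap_score(terms_a, terms_b):
    """Unnormalized overlap weight between two artists: the
    additive combination of every shared term's score (here
    taken as the sum of both artists' scores per shared term,
    an assumption)."""
    shared = terms_a.keys() & terms_b.keys()
    return sum(terms_a[t] + terms_b[t] for t in shared)

def similarity_matrix(artists):
    """M(25,25): overlap weights between every pair of artists.

    artists: dict mapping artist name -> {term: score}.
    """
    names = sorted(artists)
    return names, [[overlap_score(artists[a], artists[b])
                    for b in names] for a in names]
```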

Figure 3: Audio-based style classification results. Each row gives the network's confidence (%) that the artist belongs to each style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          64         33       1     1     1
Billy Ray Cyrus        21         53      19     1     7
DMX                     6          9      65     4    17
Boards of Canada        9         11      32    12    37
Lauryn Hill            23          4      35     8    30
AC/DC                  54          9       9     8    21
Alan Jackson           19         52       4     4    21
Ice Cube                0          1      73    10    15
Aphex Twin              0          0      40    31    29
Aaliyah                 6          3      31    14    46
Skid Row               76         10       1     2    11
Tim McGraw             39         56       3     0     2
Wu-Tang Clan           13         21      31    16    19
Squarepusher           15          4      26    27    28
Debelah Morgan          3          1      14    18    64
Led Zeppelin           72         18       2     5     2
Garth Brooks           25         60       8     6     1
Mystikal                0          0      51    20    35
Plone                  17          7      27    16    33
Toni Braxton           17         11      20    10    42
Black Sabbath          41         35       7     5    12
Kenny Chesney          25         62       3     3     7
Outkast                13          2      62     7    16
Mouse on Mars          10          8      32    24    27
Mya                     7          1      31    10    51

We then use prior knowledge of the actual styles of the 24 similar artists to find the true style of our target artist: descending the sorted list, once we have counted four other artists in the same cluster, we consider our target artist classified, with a normalized score (the amount of cumulated overlap weight the cluster contributed to the total cumulated overlap weight). The highest cumulated score is deemed the correct classification, and the five style scores are arranged in a probability map. In a larger-scale implementation, this step is akin to using a supervised clustering mechanism which tries to find a fit of an unknown type among already-labeled data (by the same algorithm). Because of the small size of the sample set, we found this more manual method more effective. We do this for each term type in the community metadata feature space and average the returned maps into a generalized probability map. The map defines a confidence value for each style, much like the neural network's results above, and the probability approach was crucial in integrating the two methods (described below). A sketch of this procedure appears below, after the results.

5.2 Results

In Figure 5 we see that the text-only classifier performs very well for three of the five styles and adequately, but not perfectly, for the other two. There seems to be confusion between the Rap and R&B style sets. However, for the previously problematic IDM set, the cultural classifier works perfectly and with high confidence. We can attribute this to IDM being an almost purely culturally-defined style.

One of the issues that plague acoustically-derived classifiers is that human classifications often have little statistical correlation to the actual content being described. This problem also interferes with content-based recommendation agents that attempt to learn a relation model between user preference and audio content: sometimes, the sound of the music has very little to do with how we perceive and attach preference to it. R&B and Rap's intrinsic crossover (they both appear on the same radio markets and are usually geared toward the same audiences) shows that the cultural classifier can be as confused as humans in the same situation. Here, we present the inverse of the "description for content" problem: just as often, cultural influences steer us away from treating two almost identical artists as similar entities, or from putting them in the same class. We propose that automated systems that attempt to model listening behavior or provide commodity intelligence to music collections be mindful of both types of influences.
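The per-artist assignment described in Section 5.1 might look like the following sketch (hypothetical helper names, building on overlap_score above); the real system averages the resulting maps over unigrams, bigrams, noun phrases, and adjectives:

```python
def classify_by_overlap(target, artists, styles, needed=4):
    """Descend the target's similarity ranking until one style has
    contributed `needed` artists, then normalize the cumulated
    overlap weights into a per-style probability map.

    artists: name -> {term: score}; styles: name -> style label.
    """
    ranked = sorted(
        ((overlap_score(artists[target], artists[n]), n)
         for n in artists if n != target),
        reverse=True)                       # most similar first
    counts, weights = {}, {}
    for w, name in ranked:
        s = styles[name]
        counts[s] = counts.get(s, 0) + 1
        weights[s] = weights.get(s, 0.0) + w
        if counts[s] == needed:             # four artists of one style seen
            break
    total = sum(weights.values()) or 1.0
    return {s: weights[s] / total for s in weights}
```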
Since we can ideally model both behaviors, it perhaps makes the most sense to combine them in some manner.

6. COMBINED CLASSIFICATION

As pointed out in the preceding sections, some features which are crucial for style identification are best exploited in the auditory domain and some in the cultural domain. So far, given our choice of domain, we have produced coherent clusters. Musical style (and even more so musical similarity) requires a complicated definition that can factor in multiple observations: auditory, historical, geographical, ideological, and so on. Community metadata is an effort to account for the latter features, whereas the auditory domain supports a more direct judgment of the sound itself. It seems only natural that a combination of these two classifiers can help disambiguate some of the classification problems we have discussed.

In order to combine the two results, we view our classifier outputs as posterior probabilities and compute their average values. This technique has been shown to work well in practice when we have a questionable estimate of the posterior probabilities [7], as is the case in the cultural-based classification.

6.1 Results

The results of the averaging are shown in Figure 6. It is clear that many of the problems present in the previous classification attempts are now resolved. The IDM class, which was problematic in the audio-based classification, is now correctly identified due to strong community metadata coherence. Likewise, the Rap cluster, which was not well defined in the metadata classification, is correctly identified using the auditory influence. Overall, the combined classification was correct for all samples, bypassing the problems found in either the audio-only or metadata-only classification.
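For completeness, a minimal sketch of the posterior-averaging combination of Section 6, assuming both classifiers emit probability maps keyed by style name:

```python
def combine_posteriors(audio_probs, culture_probs):
    """Average the audio and cultural posterior maps with equal
    weight, following the classifier-averaging rule of [7]."""
    styles = audio_probs.keys() | culture_probs.keys()
    return {s: 0.5 * (audio_probs.get(s, 0.0) + culture_probs.get(s, 0.0))
            for s in styles}

def predict_style(audio_probs, culture_probs):
    """Final decision: argmax of the averaged posteriors."""
    combined = combine_posteriors(audio_probs, culture_probs)
    return max(combined, key=combined.get)
```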

Figure 5: Community metadata-based style classification results. Each row gives the classifier's confidence (%) per style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          44          9      19    11    17
Billy Ray Cyrus         5         80       5     4     5
DMX                    18         27      24     8    23
Boards of Canada       11          5       8    68     8
Lauryn Hill            21         13      23    11    33
AC/DC                  30         13      16    23    18
Alan Jackson            5         76       9     3     7
Ice Cube               22         18      28    15    17
Aphex Twin              9          4      26    56     4
Aaliyah                14         13      27    11    35
Skid Row               17         38      19    13    13
Tim McGraw             18         50      17     6     9
Wu-Tang Clan           10          7      28    37    18
Squarepusher            9          6       8    72     5
Debelah Morgan         11         10      20    11    47
Led Zeppelin           21         14      19    28    18
Garth Brooks           11         60      10     8    11
Mystikal               17         30      16     9    28
Plone                  10          6      10    65     9
Toni Braxton           17         16      26    11    30
Black Sabbath          52          9      18    12    10
Kenny Chesney           6         68      16     4     7
Outkast                14         13      32    14    27
Mouse on Mars          10          9       7    65     9
Mya                    11         20      25    10    33

Figure 6: Combined (averaged) style classification results. Each row gives the combined confidence (%) per style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          54         21      10     6     9
Billy Ray Cyrus        13         66      12     3     6
DMX                    12         18      45     6    20
Boards of Canada       10          8      20    40    22
Lauryn Hill            22          8      29     9    31
AC/DC                  42         11      12    15    20
Alan Jackson           12         64       6     4    14
Ice Cube               11         10      51    13    16
Aphex Twin              5          2      33    44    17
Aaliyah                10          8      29    13    41
Skid Row               47         24      10     8    12
Tim McGraw             29         53      10     3     6
Wu-Tang Clan           12         14      29    27    19
Squarepusher           12          5      17    50    17
Debelah Morgan          7          5      17    14    56
Led Zeppelin           46         16      10    17    10
Garth Brooks           18         60       9     7     6
Mystikal                9         15      33    14    31
Plone                  13          7      18    40    21
Toni Braxton           17         13      23    11    36
Black Sabbath          47         22      12     9    11
Kenny Chesney          15         65      10     3     7
Outkast                14          8      47    11    21
Mouse on Mars          10          8      20    44    18
Mya                     9         11      28    10    42

7. FUTURE WORK

One less obvious use of this system is a cultural-to-musical ratio for relations among artists. An application that could know in advance how to understand varying types of artist relationships could benefit many music retrieval systems that attempt to inject commodity intelligence into the L&R process. A good case for such a technology would be a recommendation agent that operates on both acoustic and cultural data. Large-scale record shops already compute cultural relationships using sale data fed into a collaborative filtering system, and music-based recommenders such as Moodlogic (http://www.moodlogic.com) operate on spectral features. Both systems have proved successful for different types of music, and a system that could define ahead of time the proper set of features to use would be integral to a combination approach. We could simply define a culture ratio as

    r(a, b) = P_c(a, b) / P_a(a, b)

i.e. the probability that artists a and b will be similar using a cultural metric, P_c, divided by the probability that artists a and b will be similar using an acoustic metric, P_a. A high culture ratio would alert a recommender that certain musical relationships (such as almost all in the IDM style) should be treated using a purely cultural feature space. Lower culture ratios would indicate that spectral or musically intelligent features should be used.

[Figure 7: Style classification accuracy (%) of the audio, cultural, and combined classifiers for each of the five styles: Heavy Metal, Country, Rap, IDM, and R&B.]

8. CONCLUSIONS

We have presented a prominent problem in musical style classification, and proposed a multimodal classification scheme to overcome it. By combining both acoustic and cultural artist information, we have achieved classification of styles that exhibit large variance in either domain.

9. ACKNOWLEDGMENTS

This work was supported by the Digital Life Consortium of the MIT Media Lab.

10. REFERENCES

[1] A. Berenzweig, D. Ellis, and S. Lawrence. Using voice segments to improve artist classification of music. Submitted.

[2] W. Chai and B. Vercoe. Folk music classification using hidden Markov models. In Proceedings of the International Conference on Artificial Intelligence, 2001.

[3] D. Clouse, C. Giles, B. Horne, and G. Cottrell. Time-delay neural networks: Representation and induction of finite state machines. In IEEE Transactions on Neural Networks, 8(5), page 1065, 1997.

[4] W. W. Cohen and W. Fan. Web-collaborative filtering: recommending music by crawling the web. Computer Networks, 33(1-6), 2000.

[5] R. B. Dannenberg, B. Thom, and D. Watson. A machine learning approach to musical style recognition. In Proceedings of the 1997 International Computer Music Conference. International Computer Music Association, 1997.

[6] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, 1993.

[7] D. Tax, M. van Breukelen, R. Duin, and J. Kittler. Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33, 2000.

[8] G. Tzanetakis, G. Essl, and P. Cook. Automatic musical genre classification of audio signals, 2001.

[9] B. Whitman, G. Flake, and S. Lawrence. Artist detection in music with Minnowmatch. In Proceedings of the 2001 IEEE Workshop on Neural Networks for Signal Processing, Falmouth, Massachusetts, September 2001.

[10] B. Whitman and S. Lawrence. Inferring descriptions and similarity for music from community metadata. Submitted.
