Combining Musical and Cultural Features for Intelligent Style Detection


Combining Musical and Cultural Features for Intelligent Style Detection

Brian Whitman
MIT Media Lab
Cambridge MA U.S.A.

Paris Smaragdis
MIT Media Lab
Cambridge MA U.S.A.
paris@media.mit.edu

ABSTRACT

We present a style identification scheme based on the simultaneous classification of auditory and textual data. Style identification is a task which often involves cultural aspects not present or easily extracted through auditory processing. The scheme we propose complements any audio-driven genre or style detection system with a classifier based on web-extracted data we call "community metadata." The addition of these cultural attributes to our feature space aids in the proper classification of acoustically dissimilar music within the same style, and of similar music belonging to different styles.

1. INTRODUCTION

Musical genres aid in the listening-and-retrieval (L&R) process by giving a user or consumer a sense of reference. By organizing physical shelves in record stores by genre, shoppers can browse and discover new music by walking down an aisle. But the digitization of musical culture carries an embarrassing problem of how to organize collections: folders full of music recordings, peer-to-peer virtual terabyte lockers, and handheld devices all need the same attention to organization as rooftop music stores. As a result, recent work has approached the problem of automatic genre recognition [8] [2], creating top-level clusters of similar music (rock, pop, classical, etc.) from the acoustic content.

While the high-level separation of genres is useful, we tend to look more toward styles for discovering new music or for accurate recommendation. Styles usually define subclasses of genres (in the genre Country we can choose from No Depression, Contemporary Country, or Urban Cowboy), but sometimes join together artists across genres. Stores (real or virtual) normally do not partition their space by style, to avoid consumer confusion, but they can provide cross-reference data, as in the case of the All Music Guide (http://www.allmusic.com); and recommendation engines can utilize styles for high-confidence results.

Style is an imperative class of description for most music retrieval tasks, but it is usually considered a human concept and can be hard to model. Some styles evolved with no acoustic underpinnings: a favorite is intelligent dance music, or IDM, in which the included artists range from the abstract sine-wave noise of Pan Sonic to the calm filtered melodies of Boards of Canada. At first glance, IDM would be an intractable set to model due to its similarity being almost purely cultural. As such, we usually rely on marketing, print publications, and the recommendations of friends to understand styles on our own.
In this paper we present an automatic style detection system that operates on both the acoustic content of the audio and the very powerful cultural representation of community metadata, using descriptive textual features extracted from automated crawls of the web. The community metadata feature space has previously been shown to be effective in a music similarity task on its own [10], and here we augment it with an audio representation. This combined model performs extremely well in identifying a set of previously edited style clusters, and can be used to cluster arbitrarily large new sets of artists.

2. PRIOR WORK

2.1 Genre Classification

Automatic genre classification techniques that explicitly compute clusters from the score or audio level have reported high results in musically or acoustically separable genres such as classical vs. rock, but the hierarchical structure of popular music lends itself to a more fine-grained set of divisions. Using the score level only (MIDI files, transcribed music, or CSound scores), systems can extract style or genre using easily extractable features such as key and frequently used progressions (once the music is in a common format, which may require character recognition on a score or parsing a MIDI file). Systems normally perform genre classification by clustering similar music segments, or by performing a one-in-n (where n is the number of genres) classification using some machine learning technique. In [5], various machine learning classifiers are trained on performance characteristics of the score to learn a piece-global style, and in [2] three types of folk music were separated using a hidden Markov model.

Approaches that perform genre classification in the audio domain use a combination of spectral features and musically informed inferred features. Genre identification work undertaken in [8] aims to understand acoustic content enough to classify it into a small set of related clusters, studying the spectra along with tempo-sensitive timbregrams, with a simple beat detector in place. Similar work treating artists as complete genres (where similar clusters of artists form a "genre") is studied in [9] and then improved on in [1] with more musical knowledge.

2.2 Cultural Feature Extraction

Cultural features concerning music are not as well defined and vary with time and interpretation. Any representation that aims to express music as community description is a form of cultural feature. The most popular form of cultural features, lists of purchased music, is used in collaborative filtering to recommend music based on peers' tastes. Cultural features are important for expressing information about music that cannot be captured by the actual audio content; many music retrieval tasks cannot do well on audio alone. A more automatic and autonomous way of collecting cultural features is described in [10]. There, we define community metadata (which is used in this paper) as a vector space of descriptive textual

terms crawled from the web. For example, an artist is represented by their community description from album reviews and fan-created pages. An important asset of community metadata is its attention to time: in our implementation, community metadata vectors are crawled repeatedly, and future retrievals take the time of description into account. In a domain where long-scale time is vastly important, this representation allows recommendations and classifications to take the "buzz factor" into account. Cultural features for music retrieval are also explored in [4], where web crawls for "my favorite artists" lists are collated and used in a recommendation agent. The specifics of the community metadata feature vector are described in greater detail below.

3. STYLE CLASSIFICATION

To test our feature space and hypotheses concerning automatic style detection, we chose a small set of artists spanning five separate styles as classified by music editors. In turn, we first make classifications based solely on an audio representation, then on a community metadata representation, and lastly show that the combined feature spaces perform best in separating the styles.

3.1 Data Set

For the results in this paper we operate on a fixed data set of artists chosen from the Minnowmatch music testbed (related work analyzes this database in [9], [1], [10]). The list used contained twenty-five artists, encapsulating five artists each across five music styles; it is shown in Table 1. Each artist is represented in the Minnowmatch testbed with one or two albums' worth of audio content. The selection of artists in the testbed was defined by the output of a peer-to-peer network robot which computed the popularity of songs by watching thousands of users' collections. We have previously crawled the community metadata for each artist in the Minnowmatch testbed in January. The ground truth style classification was taken from the All Music Guide at http://www.allmusic.com (AMG), a popular edited web resource for music information. We consider AMG our best-case ground truth due to its collective edited nature. Although AMG's decisions are subjective, our intent is to show that a computational metric involving both acoustic and cultural features can approximate an actual labeling from a professional.

The size of our data set is intentionally small so as to demonstrate the issues of acoustic versus cultural similarity presented in this paper. This simulation is not meant to represent a fully functioning system due to its scope, but the approach and results propose a viable solution to the problem.

4. AUDIO-BASED STYLE CLASSIFICATION

One obvious feature space for a music style classifier is the audio domain. While we will show that it is not always the best way to discern cultural labels such as styles, it is a very good indicator of the sound of the music and is perhaps better suited to higher-level genre classification. The audio-based style classifier operates by forming each song into a representation and training a neural network to classify a new song from a test set into one of the five classes. Below, we describe the representation used and the training process.

4.1 Representation

We chose a fairly simple representation for this experiment. For each artist in our set, we chose on average 12 songs randomly from their collection.

[Figure 1: The audio feature extraction process: downsampling to 11,025 Hz, power spectral density computation over three-second windows, and PCA dimensionality reduction to twenty dimensions.]
The audio tracks were downsampled to 11,025 Hz, converted to mono, and transformed to zero mean and unit variance. We then extracted the 512-point power spectral density (PSD) of every three seconds of audio and applied principal components analysis (PCA) to the entire training data set to reduce it to twenty dimensions. The process is described in Figure 1. The series of reduced PSD features extracted from all the available audio tracks was used as the representation of each artist.

4.2 Classification and Learning

Learning for classification on the audio features was done using a feedforward time-delay neural network (TDNN) [3]. This structure allows the incorporation of a short-time memory for classification by providing samples of previous time points as inputs (Figure 2). For training this network we used the resilient backpropagation algorithm [6] and iterated in batch mode (using the entire training set as one batch). The input layer has twenty nodes (one for each dimension of the representation) with a memory of three adjacent input frames. We used one hidden layer with forty nodes, and the output layer had five nodes, each corresponding to one of the five styles we wished to recognize. The training targets were a value of 1 for the node corresponding to the style of the input, and 0 for all the other nodes. In the testing phase, the features of the test set were extracted (using the same dimensionality reduction transform derived from the training data) and fed to the classification network. Styles were assigned according to the output node with the maximum value.
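As a concrete illustration, the following is a minimal NumPy sketch of this front end. It assumes Hann-windowed 1024-point FFTs averaged within non-overlapping three-second blocks (the paper does not specify the exact PSD estimator), and it reproduces only the TDNN's windowed input layout; the network itself and the resilient backpropagation training of [6] are omitted.

```python
import numpy as np

SR = 11025           # target sample rate from the paper
BLOCK = 3 * SR       # three seconds of audio per feature
NFFT = 1024          # 1024-point FFT -> 512-point one-sided PSD
PCA_DIMS = 20        # reduced dimensionality
MEMORY = 3           # TDNN memory of three adjacent frames

def psd_features(mono):
    """512-point PSD of every non-overlapping 3-second block.

    `mono` is a zero-mean, unit-variance signal at 11,025 Hz.
    The windowing scheme here is an assumption, not the paper's.
    """
    win = np.hanning(NFFT)
    feats = []
    for start in range(0, len(mono) - BLOCK + 1, BLOCK):
        block = mono[start:start + BLOCK]
        specs = [np.abs(np.fft.rfft(block[i:i + NFFT] * win))[:512] ** 2
                 for i in range(0, BLOCK - NFFT + 1, NFFT)]
        feats.append(np.mean(specs, axis=0))  # average PSD over the block
    return np.array(feats)

def fit_pca(train_psds, k=PCA_DIMS):
    """Learn the top-k principal components on the training set only."""
    mu = train_psds.mean(axis=0)
    _, _, vt = np.linalg.svd(train_psds - mu, full_matrices=False)
    return mu, vt[:k]

def reduce_dims(psds, mu, components):
    """Project PSDs into the 20-dimensional PCA space."""
    return (psds - mu) @ components.T

def tdnn_inputs(frames):
    """Stack three adjacent reduced frames, mirroring the TDNN's
    short-time memory (input size: 20 dims x 3 frames = 60)."""
    return np.array([frames[i:i + MEMORY].ravel()
                     for i in range(len(frames) - MEMORY + 1)])
```

Note that, as in the paper, the PCA transform is fit on the training data alone and reused unchanged at test time.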

[Figure 2: Structure of the time-delay neural network, whose inputs include samples of previous time points.]

Table 1: The twenty-five artists in the data set, five from each of five styles. Each row spans the five styles and serves as one cross-validation fold.

Row  Heavy Metal     Country           Rap            IDM                R&B
1    Guns n' Roses   Billy Ray Cyrus   DMX            Boards of Canada   Lauryn Hill
2    AC/DC           Alan Jackson      Ice Cube       Aphex Twin         Aaliyah
3    Skid Row        Tim McGraw        Wu-Tang Clan   Squarepusher       Debelah Morgan
4    Led Zeppelin    Garth Brooks      Mystikal       Plone              Toni Braxton
5    Black Sabbath   Kenny Chesney     Outkast        Mouse on Mars      Mya

4.3 Results

We ran a training and testing scheme where each row (a collection of five artists across the five styles) in turn was selected for testing. The remaining four rows were used for training. This process was executed five times (once for each row as a test), and the results for each permutation are shown in Figure 3. As is clearly evident, the results are not particularly good for the IDM style. Most of those artists have been misclassified, and there is little cohesion within that style. This should not be construed as a shortcoming of the training method, as this is a music style that exhibits a huge auditory variance, ranging from aggressive rough beats to abstract and smooth textures. What ties these artists together as a style is not a common sound in their work, but rather a cultural affinity stemming from the use of electronic instruments and common roots reaching back to electronic dance music. Likewise, we see inconsistent results for Lauryn Hill, classified as a rap artist due to her rap-like production.

[Table 2: Example community metadata terms and their scores for DMX; the adjective terms include non-violent, perky, swedish, international, inner, consistent, bitter, junior, produced, and romantic.]

Such intra-style auditory inconsistencies are of course hard to overcome using any audio-based system, highlighting the need for additional descriptors that factor in cultural issues.

5. COMMUNITY METADATA-BASED STYLE CLASSIFICATION

We next describe style classification solely using the cultural features of the community metadata feature vectors described earlier. The cultural features for the 25 artists in our set were computed during earlier work on artist similarity. Each artist is associated with a set of roughly 10,000 unigram terms, 10,000 bigram terms, 5,000 noun phrases, and 100 adjectives. Each term was associated with an artist by its appearing in the same web document as the artist's name, although this alone does not prove a causal relation of description. Associated with each term is a score computed by the software (see Table 2) that considers position from the named referent along with a gaussian window applied to the term frequency of the term divided by its document frequency. (Term frequency is how often a term appears in relation to an artist, and document frequency is how often the term appears overall.) The gaussian we used is

    f(tf, df) = e^{-(tf/df - \mu)^2 / (2\sigma^2)}

where df is the document frequency of a term, tf its term frequency, and \mu and \sigma are parameters indicating the mean and deviation of the gaussian window. This method proved effective in computing artist similarities (given a known artist similarity list, this metric could predict them adequately), but here we ask the same data to arrange the artists into clusters.
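A minimal sketch of this scoring rule, following the gaussian reconstruction above; the mu and sigma defaults are illustrative placeholders, not the published settings:

```python
import math

def term_score(tf, df, mu=0.5, sigma=0.25):
    """Gaussian-windowed salience of a term for an artist.

    tf: term frequency (how often the term appears in pages
        relating to the artist)
    df: document frequency (how often the term appears overall)
    mu, sigma: mean and deviation of the gaussian window;
        placeholder values, not the paper's parameters.
    """
    if df == 0:
        return 0.0
    return math.exp(-((tf / df - mu) ** 2) / (2 * sigma ** 2))
```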
[Figure 4: The 25-by-25 artist similarity matrix computed from community metadata overlap scores.]

5.1 Clustering Overlap Scores

The community metadata system computes similarity by a simple overlap score. Each pair of artists is assigned an unnormalized overlap weight w, where w is an additive combination of every shared term's score. These scalars are unimportant on their own, but we can rank their values, using each artist in our set as the ground artist, to see which artists are most similar to each other. Using this method, we compute the similarity matrix M(25,25) over each artist in the five-style set (see Figure 4). This matrix is then used to predict the style of each given artist. For each term type, we take each artist in turn and sort their overlap weight similarities to the other 24 artists in descending order.
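As an illustration, here is a sketch of the overlap computation, assuming each artist is represented as a dict from term to score (for instance, the output of term_score above). The paper does not pin down the exact additive combination, so this version simply sums both artists' scores for each shared term:

```python
def overlap_score(terms_a, terms_b):
    """Unnormalized overlap weight between two artists: the
    additive combination of every shared term's score (here
    taken as the sum of both artists' scores per shared term,
    an assumption)."""
    shared = terms_a.keys() & terms_b.keys()
    return sum(terms_a[t] + terms_b[t] for t in shared)

def similarity_matrix(artists):
    """M(25,25): overlap weights between every pair of artists.

    artists: dict mapping artist name -> {term: score}.
    """
    names = sorted(artists)
    return names, [[overlap_score(artists[a], artists[b])
                    for b in names] for a in names]
```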

Figure 3: Audio-based style classification results. Each row gives the network's confidence (%) that the artist belongs to each style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          64         33       1     1     1
Billy Ray Cyrus        21         53      19     1     7
DMX                     6          9      65     4    17
Boards of Canada        9         11      32    12    37
Lauryn Hill            23          4      35     8    30
AC/DC                  54          9       9     8    21
Alan Jackson           19         52       4     4    21
Ice Cube                0          1      73    10    15
Aphex Twin              0          0      40    31    29
Aaliyah                 6          3      31    14    46
Skid Row               76         10       1     2    11
Tim McGraw             39         56       3     0     2
Wu-Tang Clan           13         21      31    16    19
Squarepusher           15          4      26    27    28
Debelah Morgan          3          1      14    18    64
Led Zeppelin           72         18       2     5     2
Garth Brooks           25         60       8     6     1
Mystikal                0          0      51    20    35
Plone                  17          7      27    16    33
Toni Braxton           17         11      20    10    42
Black Sabbath          41         35       7     5    12
Kenny Chesney          25         62       3     3     7
Outkast                13          2      62     7    16
Mouse on Mars          10          8      32    24    27
Mya                     7          1      31    10    51

We then use prior knowledge of the actual styles of the 24 similar artists to find the true style of our target artist: descending the sorted list, once we have counted four other artists in the same cluster, we consider our target artist classified, with a normalized score (the amount of cumulated overlap weight the cluster contributed to the total cumulated overlap weight). The highest cumulated score is deemed the correct classification, and the five style scores are arranged in a probability map. In a larger-scale implementation, this step is akin to using a supervised clustering mechanism which tries to find a fit of an unknown type among already-labeled data (by the same algorithm). Because of the small size of the sample set, we found this more manual method more effective. We do this for each term type in the community metadata feature space and average the returned maps into a generalized probability map. The map defines a confidence value for each style, much like the neural network's results above, and the probability approach was crucial in integrating the two methods (described below). A sketch of this procedure appears below, after the results.

5.2 Results

In Figure 5 we see that the text-only classifier performs very well for three of the five styles and adequately, but not perfectly, for the other two. There seems to be confusion between the Rap and R&B style sets. However, for the previously problematic IDM set, the cultural classifier works perfectly and with high confidence. We can attribute this to IDM being an almost purely culturally-defined style.

One of the issues that plague acoustically-derived classifiers is that human classifications often have little statistical correlation to the actual content being described. This problem also interferes with content-based recommendation agents that attempt to learn a relation model between user preference and audio content: sometimes, the sound of the music has very little to do with how we perceive and attach preference to it. R&B and Rap's intrinsic crossover (they both appear on the same radio markets and are usually geared toward the same audiences) shows that the cultural classifier can be as confused as humans in the same situation. Here, we present the inverse of the "description for content" problem: just as often, cultural influences steer us away from treating two almost identical artists as similar entities, or from putting them in the same class. We propose that automated systems that attempt to model listening behavior or provide commodity intelligence to music collections be mindful of both types of influences.
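The per-artist assignment described in Section 5.1 might look like the following sketch (hypothetical helper names, building on overlap_score above); the real system averages the resulting maps over unigrams, bigrams, noun phrases, and adjectives:

```python
def classify_by_overlap(target, artists, styles, needed=4):
    """Descend the target's similarity ranking until one style has
    contributed `needed` artists, then normalize the cumulated
    overlap weights into a per-style probability map.

    artists: name -> {term: score}; styles: name -> style label.
    """
    ranked = sorted(
        ((overlap_score(artists[target], artists[n]), n)
         for n in artists if n != target),
        reverse=True)                       # most similar first
    counts, weights = {}, {}
    for w, name in ranked:
        s = styles[name]
        counts[s] = counts.get(s, 0) + 1
        weights[s] = weights.get(s, 0.0) + w
        if counts[s] == needed:             # four artists of one style seen
            break
    total = sum(weights.values()) or 1.0
    return {s: weights[s] / total for s in weights}
```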
Since we can ideally model both behaviors, it perhaps makes the most sense to combine them in some manner.

6. COMBINED CLASSIFICATION

As pointed out in the preceding sections, some features which are crucial for style identification are best exploited in the auditory domain and some in the cultural domain. So far, given our choice of domain, we have produced coherent clusters. Musical style (and even more so musical similarity) requires a complicated definition that can factor in multiple observations: auditory, historical, geographical, ideological, and so on. Community metadata is an effort to account for the latter features, whereas the auditory domain supports a more direct judgment of the sound itself. It seems only natural that a combination of these two classifiers can help disambiguate some of the classification problems we have discussed.

In order to combine the two results, we view our classifier outputs as posterior probabilities and compute their average values. This technique has been shown to work well in practice when we have a questionable estimate of the posterior probabilities [7], as is the case in the cultural-based classification.

6.1 Results

The results of the averaging are shown in Figure 6. It is clear that many of the problems present in the previous classification attempts are now resolved. The IDM class, which was problematic in the audio-based classification, is now correctly identified due to strong community metadata coherence. Likewise, the Rap cluster, which was not well defined in the metadata classification, is correctly identified using the auditory influence. Overall, the combined classification was correct for all samples, bypassing the problems found in either the audio-only or metadata-only classification.
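For completeness, a minimal sketch of the posterior-averaging combination of Section 6, assuming both classifiers emit probability maps keyed by style name:

```python
def combine_posteriors(audio_probs, culture_probs):
    """Average the audio and cultural posterior maps with equal
    weight, following the classifier-averaging rule of [7]."""
    styles = audio_probs.keys() | culture_probs.keys()
    return {s: 0.5 * (audio_probs.get(s, 0.0) + culture_probs.get(s, 0.0))
            for s in styles}

def predict_style(audio_probs, culture_probs):
    """Final decision: argmax of the averaged posteriors."""
    combined = combine_posteriors(audio_probs, culture_probs)
    return max(combined, key=combined.get)
```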

Figure 5: Community metadata-based style classification results. Each row gives the classifier's confidence (%) per style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          44          9      19    11    17
Billy Ray Cyrus         5         80       5     4     5
DMX                    18         27      24     8    23
Boards of Canada       11          5       8    68     8
Lauryn Hill            21         13      23    11    33
AC/DC                  30         13      16    23    18
Alan Jackson            5         76       9     3     7
Ice Cube               22         18      28    15    17
Aphex Twin              9          4      26    56     4
Aaliyah                14         13      27    11    35
Skid Row               17         38      19    13    13
Tim McGraw             18         50      17     6     9
Wu-Tang Clan           10          7      28    37    18
Squarepusher            9          6       8    72     5
Debelah Morgan         11         10      20    11    47
Led Zeppelin           21         14      19    28    18
Garth Brooks           11         60      10     8    11
Mystikal               17         30      16     9    28
Plone                  10          6      10    65     9
Toni Braxton           17         16      26    11    30
Black Sabbath          52          9      18    12    10
Kenny Chesney           6         68      16     4     7
Outkast                14         13      32    14    27
Mouse on Mars          10          9       7    65     9
Mya                    11         20      25    10    33

Figure 6: Combined (averaged) style classification results. Each row gives the combined confidence (%) per style.

Artist             Heavy Metal  Country   Rap   IDM   R&B
Guns n' Roses          54         21      10     6     9
Billy Ray Cyrus        13         66      12     3     6
DMX                    12         18      45     6    20
Boards of Canada       10          8      20    40    22
Lauryn Hill            22          8      29     9    31
AC/DC                  42         11      12    15    20
Alan Jackson           12         64       6     4    14
Ice Cube               11         10      51    13    16
Aphex Twin              5          2      33    44    17
Aaliyah                10          8      29    13    41
Skid Row               47         24      10     8    12
Tim McGraw             29         53      10     3     6
Wu-Tang Clan           12         14      29    27    19
Squarepusher           12          5      17    50    17
Debelah Morgan          7          5      17    14    56
Led Zeppelin           46         16      10    17    10
Garth Brooks           18         60       9     7     6
Mystikal                9         15      33    14    31
Plone                  13          7      18    40    21
Toni Braxton           17         13      23    11    36
Black Sabbath          47         22      12     9    11
Kenny Chesney          15         65      10     3     7
Outkast                14          8      47    11    21
Mouse on Mars          10          8      20    44    18
Mya                     9         11      28    10    42

7. FUTURE WORK

One less obvious use of this system is a cultural-to-musical ratio for relations among artists. An application that could know in advance how to understand varying types of artist relationships could benefit many music retrieval systems that attempt to inject commodity intelligence into the L&R process. A good case for such a technology would be a recommendation agent that operates on both acoustic and cultural data. Large-scale record shops already compute cultural relationships using sale data fed into a collaborative filtering system, and music-based recommenders such as Moodlogic (http://www.moodlogic.com) operate on spectral features. Both systems have proved successful for different types of music, and a system that could define ahead of time the proper set of features to use would be integral to a combination approach. We could simply define a culture ratio as

    r(a, b) = P_c(a, b) / P_a(a, b)

i.e. the probability that artists a and b will be similar using a cultural metric, P_c, divided by the probability that artists a and b will be similar using an acoustic metric, P_a. A high culture ratio would alert a recommender that certain musical relationships (such as almost all in the IDM style) should be treated using a purely cultural feature space. Lower culture ratios would indicate that spectral or musically intelligent features should be used.

[Figure 7: Style classification accuracy (%) of the audio, cultural, and combined classifiers for each of the five styles: Heavy Metal, Country, Rap, IDM, and R&B.]

8. CONCLUSIONS

We have presented a prominent problem in musical style classification, and proposed a multimodal classification scheme to overcome it. By combining both acoustic and cultural artist information, we have achieved classification of styles that exhibit large variance in either domain.

9. ACKNOWLEDGMENTS

This work was supported by the Digital Life Consortium of the MIT Media Lab.

10. REFERENCES

[1] A. Berenzweig, D. Ellis, and S. Lawrence. Using voice segments to improve artist classification of music. Submitted.

[2] W. Chai and B. Vercoe. Folk music classification using hidden Markov models. In Proceedings of the International Conference on Artificial Intelligence, 2001.

[3] D. Clouse, C. Giles, B. Horne, and G. Cottrell. Time-delay neural networks: Representation and induction of finite state machines. In IEEE Transactions on Neural Networks, 8(5), page 1065, 1997.

[4] W. W. Cohen and W. Fan. Web-collaborative filtering: recommending music by crawling the web. Computer Networks, 33(1-6), 2000.

[5] R. B. Dannenberg, B. Thom, and D. Watson. A machine learning approach to musical style recognition. In Proceedings of the 1997 International Computer Music Conference. International Computer Music Association, 1997.

[6] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, 1993.

[7] D. Tax, M. van Breukelen, R. Duin, and J. Kittler. Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33, 2000.

[8] G. Tzanetakis, G. Essl, and P. Cook. Automatic musical genre classification of audio signals, 2001.

[9] B. Whitman, G. Flake, and S. Lawrence. Artist detection in music with Minnowmatch. In Proceedings of the 2001 IEEE Workshop on Neural Networks for Signal Processing, Falmouth, Massachusetts, September 2001.

[10] B. Whitman and S. Lawrence. Inferring descriptions and similarity for music from community metadata. Submitted.
