Learning Word Meanings and Descriptive Parameter Spaces from Music
Brian Whitman, Deb Roy and Barry Vercoe
MIT Media Lab
Music intelligence
- Structure
- Genre / Style ID
- Song similarity
- Recommendation
- Artist ID
- Synthesis
Extracting salience from a signal: learning is features and regression.
(Figure: example classes ROCK/POP vs. Classical)
Semantic decomposition
- Music models from unsupervised methods find statistically significant parameters.
- Can we identify the optimal semantic attributes for understanding music?
(Figure: example attributes Female/Male, Angry/Calm)
Community metadata
- Whitman / Lawrence (ICMC 2002)
- Internet-mined description of music
- Embed description as kernel space
- Community-derived meaning
- Time-aware!
Language Processing for IR
Web page to feature vector: HTML, then sentence chunks, then typed terms.
Example sentence chunk: "XTC was one of the smartest and catchiest British pop bands to emerge from the punk and new wave explosion of the late '70s."
- n1 (unigrams): XTC, was, one, of, the, smartest, and, catchiest, British, pop, bands, to, emerge, from, punk, new, wave
- n2 (bigrams): XTC was, was one, one of, of the, the smartest, smartest and, and catchiest, catchiest British, British pop, pop bands, bands to, to emerge, emerge from, from the, the punk, punk and, and new
- n3 (trigrams): XTC was one, was one of, one of the, of the smartest, the smartest and, smartest and catchiest, and catchiest British, catchiest British pop, British pop bands, pop bands to, bands to emerge, to emerge from, emerge from the, from the punk, the punk and, punk and new, and new wave
- np (noun phrases): XTC, catchiest British pop bands, British pop bands, pop bands, punk and new wave explosion
- artist: XTC
- adj (adjectives): smartest, catchiest, British, new, late
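The n1/n2/n3 term types above are plain n-grams over a sentence chunk. A minimal Python sketch, assuming simple regex tokenization (lowercased, punctuation dropped) rather than whatever tokenizer the original system used; the np, artist and adj types need a part-of-speech tagger or chunker and are not sketched. The function names are hypothetical.

```python
import re


def tokenize(sentence):
    # Lowercase word tokens; apostrophes stay inside a token (e.g. "'70s")
    return re.findall(r"[a-z0-9']+", sentence.lower())


def ngrams(tokens, n):
    # Contiguous runs of n tokens, deduplicated while keeping first-seen order
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return list(dict.fromkeys(grams))


def term_types(sentence, max_n=3):
    # Hypothetical helper: n1, n2, n3 term lists for one sentence chunk
    tokens = tokenize(sentence)
    return {f"n{n}": ngrams(tokens, n) for n in range(1, max_n + 1)}


terms = term_types("XTC was one of the smartest and catchiest British pop bands "
                   "to emerge from the punk and new wave explosion of the late '70s.")
print(terms["n2"][:3])  # ['xtc was', 'was one', 'one of']
```

Deduplication mirrors the slide's lists, where a repeated word such as "the" appears once per term type.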
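Smoothed TF-IDF
Plain TF-IDF: $s(f_t, f_d) = \frac{f_t}{f_d}$
Gaussian-smoothed TF-IDF: $s(f_t, f_d) = f_t \, e^{-\frac{(\log f_d - \mu)^2}{2\sigma^2}}$
where f_t is the term frequency and f_d the document frequency; μ and σ set the center and width of the Gaussian over log document frequency.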
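A small Python sketch of both weightings, to show why the Gaussian version helps: plain TF-IDF gives the largest weight to the rarest terms (often typos or noise), while the smoothed form peaks at a chosen log document frequency. The values of mu and sigma and the document counts below are hypothetical, not settings from the talk.

```python
import math


def tfidf(f_t, f_d):
    # Plain TF-IDF: term frequency over document frequency
    return f_t / f_d


def smoothed_tfidf(f_t, f_d, mu, sigma):
    # Gaussian weight on log document frequency: terms with log(f_d) near mu
    # keep their full term frequency; very rare and very common terms are damped
    return f_t * math.exp(-((math.log(f_d) - mu) ** 2) / (2 * sigma ** 2))


# Hypothetical settings and counts, chosen only for illustration
MU, SIGMA = 6.0, 0.9
rare = smoothed_tfidf(1.0, 3, MU, SIGMA)       # a term seen in 3 documents (e.g. a typo)
typical = smoothed_tfidf(1.0, 400, MU, SIGMA)  # log(400) is close to 6
print(rare < typical)  # True
```

Plain TF-IDF ranks the same two terms the other way round (1/3 versus 1/400), which is the behavior the smoothing is meant to correct.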
Query by description (audio)
- "What does loud mean?"
- "Play me something fast with an electronic beat"
- Single-term to frame attachment
Learning QBD
Each audio frame carries the term scores of its artist:

Audio features        Electronic   Loud   Talented
artist 0, frame 1     0.30         0.30   2.0
artist 0, frame 2     0.30         0.30   2.0
artist 0, frame 3     0.30         0.30   2.0
artist 1, frame 1     0.1          3.23   0.4
artist 1, frame 2     0.1          3.23   0.4
artist 3, frame 1     0            0.95   0
artist 3, frame 2     0            0.95   0
artist 3, frame 3     0            0.95   0
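The table amounts to copying each artist's term scores onto every one of that artist's frames. A minimal Python sketch under that reading; the feature vectors are placeholders and the function name `attach_terms` is hypothetical.

```python
def attach_terms(artist_frames, artist_terms, vocabulary):
    """Label every audio frame with the term scores of the artist it came from."""
    X, Y = [], []
    for artist, frames in artist_frames.items():
        target = [artist_terms[artist].get(term, 0.0) for term in vocabulary]
        for frame in frames:
            X.append(frame)
            Y.append(list(target))  # every frame of one artist shares the same target
    return X, Y


vocabulary = ["electronic", "loud", "talented"]
artist_terms = {  # term scores from the slide above
    "artist 0": {"electronic": 0.30, "loud": 0.30, "talented": 2.0},
    "artist 1": {"electronic": 0.1, "loud": 3.23, "talented": 0.4},
    "artist 3": {"electronic": 0.0, "loud": 0.95, "talented": 0.0},
}
# Placeholder feature vectors; real frames would hold audio features
artist_frames = {"artist 0": [[0.0]] * 3, "artist 1": [[0.0]] * 2, "artist 3": [[0.0]] * 3}
X, Y = attach_terms(artist_frames, artist_terms, vocabulary)
print(len(Y), Y[3])  # 8 [0.1, 3.23, 0.4]
```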
Learning formalization
- Learn the relation between audio and naturally encountered description.
- Can't trust the target class! Sources of noise:
  - Opinion
  - Counterfactuals
  - Wrong artist
  - Not musical
- 200,000 possible terms (output classes!); for this experiment we limit them to adjectives.
Regularized least-squares classification (RLSC) (Rifkin 2002)
$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\delta^2}\right)$
$\left(K + \frac{I}{C}\right) c_t = y_t$
$c_t = \left(K + \frac{I}{C}\right)^{-1} y_t$
- c_t = machine for class t
- y_t = truth vector for class t
- C = regularization constant (10)
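A minimal NumPy sketch of RLSC with the Gaussian kernel above. It relies on one property of the formulation: the matrix K + I/C does not depend on the term t, so stacking the truth vectors y_t as columns lets a single solve train every term's machine. The toy data, the ±1 truth values and delta = 1 are assumptions for illustration; the default C = 10 mirrors the value shown on the slide.

```python
import numpy as np


def gaussian_kernel(A, B, delta):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 * delta^2))
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * delta ** 2))


def train_rlsc(X, Y, delta=1.0, C=10.0):
    # Solve (K + I/C) c_t = y_t; Y holds one truth column per term t,
    # so one solve with the shared matrix trains every term's machine
    K = gaussian_kernel(X, X, delta)
    return np.linalg.solve(K + np.eye(len(X)) / C, Y)


def predict_rlsc(X_train, coeffs, X_new, delta=1.0):
    # f_t(x) = sum_i c_t[i] * K(x, x_i) for every term t
    return gaussian_kernel(X_new, X_train, delta) @ coeffs


# Hypothetical toy data: two clusters of 1-D "frames", two terms with +1/-1 truth
X = np.array([[0.0], [0.1], [3.0], [3.1]])
Y = np.array([[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, 1.0]])
coeffs = train_rlsc(X, Y)
scores = predict_rlsc(X, coeffs, np.array([[0.05], [3.05]]))
```

Each row of `scores` holds one new frame's output for every term; a positive value means the term is predicted to apply.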
Time-aware audio features
- MPEG-7 derived state paths (Casey 2001)
- Music as a discrete path through time
- Reg'd to 20 states
(Figure: a state path over time, 0.1 s scale)
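Casey's state paths come from decoding audio with a trained MPEG-7 hidden Markov model. The Python sketch below swaps in a much simpler stand-in, nearest-centroid assignment of each frame to one of a fixed set of states, and then summarizes the resulting path by its state-transition probabilities so the feature keeps some temporal order. The centroids, the 2-state toy data and both function names are hypothetical.

```python
import numpy as np


def state_path(frames, centroids):
    # Stand-in for HMM decoding: nearest-centroid state for each frame
    dist = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def transition_matrix(path, n_states):
    # Row-normalized counts of "state i followed by state j"
    counts = np.zeros((n_states, n_states))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Hypothetical toy example with 2 states instead of 20
centroids = np.array([[0.0], [10.0]])
frames = np.array([[0.1], [0.2], [9.9], [10.2], [0.3]])
path = state_path(frames, centroids)
print(path.tolist())  # [0, 0, 1, 1, 0]
```

States never left (all-zero rows) keep a zero row rather than dividing by zero.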
Per-term accuracy
Weighted accuracy (to allow for bias)

Good terms    Accuracy    Bad terms
Busy          42%         Artistic
Steady        41%         Homeless
Funky         39%         Hungry
Intense       38%         Great
Acoustic      36%         Awful
African       35%         Warped
Melodic       27%         Illegal
Romantic      23%         Cruel
Slow          21%         Notorious
Wild          25%         Good
Young         17%         Okay
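Plain accuracy rewards a classifier that always says "no" for a rare term. One way to weight for that bias, sketched in Python, is to score positive and negative examples separately and combine the two rates. Multiplying them is an assumption here, not a definition taken from the slide; averaging them (balanced accuracy) is a common alternative.

```python
def class_accuracies(truth, predicted):
    # Accuracy on positive examples and on negative examples, separately
    pos = [p for t, p in zip(truth, predicted) if t]
    neg = [p for t, p in zip(truth, predicted) if not t]
    pos_acc = sum(1 for p in pos if p) / len(pos) if pos else 0.0
    neg_acc = sum(1 for p in neg if not p) / len(neg) if neg else 0.0
    return pos_acc, neg_acc


def weighted_accuracy(truth, predicted):
    # Assumed combination: product of the two per-class accuracies, so a
    # classifier that always answers "no" scores 0 however rare the term is
    pos_acc, neg_acc = class_accuracies(truth, predicted)
    return pos_acc * neg_acc


truth = [1, 1, 0, 0, 0, 0]
always_no = [0] * 6
print(weighted_accuracy(truth, always_no))  # 0.0
```

The same always-"no" classifier would score 4/6 under plain accuracy.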
The linguistic expert
- Some semantic attachment requires lookups to an expert.
(Figure: terms Dark, Big, Light?, Small)
Linguistic expert
- Perception + observed language: Big
- Lookups to linguistic expert: Light, Dark, Small
- Allows you to infer a new gradation
(Figure: Big, Dark, Small and Light, with "?" marking the inferred gradation)
Parameters: synants of quiet
- Synants: the antonym of every synonym and the synonym of every antonym.
(Figure: synonym and antonym links around "quiet"; terms shown: thundering, noisy, soft, clangorous, hard)
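The synant definition is a two-hop lookup in a thesaurus. A minimal Python sketch over a hypothetical toy thesaurus (plain dicts, not WordNet or any other real lexical database) whose entries were chosen to reproduce the words in the figure:

```python
def synants(word, synonyms, antonyms):
    # The antonym of every synonym, plus the synonym of every antonym
    result = set()
    for syn in synonyms.get(word, []):
        result.update(antonyms.get(syn, []))
    for ant in antonyms.get(word, []):
        result.update(synonyms.get(ant, []))
    return result


# Hypothetical toy thesaurus, not the entries of any real lexical database
synonyms = {"quiet": ["soft"], "noisy": ["thundering", "clangorous"]}
antonyms = {"quiet": ["noisy"], "soft": ["hard"]}
print(sorted(synants("quiet", synonyms, antonyms)))  # ['clangorous', 'hard', 'thundering']
```

This literal reading leaves out the direct antonym (here "noisy"); whether it belongs in the set is a design choice.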
Top descriptive parameters
- All P(a) of terms in the anchor synant sets are averaged: P(quiet) = 0.2, P(loud) = 0.4, so P(quiet-loud) = 0.3.
- The sorted list gives the best-grounded parameter map.

Good parameters              Bad parameters
Big-little           3       Evil-good               5%
Present-past         29%     Bad-good
Unusual-familiar     28%     Violent-nonviolent      1%
Low-high             27%     Extraordinary-ordinary
Male-female          22%     Cool-warm               7%
Hard-soft            21%     Red-white               6%
Loud-soft            19%     Second-first            4%
Smooth-rough         14%     Full-empty
Vocal-instrumental   1       Internal-external
Minor-major          1       Foul-fair               5%
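The averaging and sorting step, sketched in Python. The `pole_terms` mapping (which terms belong to each anchor's synant set) is a hypothetical input; with an empty mapping each pole stands for itself, which reproduces the P(quiet-loud) = 0.3 example. The red/white accuracies are made up.

```python
def parameter_score(pair, pole_terms, accuracy):
    # Average P(a) over all scored terms attached to either pole of the pair;
    # a pole with no entry in pole_terms stands for itself
    terms = [t for pole in pair for t in pole_terms.get(pole, [pole]) if t in accuracy]
    return sum(accuracy[t] for t in terms) / len(terms) if terms else 0.0


def rank_parameters(pairs, pole_terms, accuracy):
    # Best-grounded parameters first
    return sorted(pairs, key=lambda p: parameter_score(p, pole_terms, accuracy), reverse=True)


accuracy = {"quiet": 0.2, "loud": 0.4, "red": 0.05, "white": 0.07}  # red/white values are made up
print(round(parameter_score(("quiet", "loud"), {}, accuracy), 3))  # 0.3
print(rank_parameters([("red", "white"), ("quiet", "loud")], {}, accuracy)[0])  # ('quiet', 'loud')
```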
Learning the knobs
- Nonlinear dimension reduction: Isomap
- Like PCA/NMF/MDS, but:
  - Meaning oriented
  - Better perceptual distance
  - Only feed polar observations as input
- Future data can be quickly semantically classified with guaranteed expressivity
(Figure: embeddings along Quiet/Loud and Male/Female axes)
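For reference, a compact NumPy sketch of standard Isomap: build a k-nearest-neighbour graph, take shortest-path (geodesic) distances with Floyd-Warshall, then run classical MDS on them. It is the generic algorithm, not the talk's exact pipeline; in the slide's setting the input rows would be frames labeled with the polar terms (e.g. quiet and loud). The O(n^3) Floyd-Warshall step only suits small examples.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=1):
    n = X.shape[0]
    # Pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Keep edges to each point's k nearest neighbours (symmetrized);
    # index 0 of each sorted row is assumed to be the point itself
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        for j in nearest[i]:
            G[i, j] = G[j, i] = D[i, j]
    # Floyd-Warshall: geodesic distance = shortest path through the graph
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


X = np.array([[float(i), 0.0] for i in range(6)])
emb = isomap(X, n_neighbors=2, n_components=1)[:, 0]
```

With points spaced evenly on a line, the 1-D embedding recovers their order and spacing (up to sign).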
Parameter understanding
- Some knobs aren't intrinsically 1-D
- Color spaces & user models!
Future: music acquisition
- Short-term music model: auditory scene to events
- Structural music model: recurring patterns in music streams
- Language of music: relating artists to descriptions (cultural representation)
- Music acceptance models: path of music through a social network
- Grounding sound: what does loud mean?
- Semantics of music: what does rock mean?
- What makes a song popular?
- Semantic synthesis
Reverse: semantic synthesis
- What does college rock sound like?
- Meaning as transition probabilities
(Figure example: "Loud rock with electronics")
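One way to read "meaning as transition probabilities" is to attach a Markov transition matrix over audio states to each term, blend the matrices for a multi-term query such as "loud rock with electronics", and sample a state path from the blend. The Python sketch below follows that reading; the 3-state matrices, the equal blend weights and both function names are hypothetical.

```python
import random


def blend(matrices, weights):
    # Weighted average of row-stochastic transition matrices (rows still sum to 1)
    total = sum(weights)
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n)] for i in range(n)]


def sample_path(transitions, start, length, rng):
    # Walk the Markov chain: each next state is drawn from the current state's row
    path = [start]
    for _ in range(length - 1):
        row = transitions[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return path


# Hypothetical 3-state matrices standing in for two terms
loud = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
electronic = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
mix = blend([loud, electronic], [1, 1])
print(mix[0])  # [0.5, 0.5, 0.0]
print(sample_path(loud, 0, 5, random.Random(0)))  # [0, 1, 2, 0, 1]
```

The deterministic "loud" matrix makes the sampled path predictable; with a blended matrix the path varies with the random seed.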
What's next
- Human evaluation
- Inter-rater reliability: can we trust the internet for community meaning?
- Meaning recognition (time)
- Hierarchy learning