Learning Word Meanings and Descriptive Parameter Spaces from Music. Brian Whitman, Deb Roy and Barry Vercoe MIT Media Lab

Music intelligence
Tasks: structure, genre / style ID, song similarity, recommendation, artist ID, synthesis.
Extracting salience from a signal.
Learning is features and regression.
Example classes: ROCK/POP, Classical.

Semantic decomposition
Music models from unsupervised methods find statistically significant parameters.
Can we identify the optimal semantic attributes for understanding music?
Example attributes: Female/Male, Angry/Calm.

Community metadata
Whitman / Lawrence (ICMC 2002)
Internet-mined description of music
Embed description as kernel space
Community-derived meaning
Time-aware!

Language processing for IR
Web page to feature vector: HTML -> sentence chunks -> term lists.
Example sentence: "XTC was one of the smartest and catchiest British pop bands to emerge from the punk and new wave explosion of the late '70s."
n1 (unigrams): XTC, was, one, of, the, smartest, and, catchiest, British, pop, bands, to, emerge, from, punk, new, wave
n2 (bigrams): XTC was, was one, one of, of the, the smartest, smartest and, and catchiest, catchiest British, British pop, pop bands, bands to, to emerge, emerge from, from the, the punk, punk and, and new
n3 (trigrams): XTC was one, was one of, one of the, of the smartest, the smartest and, smartest and catchiest, and catchiest British, catchiest British pop, British pop bands, pop bands to, bands to emerge, to emerge from, emerge from the, from the punk, the punk and, punk and new, and new wave
np (noun phrases): XTC, catchiest British pop bands, British pop bands, pop bands, punk and new wave explosion
artist: XTC
adj (adjectives): smartest, catchiest, British, new, late
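The n1/n2/n3 columns can be reproduced with a sliding window over the tokens. The following is a minimal sketch, not the authors' code: the function name `ngrams` and the whitespace tokenizer are assumptions, and the np and adj columns would additionally need a part-of-speech tagger or chunker, which is not shown.

```python
def ngrams(tokens, n):
    """Return all n-grams over a token list, each joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# A shortened version of the example sentence from the slide.
sentence = "XTC was one of the smartest and catchiest British pop bands"
tokens = sentence.split()  # naive whitespace tokenizer (assumption)

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams[:3])
print(trigrams[:2])
```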

Smoothed TF-IDF
$$ s(f_t, f_d) = \frac{f_t}{f_d} $$
$$ s(f_t, f_d) = f_t \, e^{-\frac{(\log(f_d) - \mu)^2}{2\sigma^2}} $$
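A small sketch of the two scores, assuming f_t is the term frequency, f_d the document frequency, and that the smoothed form is a Gaussian weighting in log(f_d) with mean mu and width sigma. The slide gives no values for mu and sigma, so the ones used below are purely illustrative.

```python
import math


def tfidf(f_t, f_d):
    """Plain score: term frequency divided by document frequency."""
    return f_t / f_d


def smoothed_tfidf(f_t, f_d, mu, sigma):
    """Gaussian-weighted score: terms whose log document frequency is
    near mu keep their full weight, others are down-weighted."""
    return f_t * math.exp(-((math.log(f_d) - mu) ** 2) / (2 * sigma ** 2))


MU, SIGMA = 6.0, 1.0  # hypothetical values, not taken from the slide

rare = smoothed_tfidf(5, 10, MU, SIGMA)      # log(10) is far below mu
medium = smoothed_tfidf(5, 400, MU, SIGMA)   # log(400) is close to mu
print(rare < medium)
```

The design intent of the smoothing is that very rare terms (often typos or names) and very common terms both score low, while mid-frequency descriptive terms are favoured.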

Query by description (audio)
What does "loud" mean?
"Play me something fast with an electronic beat"
Single-term to frame attachment

Learning QBD
Audio features        Electronic   Loud   Talented
artist 0, frame 1     0.30         0.30   2.0
artist 0, frame 2     0.30         0.30   2.0
artist 0, frame 3     0.30         0.30   2.0
artist 1, frame 1     0.1          3.23   0.4
artist 1, frame 2     0.1          3.23   0.4
artist 3, frame 1     0            0.95   0
artist 3, frame 2     0            0.95   0
artist 3, frame 3     0            0.95   0
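The table shows every audio frame carrying the term scores of the artist it came from. A minimal sketch of that attachment step follows; the data layout and the name `attach_terms_to_frames` are assumptions for illustration, and the actual audio feature vectors are omitted.

```python
# Per-artist term scores, copied from the slide's table.
artist_terms = {
    0: {"electronic": 0.30, "loud": 0.30, "talented": 2.0},
    1: {"electronic": 0.1, "loud": 3.23, "talented": 0.4},
    3: {"electronic": 0.0, "loud": 0.95, "talented": 0.0},
}
# Number of audio frames shown for each artist in the table.
frames_per_artist = {0: 3, 1: 2, 3: 3}


def attach_terms_to_frames(artist_terms, frames_per_artist):
    """Give every frame a copy of its artist's term scores."""
    rows = []
    for artist, n_frames in frames_per_artist.items():
        for frame in range(1, n_frames + 1):
            rows.append((artist, frame, dict(artist_terms[artist])))
    return rows


training_rows = attach_terms_to_frames(artist_terms, frames_per_artist)
print(len(training_rows))
```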

Learning formalization
Learn relation between audio and naturally encountered description.
Can't trust target class: opinion, counterfactuals, wrong artist, not musical.
200,000 possible terms (output classes!)
(For this experiment we limit it to adjectives.)

Regularized least-squares classification (RLSC) (Rifkin 2002)
$$ K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\delta^2}\right) $$
$$ \left(K + \frac{I}{C}\right) c_t = y_t $$
$$ c_t = \left(K + \frac{I}{C}\right)^{-1} y_t $$
c_t = machine for class t
y_t = truth vector for class t
C = regularization constant (10)
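To make the training step concrete, here is a dependency-free sketch under stated assumptions: a Gaussian kernel with width delta, one machine c_t obtained by solving (K + I/C) c_t = y_t, and a new point scored by the kernel-weighted sum of c_t. The toy data, the value of delta, and all function names are hypothetical; only C = 10 comes from the slide. The small linear solver stands in for a numerical library.

```python
import math


def gaussian_kernel(x, y, delta):
    """exp(-||x - y||^2 / (2 delta^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * delta ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def train_rlsc(X, y, delta, C):
    """Return c_t solving (K + I/C) c_t = y_t for one term t."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], delta) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += 1.0 / C
    return solve(K, y)


def predict(X, c, x_new, delta):
    """Score a new point; the sign is the predicted class."""
    return sum(ci * gaussian_kernel(xi, x_new, delta) for xi, ci in zip(X, c))


# Hypothetical 2-D "audio features" and truth vector for the term "loud".
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y_loud = [-1.0, -1.0, 1.0, 1.0]
c_loud = train_rlsc(X, y_loud, delta=0.5, C=10.0)
print(predict(X, c_loud, [0.95, 1.0], delta=0.5) > 0)
```

Because K + I/C does not depend on the term, the same matrix can be reused for every output class; only the truth vector y_t changes, which is what makes the approach practical with a very large number of terms.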

Time-aware audio features
MPEG-7 derived state-paths (Casey 2001)
Music as discrete path through time
Reg'd to 20 states; 0.1 s

Per-term accuracy
Good terms          Bad terms
Busy       42%      Artistic
Steady     41%      Homeless
Funky      39%      Hungry
Intense    38%      Great
Acoustic   36%      Awful
African    35%      Warped
Melodic    27%      Illegal
Romantic   23%      Cruel
Slow       21%      Notorious
Wild       25%      Good
Young      17%      Okay
Weighted accuracy (to allow for bias)
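The slide does not give the weighting formula. One plausible choice, assumed here purely for illustration, is the product of the accuracy on positive examples and the accuracy on negative examples, so that a machine that always answers "no" for a rare term scores zero instead of looking accurate.

```python
def weighted_accuracy(y_true, y_pred):
    """Product of positive-class and negative-class accuracy.

    Labels are +1 / -1. This weighting is an assumption, not a formula
    stated on the slide.
    """
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == -1]
    if not pos or not neg:
        return 0.0
    acc_pos = sum(1 for p in pos if p == 1) / len(pos)
    acc_neg = sum(1 for p in neg if p == -1) / len(neg)
    return acc_pos * acc_neg


# A rare term: 1 positive frame, 9 negative frames (hypothetical labels).
y_true = [1] + [-1] * 9
always_no = [-1] * 10               # 90% raw accuracy, but useless
mostly_right = [1] + [-1] * 8 + [1]  # finds the positive, one false alarm

print(weighted_accuracy(y_true, always_no))
print(weighted_accuracy(y_true, mostly_right))
```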

The linguistic expert
Some semantic attachment requires lookups to an expert.
[Diagram labels: Dark, Big, Light, ?, Small]

Linguistic expert
Perception + observed language
Lookups to linguistic expert
Allows you to infer new gradation
[Diagram labels across the panels: Big, Light, Dark, Small, ?]

Parameters: synants of "quiet"
Synant set: the antonym of every synonym and the synonym of every antonym.
[Diagram: quiet, soft, noisy, thundering, clangorous and hard, linked by synonym and antonym edges]
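The synant definition translates directly into two set lookups. The sketch below uses a tiny hand-written lexicon loosely based on the words in the slide's diagram; the specific synonym and antonym links are assumptions, and a real system would query a lexical database instead.

```python
# Toy lexicon (hypothetical links, for illustration only).
SYNONYMS = {
    "quiet": {"soft"},
    "soft": {"quiet"},
    "noisy": {"thundering", "clangorous"},
}
ANTONYMS = {
    "quiet": {"noisy"},
    "soft": {"hard"},
}


def synants(term):
    """Antonyms of every synonym plus synonyms of every antonym."""
    result = set()
    for syn in SYNONYMS.get(term, set()):
        result |= ANTONYMS.get(syn, set())
    for ant in ANTONYMS.get(term, set()):
        result |= SYNONYMS.get(ant, set())
    return result


print(sorted(synants("quiet")))
```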

Top descriptive parameters
All P(a) of terms in anchor synant sets averaged: P(quiet) = 0.2, P(loud) = 0.4, P(quiet-loud) = 0.3.
Sorted list gives best grounded parameter map.
Good parameters                 Bad parameters
Big - little             3      Evil - good                 5%
Present - past           29%    Bad - good
Unusual - familiar       28%    Violent - nonviolent        1%
Low - high               27%    Extraordinary - ordinary
Male - female            22%    Cool - warm                 7%
Hard - soft              21%    Red - white                 6%
Loud - soft              19%    Second - first              4%
Smooth - rough           14%    Full - empty
Vocal - instrumental     1      Internal - external
Minor - major            1      Foul - fair                 5%
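The averaging and sorting step can be sketched as follows. The quiet and loud accuracies are the ones on the slide; the values for the second parameter are hypothetical, as are the function and variable names.

```python
from statistics import mean

# Per-term accuracies P(a). "quiet" and "loud" are from the slide;
# "dull" and "shiny" are hypothetical filler values.
term_accuracy = {"quiet": 0.2, "loud": 0.4, "dull": 0.05, "shiny": 0.07}

# Each parameter is a pair of poles; each pole is a list of terms
# (an anchor term plus, in general, the rest of its synant set).
parameters = {
    "quiet-loud": (["quiet"], ["loud"]),
    "dull-shiny": (["dull"], ["shiny"]),
}


def parameter_score(pole_a, pole_b, accuracy):
    """Average P(a) over all terms on both poles of a parameter."""
    return mean(accuracy[t] for t in list(pole_a) + list(pole_b))


scores = {name: parameter_score(a, b, term_accuracy)
          for name, (a, b) in parameters.items()}

# Highest score first: the best grounded parameters lead the list.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```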

Learning the knobs
Nonlinear dimension reduction: Isomap
Like PCA/NMF/MDS, but: meaning oriented; better perceptual distance; only feed polar observations as input.
Future data can be quickly semantically classified with guaranteed expressivity.
[Plot labels: Quiet, Male, Loud, Female]

Parameter understanding
Some knobs aren't 1-D intrinsically.
Color spaces & user models!

Future: music acquisition
Short term music model: auditory scene to events
Structural music model: recurring patterns in music streams
Language of music: relating artists to descriptions (cultural representation)
Music acceptance models: path of music through social network
Grounding sound: what does loud mean?
Semantics of music: what does rock mean?
What makes a song popular?
Semantic synthesis

Reverse: semantic synthesis
What does college rock sound like?
Meaning as transition probabilities
Loud rock with electronics

What's next
Human evaluation
Inter-rater reliability
Can we trust the internet for community meaning?
Meaning recognition (time)
Hierarchy learning