Analyzer Documentation


Prepared by: Tristan Jehan, CSO; David DesRoches, Lead Audio Engineer
September 2, 2011
Analyzer Version: 3.08

The Echo Nest Corporation
48 Grove St. Suite 206, Somerville, MA 02144
(617) 628-0233
tristan@echonest.com
http://the.echonest.com

Introduction

Analyze is a music audio analysis tool available as a free public web API (visit developer.echonest.com) and as a standalone command-line binary program for commercial partners (contact biz@echonest.com). The program takes a digital audio file from disk (e.g. mp3, m4a, wav, aif, mov, mpeg, flv), or audio data piped in on the command line, and generates a JSON-formatted text file that describes the track's structure and musical content, including rhythm, pitch, and timbre. All information is precise to the microsecond (audio sample).

Analyze is the world's only music listening API. It uses proprietary machine listening techniques to simulate how people perceive music, incorporating principles of psychoacoustics, music perception, and adaptive learning to model both the physical and cognitive processes of human listening. The output of Analyze contains a complete description of all musical events, structures, and global attributes, such as key, loudness, time signature, tempo, beats, sections, and harmony. It allows developers to create applications related to the way people hear and interact with music. The output data allows developers to:

1) interpret: understand, describe, and represent music. Applications include music similarity, playlisting, music visualizers, and analytics.

2) synchronize: align music with other sounds, video, text, and other media. Applications include automatic soundtrack creation and music video games.

3) manipulate: remix, mash up, or process music by transforming its content. An example is the automatic ringtone application Mashtone for the iPhone.

Output Data

meta data: analysis, computation, and track information.

track data:

time signature: an estimated overall time signature of the track. The time signature (meter) is a notational convention that specifies how many beats are in each bar (or measure).
key: the estimated overall key of the track. The key identifies the tonic triad, the chord (major or minor) that represents the final point of rest of the piece.
mode: indicates the modality (major or minor) of the track, i.e. the type of scale from which its melodic content is derived.
tempo: the overall estimated tempo of the track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
loudness: the overall loudness of the track in decibels (dB). Loudness values in the Analyzer are averaged across the entire track and are useful for comparing the relative loudness of segments and tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude).
duration: the duration of the track in seconds, as precisely computed by the audio decoder.
end of fade in: the end of the fade-in introduction to the track, in seconds.
start of fade out: the start of the fade-out at the end of the track, in seconds.
codestring, echoprintstring: two different audio fingerprints computed on the audio, used by other Echo Nest services for song identification. For more information on Echoprint, see http://echoprint.me.

timbre, pitch, and loudness are described in detail as part of the segments interpretation below.
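As a quick illustration of these track-level attributes, here is a minimal sketch in Python. The analysis.json path is hypothetical and stands for wherever the Analyzer's JSON output was saved; the key and mode codes are interpreted according to the Pitch section later in this document.

    import json

    # Load a saved Analyzer output file (hypothetical path).
    with open("analysis.json") as f:
        analysis = json.load(f)

    track = analysis["track"]

    # Key codes 0-11 map to C, C#, ..., B; mode 1 is major, 0 is minor (see Pitch below).
    PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    key_name = PITCH_CLASSES[track["key"]] if track["key"] >= 0 else "unknown"
    mode_name = {1: "major", 0: "minor"}.get(track["mode"], "unknown")

    print("tempo (BPM):   ", track["tempo"])
    print("time signature:", track["time_signature"])
    print("key:           ", key_name, mode_name)
    print("loudness (dB): ", track["loudness"])
    print("duration (s):  ", track["duration"])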

sequenced data: the Analyzer breaks the audio down into musically relevant elements that occur sequenced in time. From smallest to largest these include:

segments: a set of sound entities (typically under a second), each relatively uniform in timbre and harmony. Segments are characterized by their perceptual onsets and duration in seconds, loudness (dB), pitch, and timbral content.
    loudness_start: indicates the loudness level at the start of the segment
    loudness_max_time: offset within the segment of the point of maximum loudness
    loudness_max: peak loudness value within the segment
tatums: list of tatum markers, in seconds. Tatums represent the lowest regular pulse train that a listener intuitively infers from the timing of perceived musical events (segments).
beats: list of beat markers, in seconds. A beat is the basic time unit of a piece of music; for example, each tick of a metronome. Beats are typically multiples of tatums.
bars: list of bar markers, in seconds. A bar (or measure) is a segment of time defined as a given number of beats. Bar offsets also indicate downbeats, the first beat of the measure.
sections: a set of section markers, in seconds. Sections are defined by large variations in rhythm or timbre, e.g. chorus, verse, bridge, guitar solo, etc.

JSON Schema Example

    {
      "meta": {
        "analyzer_version": "3.08b",
        "detailed_status": "OK",
        "filename": "/Users/jim/Desktop/file.mp3",
        "artist": "Michael Jackson",
        "album": "Thriller",
        "title": "Billie Jean",
        "genre": "Rock",
        "bitrate": 192,
        "sample_rate": 44100,
        "seconds": 294,
        "status_code": 0,
        "timestamp": 1279120425,
        "analysis_time": 3.83081
      },
      "track": {
        "num_samples": 6486072,
        "duration": 294.15293,
        "sample_md5": "0a84b8523c00b3c8c42b2a0eaabc9bcd",
        "decoder": "mpg123",
        "offset_seconds": 0,
        "window_seconds": 0,
        "analysis_sample_rate": 22050,
        "analysis_channels": 1,
        "end_of_fade_in": 0.87624,
        "start_of_fade_out": 282.38948,
        "loudness": -7.078,
        "tempo": 117.152,
        "tempo_confidence": 0.848,
        "time_signature": 4,
        "time_signature_confidence": 0.42,
        "key": 6,
        "key_confidence": 0.019,
        "mode": 1,
        "mode_confidence": 0.416,
        "codestring": "ejwdk8u7m4rz9pej...tbtsnk8u7m4rz980uf",
        "code_version": 3.15,
        "echoprintstring": "ejzfnquyrtquzbsenjamof5a_5t...pdf6ef7eH_D9MWE8p",
        "echoprint_version": 4.12,
        "synchstring": "ejxlwwmwjcsou0ociwz2-1-sssrdzp...nuf0aeyz4=",
        "synch_version": 1
      },
      "bars": [{"start": 1.49356, "duration": 2.07688, "confidence": 0.037}, ...],
      "beats": [{"start": 0.42759, "duration": 0.53730, "confidence": 0.936}, ...],
      "tatums": [{"start": 0.16563, "duration": 0.26196, "confidence": 0.845}, ...],
      "sections": [{"start": 0.00000, "duration": 8.11340, "confidence": 1.000}, ...],
      "segments": [{
        "start": 0.00000,
        "duration": 0.31887,
        "confidence": 1.000,
        "loudness_start": -60.000,
        "loudness_max_time": 0.10242,
        "loudness_max": -16.511,
        "pitches": [0.370, 0.067, 0.055, 0.073, 0.108, 0.082, 0.123, 0.180, 0.327, 1.000, 0.178, 0.234],
        "timbre": [24.736, 110.034, 57.822, -171.580, 92.572, 230.158, 48.856, 10.804, 1.371, 41.446, -66.896, 11.207]
      }, ...]
    }
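As a sketch of how the sequenced data might be consumed, the snippet below (reusing the hypothetical analysis.json file from the earlier example) iterates over the beat and segment lists to report the beat count and the segment with the highest peak loudness.

    import json

    with open("analysis.json") as f:  # hypothetical path, as above
        analysis = json.load(f)

    beats = analysis["beats"]
    segments = analysis["segments"]

    print("beats detected:", len(beats))
    if beats:
        print("first beat at %.3f s" % beats[0]["start"])

    # Find the segment with the highest peak loudness (loudness_max, in dB).
    loudest = max(segments, key=lambda seg: seg["loudness_max"])
    print("loudest segment starts at %.3f s, peaking at %.1f dB"
          % (loudest["start"], loudest["loudness_max"]))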

Interpretation

[Figure: plot of the JSON data for a 30-second excerpt of "Around the World" by Daft Punk. The panels show the normalized timbre coefficients, the pitch strength for each pitch class (C through B), the loudness curve in dB over time in seconds, the estimated tempo, time signature, and key, and the tatum, beat, downbeat, and section locations together with section confidence.]

Rhythm

Beats are subdivisions of bars. Tatums are subdivisions of beats. That is, bars always align with a beat, and beats always align with a tatum. Note that a low confidence does not necessarily mean the value is inaccurate. Exceptionally, a confidence of -1 indicates "no value": the corresponding element must be discarded. A track may have no bars, no beats, and/or no tatums if no periodicity was detected. The time signature ranges from 3 to 7, indicating time signatures of 3/4 to 7/4. A value of -1 may indicate no time signature, while a value of 1 indicates a rather complex or changing time signature.

Pitch

The key is a track-level attribute ranging from 0 to 11 and corresponding to one of the 12 keys: C, C#, D, etc. up to B. If no key was detected, the value is -1. The mode is equal to 0 or 1 for minor or major, respectively, and may be -1 in case of no result. Note that a major key (e.g. C major) is most likely to be confused with the minor key three semitones lower (e.g. A minor), since both keys carry the same pitches. Harmonic details are given in the segments, described below.

Segments

Beyond timing information (start, duration), segments include loudness, pitch, and timbre features.

loudness information (i.e. attack, decay) is given by three data points: the dB value at onset (loudness_start), the dB value at the peak (loudness_max), and the segment-relative offset of the peak loudness (loudness_max_time). The dB value at onset is equivalent to the dB value at offset for the preceding segment. The last segment also specifies a dB value at offset (loudness_end).

pitch content is given by a chroma vector corresponding to the 12 pitch classes C, C#, D through B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example, a C major chord would likely be represented by large values for C, E, and G (i.e. classes 0, 4, and 7). Vectors are normalized to 1 by their strongest dimension; therefore noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0.

timbre is the quality of a musical note or sound that distinguishes different types of musical instruments or voices. It is a complex notion also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment's spectro-temporal surface, independently of pitch and loudness. The Echo Nest Analyzer's timbre feature is a vector of 12 unbounded values roughly centered around 0. These values are high-level abstractions of the spectral surface, ordered by degree of importance. For completeness: the first dimension represents the average loudness of the segment, the second emphasizes brightness, the third is more closely correlated with the flatness of a sound, the fourth with sounds having a stronger attack, etc. See the image below representing the 12 basis functions (i.e. template segments). The actual timbre of the segment is best described as a linear combination of these 12 basis functions weighted by the coefficient values: timbre = c1 x b1 + c2 x b2 + ... + c12 x b12, where c1 to c12 represent the 12 coefficients and b1 to b12 the 12 basis functions displayed below. Timbre vectors are best used in comparison with each other.

[Figure: the 12 basis functions for the timbre vector; x = time, y = frequency, z = amplitude.]
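The linear combination above can be written out directly. The sketch below is purely illustrative: the 12 basis functions are only shown as an image in this documentation, not shipped as data, so the basis array here is a hypothetical placeholder standing in for the real spectro-temporal templates. Only the coefficients come from the Analyzer output (the segment in the JSON example above).

    import numpy as np

    # Hypothetical placeholder for the 12 basis functions, each a small
    # time-frequency surface; the real templates are pictured above.
    n_time, n_freq = 10, 23
    basis = np.random.randn(12, n_time, n_freq)

    # Timbre coefficients c1..c12 for one segment, from the JSON example.
    coeffs = np.array([24.736, 110.034, 57.822, -171.580, 92.572, 230.158,
                       48.856, 10.804, 1.371, 41.446, -66.896, 11.207])

    # timbre surface = c1*b1 + c2*b2 + ... + c12*b12
    surface = np.tensordot(coeffs, basis, axes=1)  # shape: (n_time, n_freq)
    print(surface.shape)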
Confidence Values

Many elements at the track and lower levels of the analysis include confidence values: floating-point numbers ranging from 0.0 to 1.0. Confidence indicates the reliability of the corresponding attribute. Elements carrying a small confidence value should be considered speculative; there may not be sufficient data in the audio to compute the element with high certainty.
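In practice, a client might drop elements whose confidence is -1 (no value, per the Rhythm section above) and treat low-confidence elements cautiously. A minimal sketch, reusing the hypothetical analysis.json file and an arbitrary 0.5 threshold:

    import json

    with open("analysis.json") as f:  # hypothetical path
        analysis = json.load(f)

    # Discard bars with confidence -1 (no value) and flag speculative ones.
    bars = [b for b in analysis["bars"] if b["confidence"] != -1]
    speculative = [b for b in bars if b["confidence"] < 0.5]  # arbitrary threshold
    print("%d bars kept, %d of them speculative" % (len(bars), len(speculative)))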

Synchstring

Analyzer v3.08 introduces a new data string, the synchstring. It works with a simple synchronization algorithm, to be implemented on the client side, which generates offset values (in numbers of samples) at three locations in the decoded waveform: the beginning, the middle, and the end. These offsets allow the client application to detect decoding errors (when the offsets mismatch), and they provide sample-accurate synchronization of the JSON timing data with the waveform, regardless of which mp3 decoder was used on the client side (QuickTime, ffmpeg, mpg123, etc.). Since every decoder applies its own signal-dependent offset and error correction, sample accuracy isn't achievable by other means, such as tracking decoder type and version. For implementation examples of the synchronization algorithm, see the GitHub repository at http://github.com/echonest/synchdata.
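The synchstring format itself is not documented here; the reference decoding and synchronization code lives in the synchdata repository linked above. Purely as a sketch of how the three offsets might be used once recovered, assuming they are expressed in samples of the decoded waveform (the function and variable names below are hypothetical, not part of the Analyzer API):

    from typing import List, Tuple

    def seconds_to_samples(times_s: List[float],
                           sample_rate: int,
                           offsets: Tuple[int, int, int]) -> List[int]:
        """Map JSON event times (seconds) to sample indices of the decoded waveform."""
        begin, middle, end = offsets
        if not (begin == middle == end):
            # Mismatched offsets suggest the decoder dropped or padded samples.
            raise ValueError("synch offsets disagree: possible decoding error")
        return [int(round(t * sample_rate)) + begin for t in times_s]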