
Music Radar: A Web-based Query by Humming System

Lianjie Cao, Peng Hao, Chunmeng Zhou
Computer Science Department, Purdue University, 305 N. University Street, West Lafayette, IN 47907-2107
{cao62, pengh, zhoucm}@purdue.edu

Abstract. Query by humming (QBH) means searching for a piece of music by singing or humming it. Given a melody hummed by the user, a query-by-humming system returns a list of songs ordered by their similarity to the hummed query. Although many retrieval techniques for query by humming exist, our project builds a web-based query-by-humming system that can find a piece of music in a digital music repository from a few hummed notes, using a melody representation derived from pitch tracking. We evaluate the performance of our system on a public corpus provided by MIREX. The exact-match accuracy is 43.44%; if the criterion is relaxed to a top-10 ranking, the accuracy increases to 75.63%.

1 Introduction

Our project is to build a query-by-humming system (the QBH system) that can find a piece of music in a digital music repository from a few hummed notes. Even when the user does not know the title or any other textual information about the music, he or she can still search for it by humming the melody. Query by humming is a much friendlier interface than existing systems for music searching on the Internet. The system we built is web-based.

1.1 QBH System

Beyond this practical value, the query-by-humming problem is also interesting from a scientific point of view. Identifying a musical work from a melodic fragment is a task that most people accomplish with relative ease. However, how people achieve this is still unclear: how do people extract a melody from a complex piece of music and convert it into a representation that can be memorized and retrieved easily and accurately, with tolerance for transposition? Although this question as a whole is beyond the scope of our project, we build a system that behaves like a human: it can extract melodies from music; it can convert

the melodies into an efficient representation and store them in its memory; and when a user asks for a piece of music by humming the melody, it can first hear the query and then search its memory for the piece it considers most similar. The main features of the system are:

- A melody representation that combines both pitch and rhythmic information.
- New approximate melody matching algorithms based on that representation.
- A set of automatic transcription techniques customized for the query-by-humming system to obtain both pitch and rhythmic information.
- A handy tool to build a melody database from MIDI files.
- A deliverable query-by-humming system including both the server application and the browser application.

1.2 MIDI File Format

A MIDI file is a file format used to store MIDI data (plus some other kinds of data typically needed by a sequencer). This format stores the standard MIDI messages (i.e., status bytes with appropriate data bytes) plus a time-stamp for each message (i.e., a series of bytes representing how many clock pulses to wait before "playing" the event). The format also allows saving information about tempo, time and key signatures, the names of tracks and patterns, and other information typically needed by a sequencer. One MIDI file can store information for numerous patterns and tracks, so any sequencer can preserve these structures when loading the file.

1.3 Related Work

Other researchers are also investigating the use of pitch tracking and dynamic-programming matching [5] for music retrieval. Brown [7] presented a way of using autocorrelation to determine the meter of music scores. Chai [8] attempted to extract several perceptual features from MIDI files. Most of the research on pre-existing query-by-humming systems uses pitch contours to match similar melodies [9, 10, 11, 12].

2 System Architecture

Figure 1 gives an overview of the system architecture.

Figure 1: System architecture. The browser side (JavaScript) performs note segmentation, pitch tracking, and query construction on the hummed input and renders the report; the server side (PHP) extracts melody representations from MIDI files, matches the query against the database with dynamic programming (DP), and returns a JSON object.

2.1 Browser Side Architecture

Note Segmentation: The purpose of note segmentation is to identify each note's onset and offset boundaries within the acoustic signal. To allow segmentation based on the signal's amplitude, we ask the user to sing using syllables like "da", so that notes are separated by the short drop in amplitude caused by the stop consonant.

Pitch Tracking: This component estimates the pitch, or fundamental frequency, of a periodic or nearly periodic signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.

Query Construction: The note information is converted to a string representation and sent to the server as an HTTP GET request, e.g.

http://localhost/query.php?seq=-121012-2

Report Generator: Parses the JSON object returned by the server and generates a ranked-list report for the user.
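As an illustration of this request/response round trip, the following is a minimal client-side sketch in Python (standing in for the actual JavaScript browser code; the query.php endpoint is taken from the example above, while the JSON structure of the response is an assumption):

```python
import json
import urllib.parse
import urllib.request

def query_qbh_server(contour, base_url="http://localhost/query.php"):
    """Send a pitch-contour query as an HTTP GET request and parse the JSON rank list."""
    # The contour is flattened into a signed-digit string, as in the example above.
    seq = "".join(str(c) for c in contour)
    url = base_url + "?" + urllib.parse.urlencode({"seq": seq})
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))  # assumed: a ranked list of songs

# Example: the contour [-1, 2, 1, 0, 1, 2, -2] becomes the query string "seq=-121012-2".
```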

Figure 2: Web-based QBH user interface.

2.2 Server Side Architecture

Music Database: This holds the source data, i.e., the original music corpora from which we extract melodies and generate the data representation. The source data are currently MIDI files (but the system can be extended to handle other music formats).

Melody Extractor: This extracts the melody information from a MIDI file and converts it to a string sequence representing the note information.

DB Matcher: This receives the query from the browser side, uses dynamic programming to match it against the melodies in the melody description objects, and returns a ranked list of matching songs to the user as a JSON object.
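The melody extractor could be sketched as follows. This is only an illustration: it uses the Python mido library rather than the MATLAB MIDI toolbox referenced in [3], and it naively keeps every note-on event instead of selecting a melody track:

```python
import mido

def extract_note_sequence(path):
    """Collect MIDI note numbers from a file, in playback order."""
    notes = []
    for msg in mido.MidiFile(path):           # tracks merged, messages in time order
        if msg.type == "note_on" and msg.velocity > 0:
            notes.append(msg.note)            # MIDI note number (semitones)
    return notes

# The note sequence is later reduced to a 3-level pitch contour string (Section 3.3)
# and stored in the melody database for DP matching.
```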

3 Implementation

3.1 Note Segmentation

The purpose of note segmentation is to identify each note in the acoustic signal in order to help pitch tracking. A note is a pitched sound, an atomic element of most Western music. After identifying a note, we can easily compute its pitch information, because each note should have a relatively constant frequency. Note segmentation also filters out most of the unvoiced periods.

Figure 3: Waveform.

We can identify the onset and offset boundaries of a note from the amplitude of the sound. If people hum the song with syllables like "da", or sing the song with words, there is usually a short drop in amplitude between every two notes. First, we compute an amplitude envelope from the waveform via the spectrogram, using a window length of 512 samples with an overlap of 256 samples at a 44100 Hz sample rate. By summing the absolute values of the spectrum in each window, we obtain a sequence of amplitudes A.
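A minimal sketch of this amplitude computation in Python (using scipy's STFT; the exact windowing function used by the authors may differ):

```python
import numpy as np
from scipy.signal import stft

def amplitude_sequence(x, sr=44100, win=512, hop=256):
    """Amplitude envelope A: sum of spectral magnitudes per 512-sample window (256-sample overlap)."""
    _, _, Z = stft(x, fs=sr, nperseg=win, noverlap=win - hop)
    return np.abs(Z).sum(axis=0)   # one amplitude value per analysis window
```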

Figure 4: Spectrogram.

Second, we identify the onset and offset boundaries of each note from the amplitude. The basic idea is to set a threshold: the intersections of the amplitude curve with the threshold are the boundaries. A fixed threshold for the whole query leads to poor segmentation, so we use dynamic thresholds. We first define a global threshold $a_{\mathrm{global}} = 0.3 \cdot \overline{A}$, where $\overline{A}$ is the mean of the amplitude sequence. Second, we divide the amplitude sequence into frames of length 80 ms and define the local threshold for the $i$-th frame $F_i$ as $a_i = \max\big(a_{\mathrm{global}},\ 0.7 \cdot \overline{F_i}\big)$. Then we scan the amplitude sequence A. If no note is currently open and the current amplitude rises above the local threshold of the current frame, we set the current position as the onset boundary of a new note. If the onset boundary of the current note is at least 100 ms before the current position and the current amplitude drops below the local threshold of the current frame, we set the current position as the offset boundary of the current note.
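A sketch of this scan, assuming the amplitude sequence A produced by amplitude_sequence above (the frame and duration parameters are converted from milliseconds to window counts):

```python
def segment_notes(A, sr=44100, hop=256):
    """Dynamic-threshold note segmentation over the amplitude sequence A (one value per window)."""
    frame_len = max(1, int(0.080 * sr / hop))   # ~80 ms of windows per local frame
    min_len   = max(1, int(0.100 * sr / hop))   # ~100 ms minimum note duration
    a_global  = 0.3 * A.mean()
    notes, onset = [], None
    for i, a in enumerate(A):
        start = (i // frame_len) * frame_len
        a_local = max(a_global, 0.7 * A[start:start + frame_len].mean())
        if onset is None and a > a_local:
            onset = i                            # onset boundary of a new note
        elif onset is not None and i - onset >= min_len and a < a_local:
            notes.append((onset, i))             # offset boundary of the current note
            onset = None
    return notes                                 # list of (onset, offset) window indices
```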

Figure 5: Amplitude envelope and detected notes.

The global threshold, local thresholds, and minimum note duration are chosen heuristically. Notes may occasionally be added or dropped by our algorithm; we discuss these effects in the evaluation section.

3.2 Pitch Tracking

The primary goal of pitch tracking is to find the fundamental frequency of each note. Pitch is a subjective percept based on the frequency of the acoustic signal. The fundamental frequency, $f_0$, is the lowest frequency of a periodic waveform, and we use it as a proxy for pitch.

The algorithm we use for pitch tracking is autocorrelation, the most popular time-domain method; it computes the cross-correlation of the signal with itself. First, we divide the waveform into windows with a length of 30 ms and an overlap of 20 ms. For each window, we compute the non-normalized autocorrelation up to the lag corresponding to 50 Hz (882 samples at a 44100 Hz sample rate):

$r_N(d) = \sum_{n=1}^{N-d} x(n)\,x(n+d)$

where $N$ is the window size, $d$ is the positive lag, and $x(n)$ is the value of the $n$-th sample in the window. The fundamental frequency is taken as the frequency corresponding to the lag of maximum autocorrelation between 50 Hz and 1000 Hz, the frequency range of the human voice.

In the second step, we convert the fundamental frequency into a pitch, combined with information from note segmentation. We first round the fundamental frequency to the note number used in MIDI files according to

$m = \mathrm{round}\!\left(69 + 12 \log_2 \frac{f_0}{440}\right)$

where each number represents a semitone. We then choose the mode of the note numbers within a note's duration as the note's pitch.

Figure 6: Note number and pitch.
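A minimal sketch of the per-window estimator and the note-number conversion (parameter choices follow the text above; the authors' implementation may differ in detail):

```python
import numpy as np

def window_f0(frame, sr=44100, fmin=50, fmax=1000):
    """Fundamental frequency of one analysis window via non-normalized autocorrelation."""
    N = len(frame)
    lag_lo = int(sr / fmax)                       # shortest lag considered (1000 Hz)
    lag_hi = min(int(sr / fmin), N - 1)           # longest lag considered (50 Hz, ~882 samples)
    r = [np.dot(frame[:N - d], frame[d:]) for d in range(lag_lo, lag_hi + 1)]
    d_best = lag_lo + int(np.argmax(r))           # lag where the autocorrelation peaks
    return sr / d_best

def hz_to_midi(f0):
    """Round a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f0 / 440.0)))

# A note's pitch is then the mode of hz_to_midi(window_f0(...)) over the windows
# that fall inside the note's onset/offset boundaries.
```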

3.3 Melody Matching

We compute the minimum edit distance between pitch contours for melody matching. The underlying principle is that most people are good at capturing the relative changes in a tune rather than its absolute pitches. We first use a pitch contour to represent the pitch information, whether it comes from the pitch tracking algorithm or from MIDI files. Second, we use dynamic programming to compute the minimum edit distance between two pitch contours: the smaller the distance, the more similar the two pieces of music.

The pitch contour is a sequence of relative changes in pitch. In our method only 3-level contour information is used, i.e., we use 0, ±1, and ±2 to represent the changes. The pitch contour is computed for every note except the first, as follows. If a note has the same note number as the previous note, we represent it by 0. If a note is exactly one semitone higher (lower) than the previous note, we represent it by +1 (-1). If a note is more than one semitone higher (lower) than the previous note, we represent it by +2 (-2). Therefore, the sequence 53, 51, 45, 47, 50 is represented as -2, -2, 2, 2.
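A sketch of this contour encoding, reproducing the example above:

```python
def pitch_contour(notes):
    """3-level pitch contour: 0 = same note, ±1 = one semitone up/down, ±2 = larger jump up/down."""
    contour = []
    for prev, cur in zip(notes, notes[1:]):
        step = cur - prev
        if step == 0:
            contour.append(0)
        elif abs(step) == 1:
            contour.append(1 if step > 0 else -1)
        else:
            contour.append(2 if step > 0 else -2)
    return contour

# pitch_contour([53, 51, 45, 47, 50]) == [-2, -2, 2, 2]
```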

The second step is to compute the edit distance between two pitch contours. We define the pitch contour from the query to be the pattern $P$ and the pitch contour from a MIDI file to be the target $T$. Let $D_{i,j}$ be the minimum edit distance for matching the first $i$ numbers of $P$ against the first $j$ numbers of $T$. The minimum edit distance can then be computed recursively:

$D_{i,j} = \min\begin{cases} D_{i-1,j} + \mathrm{cost}_{\mathrm{insert}}(P_i) \\ D_{i,j-1} + \mathrm{cost}_{\mathrm{delete}}(T_j) \\ D_{i-1,j-1} + \mathrm{cost}_{\mathrm{replace}}(T_j, P_i) \end{cases}$

We define the costs heuristically as:

$\mathrm{cost}_{\mathrm{insert}}(P_i) = |P_i| + 1$
$\mathrm{cost}_{\mathrm{delete}}(T_j) = |T_j| + 1$
$\mathrm{cost}_{\mathrm{replace}}(T_j, P_i) = |P_i - T_j|$

Because the pattern usually matches only a part of the target, we allow the unmatched beginning and end of the target to be skipped at no cost. Therefore, we define the initial condition as:

$D_{i,j} = \begin{cases} 0, & \text{if } i = 0 \\ i, & \text{if } j = 0 \end{cases}$

and we modify the update rule for $i = |P|$ so that the delete operation is free:

$D_{i,j} = \min\begin{cases} D_{i-1,j} + \mathrm{cost}_{\mathrm{insert}}(P_i) \\ D_{i,j-1} \\ D_{i-1,j-1} + \mathrm{cost}_{\mathrm{replace}}(T_j, P_i) \end{cases}$
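A direct sketch of this matcher as a plain O(|P|·|T|) dynamic program (the production DB Matcher is written in PHP, so this Python version is only illustrative):

```python
def contour_distance(pattern, target):
    """Minimum edit distance between a query contour (pattern) and a song contour (target),
    with the unmatched beginning and end of the target skipped at no cost."""
    m, n = len(pattern), len(target)
    # D[i][j]: distance for the first i pattern symbols vs. the first j target symbols.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i                                   # initial condition D(i, 0) = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            ins = D[i - 1][j] + abs(pattern[i - 1]) + 1                   # insert pattern symbol
            rep = D[i - 1][j - 1] + abs(pattern[i - 1] - target[j - 1])   # replace
            dele = D[i][j - 1] if i == m else D[i][j - 1] + abs(target[j - 1]) + 1
            D[i][j] = min(ins, dele, rep)             # delete is free once the pattern is consumed
    return D[m][n]
```

Songs are then ranked by increasing distance to the query contour.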

4 Evaluation

Our system is evaluated on a public corpus, MIR-QBSH, from the Music Information Retrieval Evaluation eXchange (MIREX) campaign. The corpus contains 48 MIDI files as ground truth and 4431 queries created from 2003 to 2009 by 195 different people.

Unlike general information retrieval systems, the number of relevant retrieved items for a QBH system is either 0 or 1, and since our system returns a ranking of the MIDI files in the database by edit distance to the query, common measures such as precision and recall are not appropriate for evaluating a QBH system. Therefore, we choose a rank-based measure, Top-k-Accuracy, to present our experimental results. We define a query to be successful if the relevant MIDI file is within the top k items of the returned rank list:

$\text{Top-}k\text{-Accuracy} = n_k / N$

where $n_k$ is the number of successful queries for the given $k$ and $N$ is the total number of queries. Top-1-Accuracy is the proportion of exact matches; a Top-k-Accuracy closer to 1 indicates better results. We also define the hardness of songs and the proficiency of singers:

$\text{Hardness}_j = \frac{1}{N_j}\sum_i \frac{1}{r_{ij}}, \qquad \text{Proficiency}_i = \frac{1}{M_i}\sum_j \frac{1}{r_{ij}}$

where $r_{ij}$ is the rank returned for the $i$-th singer's query on the $j$-th song, $N_j$ is the number of singers who have a query on the $j$-th song, and $M_i$ is the number of songs queried by the $i$-th singer. A smaller hardness means the song is more difficult to perform, while a larger proficiency suggests the singer is more proficient. Table 1 reports Top-k-Accuracy for different values of k.

  k    Top-k-Accuracy
  1    0.4344
  5    0.6276
  10   0.7563

Table 1: Top-k-Accuracy for different k values.

Figures 7 and 8 show the song hardness and singer proficiency for the corpus we used. From our observation of the results, a few factors affect the matching quality:

- Out-of-tune singing. If the singer cannot follow the changes of the tune, our system can hardly give a good matching result.
- Low voice quality, especially when the singer's voice is overwhelmed by background noise.
- Short queries. The system needs a sufficiently long query to match the unique sequence of a song, but the exact minimum duration depends on the song and the quality of the query. The average query duration is around 6-8 seconds.
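As a small worked example of these measures, the sketch below computes Top-k-Accuracy, song hardness, and singer proficiency from a table of ranks (the rank bookkeeping is hypothetical; only the formulas follow the definitions above):

```python
from collections import defaultdict

def top_k_accuracy(ranks, k):
    """Fraction of queries whose relevant MIDI file appears within the top k results."""
    return sum(1 for r in ranks.values() if r <= k) / len(ranks)

def hardness_and_proficiency(ranks):
    """Mean reciprocal rank per song (hardness) and per singer (proficiency).
    `ranks` maps (singer, song) -> rank of the relevant MIDI file for that query."""
    by_song, by_singer = defaultdict(list), defaultdict(list)
    for (singer, song), r in ranks.items():
        by_song[song].append(1.0 / r)
        by_singer[singer].append(1.0 / r)
    hardness = {song: sum(v) / len(v) for song, v in by_song.items()}
    proficiency = {singer: sum(v) / len(v) for singer, v in by_singer.items()}
    return hardness, proficiency
```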

Figure 7: Hardness of all songs in the corpus.

Figure 8: Proficiency of all singers.

5 Conclusion

In this project, we built a web-based query-by-humming system using pitch tracking and a dynamic-programming matching method. For future work, we need to test our system on different classes of music (e.g., pop music, country music). We could also try

other matching methods such as DTW (Dynamic Time Warping) and HMMs (Hidden Markov Models) and compare their performance.

References

1. Asif Ghias, Jonathan Logan, David Chamberlin, and Brian C. Smith. Query by humming: musical information retrieval in an audio database. In ACM Multimedia, 1995.
2. Jyh-Shing Roger Jang. MIR Corpora: http://mirlab.org/dataset/public/mir-qbsh-corpus.rar
3. MATLAB and MIDI: http://www.kenschutte.com/midi
4. Pitch Detection: http://note.sonots.com/scisoftware/pitch.html
5. Wei Chai. Melody Retrieval On The Web. Master's Thesis, Massachusetts Institute of Technology, M.I.T. Media Laboratory, Fall 2000.
6. MIDI: http://en.wikipedia.org/wiki/midi
7. Judith C. Brown. Determination of the meter of musical scores by autocorrelation. J. Acoust. Soc. Am. 94(4), Oct. 1993.
8. Wei Chai and Barry Vercoe. Using user models in music information retrieval systems. Proc. International Symposium on Music Information Retrieval, Oct. 2000.
9. A. Ghias, J. Logan, D. Chamberlin, and B. C. Smith. Query by humming: musical information retrieval in an audio database. In ACM Multimedia 1995, pages 231, 1995.
10. D. Q. Goldin and P. C. Kanellakis. On similarity queries for time-series data: constraint specification and implementation. In Proceedings of the 1st International Conference on Principles and Practice of Constraint Programming (CP'95), 1995.
11. J. M. Hellerstein, J. F. Naughton, and A. Pfeffer. Generalized search trees for database systems. In U. Dayal, P. M. D. Gray, and S. Nishio, editors, Proc. 21st Int. Conf. Very Large Data Bases (VLDB), pages 562-573. Morgan Kaufmann, 11-15 1995.
12. J.-S. R. Jang and H.-R. Lee. Hierarchical filtering method for content-based music retrieval via acoustic input. In Proceedings of the Ninth ACM International Conference on Multimedia, pages 401-410. ACM Press, 2001.