Krzysztof Rychlicki-Kicior, Bartlomiej Stasiak and Mykhaylo Yatsymirskyy, Lodz University of Technology, 26.01.2015

Multipitch estimation obtains the frequencies of the sounds present in a polyphonic audio signal; the number of sources may be known in advance or not. The problem belongs to the field of Music Information Retrieval.

How can separate sounds be retrieved from one mixed sound signal?

Fig. 1. An example sound signal (amplitude vs. time [s])

Fig. 2. An example of symbolic notation of a sound signal

Fig. 3. A sinusoid, f = 440 Hz (amplitude vs. time [s])

Fig. 4. A spectrum of the signal from Fig. 3.

Fig. 5. A signal containing two sinusoids, f1 = 440 Hz and f2 = 784 Hz (amplitude vs. time [s])

Fig. 6. A spectrum of the signal from Fig. 5.

Fig. 7. A signal of the note A4 (f0 = 440 Hz) played on the flute (amplitude vs. time [s])

Fig. 8. A spectrum of the signal from Fig. 7.

Fig. 9. A signal of A4 and G5 (f1 = 440 Hz and f2 = 784 Hz) played on the flute (amplitude vs. time [s])

Fig. 10. A spectrum of the signal from Fig. 9.

Fig. 11. A spectrum of the signal from Fig. 9 (log scale)

Fig. 12. Structure of the solution: two parallel pipelines, one based on the Constant-Q Transform and one on the cepstrum, each followed by preprocessing, SI-PLCA and normalization; the resulting candidates are combined by the Judge.

The Constant-Q Transform (CQT) is a non-linear frequency transform that provides more detail at lower frequencies than at higher ones, which makes it a much more reasonable choice for music processing than the DFT. The cepstrum shows the rate of change in the regular spectrum; it is typically used in speech processing (especially as the basis of MFCCs) and in single-f0 estimation approaches.
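As a toy illustration of the cepstral route, the sketch below (plain numpy; the sample rate, harmonic count and pitch search range are assumptions made for this demo, not values from the presentation) estimates the f0 of a synthetic harmonic tone from the peak quefrency of its real cepstrum:

```python
import numpy as np

sr = 8000                       # sample rate (assumed for this demo)
t = np.arange(0, 0.5, 1 / sr)
# synthetic flute-like tone: 5 harmonics of A4 (440 Hz), amplitudes 1/k
x = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 6))

# real cepstrum: inverse FFT of the log magnitude spectrum
spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
cepstrum = np.fft.irfft(np.log(spectrum + 1e-10))

# peak quefrency -> period; the search is restricted to 300-600 Hz
# (assumed range, since the plain cepstrum is prone to subharmonic
# errors when the whole quefrency axis is searched)
qmin, qmax = sr // 600, sr // 300
q = qmin + np.argmax(cepstrum[qmin:qmax])
f0 = sr / q
```

With these settings the peak lands at a quefrency of 18 samples, i.e. roughly 444 Hz; the quantization of the quefrency axis is why the estimate is not exactly 440 Hz.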

After the representations are obtained, additional preprocessing is applied ("pre-" referring to the fact that it is done before the representation analysis). Preprocessing includes: removing components with small values, smoothing the representations, and calculating the salience.
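The first two steps can be sketched as follows; the threshold and window length are illustrative assumptions, since the presentation names the steps but not their parameters (the salience calculation is omitted here):

```python
import numpy as np

def preprocess(rep, floor=0.05, win=5):
    """Zero out small components, then smooth with a moving average.

    floor: fraction of the maximum below which bins are discarded (assumed).
    win:   moving-average window length in bins (assumed).
    """
    kept = np.where(rep < floor * rep.max(), 0.0, rep)   # remove small values
    kernel = np.ones(win) / win
    return np.convolve(kept, kernel, mode="same")        # smoothing

# toy magnitude spectrum standing in for one frame of the CQT
rng = np.random.default_rng(0)
frame = np.abs(np.fft.rfft(rng.standard_normal(1024)))
smooth = preprocess(frame)
```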

Fig. 13. The initial, unprocessed CQT
Fig. 14. CQT after removing small components

Fig. 15. CQT after removing small components
Fig. 16. CQT after smoothing

After the preprocessed sound representations are obtained, they are analyzed using Shift-Invariant Probabilistic Latent Component Analysis (SI-PLCA). This method treats the spectrogram (or any similar representation, such as a time-lag representation) as a distribution of time and energy, which can therefore be decomposed into kernel and impulse distributions.

Fig. 17. An example of decomposing a spectrogram into kernel and impulse distributions [10].
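The key property exploited here is that in a log-frequency representation such as the CQT, changing the pitch of a note merely shifts its harmonic pattern along the frequency axis, so a single kernel plus an impulse location can describe any pitch. The toy sketch below illustrates only this shift-invariance idea, not the EM algorithm of [10]; the bins-per-octave value, partial count and matched-filter scoring are assumptions made for illustration:

```python
import numpy as np

BPO = 36  # bins per octave in the log-frequency axis (assumed)
# harmonic kernel: partial k sits log2(k) octaves above the fundamental
kernel_bins = np.round(np.log2(np.arange(1, 6)) * BPO).astype(int)

def make_note(shift, size=200):
    """Toy log-frequency spectrum of a note whose f0 bin is `shift`."""
    v = np.zeros(size)
    v[shift + kernel_bins] = 1.0 / np.arange(1, 6)
    return v

def best_shift(v, size=200):
    """Impulse location = shift at which the kernel explains most energy."""
    scores = [v[s + kernel_bins].sum() if s + kernel_bins.max() < size else 0.0
              for s in range(size)]
    return int(np.argmax(scores))

# two notes a perfect fifth apart (21 bins at 36 bins per octave)
spectrum = make_note(40) + make_note(61)
```

Because the kernel is the same at every shift, one pattern suffices for all pitches; SI-PLCA learns such kernels and their impulse (shift) distributions jointly from the data.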

While regular f0-estimation methods would end here, in this approach a very important step now takes place: choosing the appropriate solution from the candidates reported by the individual methods. First, however, the candidates must be grouped and normalized.

Each method returns a set of candidates, where each candidate has three attributes: frequency f, power p and count c. The candidates' powers must be normalized, because the energy of a frequency component in the CQT differs from the energy of a component in the cepstrum.

The judge is responsible for: grouping the candidates returned by all the methods according to chosen criteria, then sorting the grouped candidates and choosing the best of them as the solution of the algorithm.
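A minimal sketch of such a judge is given below. The relative frequency tolerance used for grouping and the summed-power score are hypothetical choices (the presentation does not specify the criteria), and the count attribute is carried along but not used in this simplified scoring:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    freq: float   # frequency f in Hz
    power: float  # power p, normalized per method
    count: int    # count c

def judge(candidates, n_voices, tol=0.03):
    """Group candidates whose frequencies agree within a relative tolerance,
    score each group by its summed power, and return the mean frequency of
    the top n_voices groups (tolerance and scoring are assumptions)."""
    groups = []
    for c in sorted(candidates, key=lambda c: c.freq):
        if groups and abs(c.freq - groups[-1][-1].freq) / groups[-1][-1].freq < tol:
            groups[-1].append(c)   # same pitch reported by another method
        else:
            groups.append([c])     # a new pitch hypothesis
    best = sorted(groups, key=lambda g: sum(c.power for c in g), reverse=True)
    return [sum(c.freq for c in g) / len(g) for g in best[:n_voices]]

# e.g. candidates for an A4 + G5 interval coming from the two pipelines
cands = [Candidate(440.2, 0.9, 3), Candidate(439.5, 0.8, 2),
         Candidate(784.1, 0.7, 2), Candidate(883.0, 0.2, 1)]
pitches = judge(cands, 2)
```

Here the two near-440 Hz candidates reinforce each other into the strongest group, the 784 Hz candidate forms the second, and the weak 883 Hz outlier is discarded.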

The accuracy for intervals (two-sound chords) reached 87%; for three-sound chords the accuracy reached 81.5%, and for four-sound chords 75.2%. Before applying the judge, however, the interval accuracy had reached 95.2% (93.6% for three-sound chords and 88.9% for four-sound chords).

Interval (semitones)   Accuracy (%)
0                      88.83
1                      87.82
2                      85.77
3                      87.62
4                      89.11
5                      88.24
6                      84.62
7                      83.00
8                      85.11
9                      89.67
10                     90.85
11                     83.75

Table 1. Percentage of correctly detected intervals by the type of interval (intervals larger than an octave have been reduced to their equivalents within an octave).

Applying multiple methods gives very good and predictable results even for more complex polyphony. The role of the judge is very important; the current version, while giving good overall accuracy, might still be improved.

[1] K. Dressler, "Multiple fundamental frequency extraction for MIREX 2012", in: The 13th International Conference on Music Information Retrieval (2012).
[2] J. Leon, F. Beltran, J. Beltran, "A complex wavelet-based fundamental frequency estimator in single-channel polyphonic signals", Proc. of the 16th Int. Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland, September 2-5, 2013.
[3] E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff and A. Klapuri, "Automatic music transcription: challenges and future directions", Journal of Intelligent Information Systems, 41(3), Springer-Verlag, 407-434 (2013).
[4] M. Davy and A. Klapuri, Signal Processing Methods for Music Transcription, Springer-Verlag (2006).
[5] C. Yeh, "Multiple fundamental frequency estimation of polyphonic recordings", Ph.D. thesis, Universite de Paris (2008).
[6] K. Rychlicki-Kicior, B. Stasiak, "Multipitch estimation using judge-based model", Bulletin of the Polish Academy of Sciences, Technical Sciences, 62(4) (2014).
[7] M. Goto, H. Hashiguchi, T. Nishimura and R. Oka, "RWC Music Database: Music Genre Database and Musical Instrument Sound Database", Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR 2003), pp. 229-230, October 2003.
[8] http://www.music-ir.org/mirex/wiki/2014:Multiple_Fundamental_Frequency_Estimation_%26_Tracking_Results
[9] K. Rychlicki-Kicior, B. Stasiak, "Metaheuristic Optimization of Multiple Fundamental Frequency Estimation", in: Man-Machine Interactions 3 (Eds.: A. Gruca, T. Czachórski, S. Kozielski), Springer, pp. 307-314 (2014).
[10] P. Smaragdis, B. Raj, "Shift-Invariant Probabilistic Latent Component Analysis" (2007).