A prototype system for rule-based expressive modifications of audio recordings


International Symposium on Performance Science 2007, published by the AEC.

A prototype system for rule-based expressive modifications of audio recordings

Marco Fabiani and Anders Friberg
Speech, Music and Hearing (TMH), Royal Institute of Technology (KTH), Sweden

A prototype system is described that aims to modify a musical recording in an expressive way using a set of performance rules controlling tempo, sound level, and articulation. The audio signal is aligned with an enhanced score file containing performance rule information. A time-frequency transformation is applied, and the peaks in the spectrogram, representing the harmonics of each tone, are tracked and associated with the corresponding notes in the score. New values for tempo, note lengths, and sound levels are computed from the rules and from user decisions. The spectrogram is modified by adding, subtracting, and scaling spectral peaks to change the original tone's length and sound level. For tempo variations, a time-scale modification algorithm is integrated into the time-domain re-synthesis process. The prototype is developed in Matlab. An intuitive GUI allows the user to choose parameters, listen to and visualize the audio signals involved, and perform the modifications. Experiments have been performed on monophonic and simple polyphonic recordings of classical music for piano and guitar.

Keywords: automatic music performance, performance rules, musical expression, emotions, audio signal processing

A music performance represents the interpretation that a musician (or, in our case, a computer) gives to a score. To obtain different performances, the musician often follows principles related to structural features of the score (e.g. musical phrases, harmony, melody). The KTH rule system for musical performance (Friberg, Bresin, and Sundberg 2006) models such principles quantitatively in order to control three main musical parameters: tempo, articulation, and sound level. The rules are used to play back MIDI files expressively (Friberg 2006). The result often sounds unnatural, mostly because of the quality of the synthesizer. We propose a different approach to automatic music performance in order to obtain a more realistic result: directly modifying a recorded human performance. Previous attempts at automatic expressive modification of tempo have been suggested by, for example, Gouyon, Fabig, and Bonada (2003) and Janer, Bonada, and Jorda (2006).

Interactive virtual conducting systems are other examples of expressive tempo and sound-level modification (Borchers, Lee, and Samminger 2004; Bruegge et al. 2007); in that case, however, the modifications are not automatic but controlled by the user. In our system, the modifications of the audio signal are done on a note basis, which also allows the length of single tones (articulation) to be changed. We also take into account the timbre variations of acoustic instruments when changing the sound level (Luce 1975). The whole process should avoid noticeable artifacts and work on both monophonic and polyphonic recordings.

METHOD

The system can be divided into three main sections, as shown in Figure 1. In section (a), the audio signal is aligned with the score file, transformed into the time-frequency domain, and analyzed. In section (b), the modifications of the spectrogram and the synthesis of the modified time-domain signal are performed. Note lengths, sound levels, and tempo are computed in section (c) using rule values and input from the user.

Figure 1. Schematic representation of the system.
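As a minimal sketch (not part of the prototype code) of how the three stages of Figure 1 could be wired together in Matlab, the processing might be organized as follows; every function name here is a hypothetical placeholder.

    % Hypothetical sketch of the three-stage structure of Figure 1;
    % all function names are placeholders, not the actual prototype API.
    [S, notes] = analyzeRecording(audio, fs, scoreFile);        % (a) score alignment, STFT, partial tracking
    perf       = computePerformanceValues(notes, ruleWeights);  % (c) rule-based tempo, length, and level values
    audioOut   = modifyAndResynthesize(S, notes, perf, fs);     % (b) spectrogram modification and re-synthesis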

Score alignment and signal analysis

In order to modify the performance on a note basis, each tone needs to be separated from the rest of the signal; in polyphonic recordings, tones can also overlap. A tone produced by an acoustic instrument is usually harmonic, with a large number of partials. To modify a single tone in the time-frequency domain, its partials need to be associated with the corresponding fundamental and note in the score.

The system uses an enhanced score file containing performance rule values. The score notes are also used, in combination with the spectrogram, to analyze the audio signal and separate the harmonic components of each tone. The score is aligned with the audio signal using tone onset positions, which can be extracted automatically (using a simple algorithm based on an edge-detection filter), defined by hand, or a combination of the two. The signal is divided into overlapping time windows and transformed into a time-frequency representation using the method proposed by Ferreira and Sinha (2005), which allows accurate estimation of the frequencies of spectral peaks. For each time window, the expected fundamental frequency of each tone and its partials are computed according to the notes in the score (the inharmonicity of piano tones is also taken into account using a simple model). The peaks in the spectrogram are then detected and associated with the corresponding note in the score.
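As an illustration of this last step, the following Matlab sketch predicts the partial frequencies of one score note and matches them to measured peak frequencies. It uses the common string-inharmonicity model f_k = k*f0*sqrt(1 + B*k^2); the inharmonicity coefficient, the tolerance, and the variable names (midiNote, nPartials, peakFreq) are illustrative assumptions, not the prototype's actual values.

    % Sketch (not the prototype code): predict the partial frequencies of one
    % score note and associate them with measured spectral peaks.
    f0      = 440 * 2^((midiNote - 69)/12);    % fundamental from the MIDI note number
    B       = 1e-4;                            % illustrative inharmonicity coefficient (0 for an ideal string)
    k       = (1:nPartials)';
    fExpect = k .* f0 .* sqrt(1 + B*k.^2);     % expected partial frequencies (Hz)

    % peakFreq: peak frequencies (Hz) measured in the current analysis frame,
    % e.g. with the ODFT-based estimator of Ferreira and Sinha (2005)
    tol = 0.03;                                % accept peaks within +/-3% of the prediction
    partialPeak = nan(nPartials, 1);
    for i = 1:nPartials
        [dev, idx] = min(abs(peakFreq - fExpect(i)) / fExpect(i));
        if dev < tol
            partialPeak(i) = idx;              % this peak is assigned to partial i of the note
        end
    end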

Modifications and synthesis

The KTH rule system concentrates on the modification of tempo, sound level, and articulation, three acoustic parameters that have been found to be crucial for performance expression (Juslin 2000). In our prototype, the modifications are performed in the frequency domain using an analysis-synthesis approach, in this order: articulation, sound level, and tempo.

Articulation is changed by lengthening (staccato to legato) or shortening (legato to staccato) the harmonic tracks corresponding to a tone. Using Ferreira's method (Ferreira 2001), we interpolate the magnitudes of a frequency peak and of the two adjacent frequency bins and subtract them; in the same way, we can interpolate the magnitudes of new peaks and add them to lengthen a tone.

Acoustic instruments usually sound brighter (i.e. higher partials are present) when played loudly than when played softly. Therefore, to obtain a realistic sound-level modification, the timbre also needs to be changed. Addition and subtraction of partials can be done with the same method used for articulation. In addition, knowledge of the original sound level of each tone is needed in order to apply the correct amplitude scaling. Measuring the level of a single tone in a polyphonic recording is a complex problem that we have not yet solved; for this reason, sound-level modifications are currently not performed in the prototype.

The modification of tempo is integrated into the synthesis algorithm. As mentioned earlier, the transformation to the time-frequency domain is performed by first dividing the audio signal into overlapping time windows separated by the analysis hop size Ra. A common way to do time-scale modification (Laroche and Dolson 1999) is to change the synthesis hop size Rs so that the reconstructed time windows are spaced further apart or closer together (time-scale expansion or compression). When Rs becomes too small or too large, audible artifacts are introduced. To avoid this problem, we either discard some frames or use the same frame twice. This approach has the side effect that it may also smear sharp tone attacks; by using Rs = Ra within tone attacks we avoid this effect (see the sketch below).

A major drawback of direct modifications of the spectrogram is phase incoherence, which introduces artifacts known as phasiness or loss of presence. The inverse transformation to a time-domain signal requires both the magnitude (spectrogram) and the phase response. Since only the magnitude is modified, its combination with the original phase response usually no longer corresponds to a consistent time-domain signal. For time-scale modification, solutions have been proposed that correct the phase response to maintain coherence (Laroche and Dolson 1999). In our case the problem is more complicated, as we need to keep track of additions and subtractions of frequency peaks. For this reason, we discard the original phase information and reconstruct the time-domain signal from the magnitude only, using the Real-Time Iterative Spectrogram Inversion (RTISI) method (Beauregard, Zhu, and Wyse 2005). This algorithm also smears sharp tone onsets; since we do not modify the magnitude response of the time frames containing onset data, for these frames we keep the original phase response to prevent smearing, while RTISI is used for the modified frames.

Performance values computation

The modifications of the performance are based on a new sound level and length for each note, as well as a series of tempo values (usually one per inter-onset interval). Each value is the sum of the nominal value from the score and a delta value obtained from a weighted sum of the rule values. The weights are defined individually by the user or loaded from default sets (e.g. happy, sad, angry, or tender performances). There are 19 rules in the system, and each rule influences one or more of the acoustic parameters; for a more detailed explanation, see Friberg (2006).
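As a concrete illustration, the per-note values reduce to a weighted sum; the variable names and dimensions below are assumptions for the sketch, not the prototype's data structures.

    % Sketch: per-note performance values as the nominal score value plus a
    % weighted sum of rule contributions. ruleDelta* are 19-by-nNotes matrices
    % of rule outputs from the enhanced score file; w holds the 19 user weights.
    % All names and units are illustrative.
    w        = ruleWeights(:);                         % 19x1, set by sliders or a preset (e.g. "sad")
    soundLev = nominalLevel  + (w' * ruleDeltaLevel);  % sound-level offsets, one per note
    noteLen  = nominalLength + (w' * ruleDeltaLength); % note lengths (articulation), one per note
    tempoVal = nominalIOI    + (w' * ruleDeltaIOI);    % tempo values, one per inter-onset interval

The frame-selection strategy for tempo modification described above (keep Rs = Ra, duplicate or discard analysis frames, and map attack frames one-to-one) could look roughly as follows; the stretch factor, the attack-protection width, and the variable names are again illustrative.

    % Sketch: time-scale modification by selecting analysis frames rather than
    % changing the synthesis hop size. S is the magnitude spectrogram (one column
    % per analysis frame); onsetFrames holds the detected attack frame indices.
    nFrames = size(S, 2);
    stretch = 1.25;                              % >1 slows the performance down, <1 speeds it up
    nOut    = round(stretch * nFrames);
    srcIdx  = round(linspace(1, nFrames, nOut)); % nearest analysis frame for each synthesis frame

    % Protect attacks: force a local one-to-one mapping (Rs = Ra) around each onset
    for on = onsetFrames(:)'
        [~, pos] = min(abs(srcIdx - on));        % synthesis frame closest to this attack
        for d = -2:2
            if pos+d >= 1 && pos+d <= nOut && on+d >= 1 && on+d <= nFrames
                srcIdx(pos+d) = on + d;
            end
        end
    end
    Smod = S(:, srcIdx);                         % frames passed on to RTISI re-synthesis with hop Rs = Ra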

RESULTS

The system described above has been implemented, with some limitations, in Matlab. The user is provided with a simple GUI to load audio and score files. The waveform is visualized together with the tone onset points; these can be detected automatically and saved for later use, and moved to correct any detection errors. The user can choose analysis parameters such as window and hop size. After the analysis is performed, the data is stored. The user can then choose the overall tempo and the performance parameters, either from the default sets or with one slider per rule. Before performing the synthesis, it is possible to choose whether or not to modify the articulation. The sound-level modification has not yet been implemented. The interface also provides two audio players to play back the original and the modified performance.

A few experiments using monophonic recordings of a theme from Haydn's F major Quartet (Op. 74 No. 2), played on piano and guitar, showed very good results for tempo modifications (sharp attacks are preserved). For articulation, a kind of reverberation effect is introduced in the silenced parts when the analysis is unable to extract all the frequency peaks. Another experiment was run using a polyphonic piano recording of Chopin's Etude Op. 10 No. 3. In this case, the tempo modification does not introduce severe artifacts, but the articulation is rather noisy, as the separation of partials becomes more difficult with overlapping tones.

DISCUSSION

In this paper we presented a system that aims to expressively modify a musical recording (the tempo, sound level, and articulation of each single tone) in order to obtain an automatic performance comparable to a human performance in terms of expressivity and sound quality. The main problem is the separation of each single tone from the rest of the recording. We use a time-frequency representation and extract the harmonic tracks corresponding to each tone. This is not yet reliable enough, and we are investigating how to improve the tone separation; the articulation of single notes strongly depends on the quality of the separation. Another open problem is measuring the sound level of a single tone in order to modify it consistently. A more reliable onset detection algorithm is also needed.

Possible applications for this system include music cognition studies, where stimuli are usually artificial-sounding MIDI files, and an advanced home conducting system that can work with any available recording. It can also be a useful tool for music teachers to demonstrate different expressive techniques to their pupils.

Address for correspondence

Marco Fabiani, Dept. of Speech, Music and Hearing (TMH), Royal Institute of Technology (KTH), Lindstedtsv. 24, SE-10044 Stockholm, Sweden; Email: himork@kth.se.

References

Beauregard G. T., Zhu X., and Wyse L. (2005). An efficient algorithm for real-time spectrogram inversion. Proceedings of the 8th International Conference on Digital Audio Effects (DAFx-05).

Borchers J., Lee E., and Samminger W. (2004). Personal Orchestra: a real-time audio/video system for interactive conducting. Multimedia Systems, vol. 9, pp. 458-465.

Bruegge B., Teschner C., Lachenmaier P., Fenzl E., Schmidt D., and Bierbaum S. (2007). Pinocchio: Conducting a virtual symphony orchestra. Proceedings of the International Conference on Advances in Computer Entertainment Technology.

Ferreira A. J. S. (2001). Combined spectral envelope normalization and subtraction of sinusoidal components in the ODFT and MDCT frequency domains. Proceedings of the 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

Ferreira A. J. S. and Sinha E. (2005). Accurate and robust frequency estimation in the ODFT domain. Proceedings of the 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

Friberg A. (2006). pDM: An expressive sequencer with real-time control of the KTH music performance rules. Computer Music Journal, vol. 30, pp. 37-48.

Friberg A., Bresin R., and Sundberg J. (2006). Overview of the KTH rule system for music performance. Advances in Cognitive Psychology, Special Issue on Music Performance, vol. 2, pp. 145-161.

Gouyon F., Fabig L., and Bonada J. (2003). Rhythmic expressiveness transformations of audio recordings: swing modifications. Proceedings of the International Conference on Digital Audio Effects (DAFx-03).

Janer J., Bonada J., and Jorda S. (2006). Groovator: an implementation of real-time rhythm transformations. Proceedings of the 121st Convention of the Audio Engineering Society.

Juslin P. N. (2000). Cue utilization in communication of emotion in music performance: Relating performance to perception. Journal of Experimental Psychology: Human Perception and Performance, vol. 26, pp. 1797-1813.

Laroche J. and Dolson M. (1999). Improved phase vocoder time-scale modification of audio. IEEE Transactions on Speech and Audio Processing, vol. 7, pp. 323-332.

Luce D. A. (1975). Dynamic spectrum changes of orchestral instruments. Journal of the Audio Engineering Society, vol. 23, pp. 565-568.