DEVELOPMENT OF MIDI ENCODER "Auto-F" FOR CREATING MIDI CONTROLLABLE GENERAL AUDIO CONTENTS


Toshio Modegi
Research & Development Center, Dai Nippon Printing Co., Ltd.
250-1, Wakashiba, Kashiwa-shi, Chiba, 277-0871 Japan
e-mail: Modegi-T@mail.dnp.co.jp

Abstract: The MIDI interface was originally designed for electronic musical instruments, but we consider that this music-note-based coding concept can be extended to the description of general acoustic signals. We first proposed applying MIDI technology to the coding of biomedical auscultation sounds such as heart sounds. We then extended our encoding targets and, based on Generalized Harmonic Analysis, improved the coding precision so that the method could be applied to vocal sounds. Currently, we are trying to separate each tone included in popular songs and to encode both the vocal sounds and several background instrumental sounds into separate MIDI channels. Using a GM-standard MIDI tone generator, this multi-channel MIDI-encoded data, including the vocal sounds, can be played back. In this paper, we present the algorithm of our MIDI software encoder tool, which is being used for producing interactive general audio contents controlled by MIDI.

Keywords: MIDI coding, audio contents, automatic notation, acoustic signal processing

1. Introduction

MIDI (Musical Instrument Digital Interface) was originally designed for musical instruments, and we regard MIDI as an ideal coding method because of its coding efficiency and its high-quality sound reproduction capability. The first application of MIDI technology was synthesising

Because the features of the MIDI coding are similar to those of text formats, if it is applied to audio databases, we can retrieve audio contents by audio keywords or music-note strings [2]. We have been interested in multimedia medical databases, especially audio databases of heart sounds and lung sounds, and we proposed a MIDI encoding method for heart sounds whose main feature is its real-time processing capability [3]. Besides implementing this proposed method for heart-sound coding, we tried applying MIDI coding to other types of sound material, including bird songs [4], and we found that the converted MIDI data could be used for new types of interactive audio contents producing non-existing natural sounds.

In order to process various types of acoustic signals, we categorise general acoustic signals into two groups, according to whether their spectrum components are distributed intermittently or continuously; a toy decision rule is sketched at the end of this introduction. Most natural acoustic signals, including human voices and biological signals, belong to the latter (continuous) group, whereas musical sounds, except percussion sounds, belong to the former (intermittent) group. We then defined two kinds of MIDI coding approaches, differing in processing complexity: a real-time coding method and a high-precision coding method [4]. Having implemented both types of coding, we found that the high-precision method, which uses our proposed non-linearly extended GHA (Generalized Harmonic Analysis) frequency analysis [5], makes it possible to play back speech and singing on MIDI tone generators. We then focused on the decoder sound module and tried to produce sounds as natural as the original PCM recordings [6]. We also improved the frequency analysis precision by variable frame-length analysis and evaluated the coding precision [7]. More recently, we have been developing sound source separation techniques, especially the separation of vocal parts from mixed-down songs.

Considering our proposed method as a very low bit-rate audio codec, we have evaluated it against other encoding methods, and we reported that the encoding quality of 8-kbps data produced by our MIDI encoding was superior to that of 16-kbps MPEG-1 Layer 3 encoded data [8]. Furthermore, using these improved coding techniques, we are trying to apply this MIDI coding to the symbolic expression of acoustic signals, so that music archives can be retrieved by note-based keywords. As the structured symbolic description format we chose XML (eXtensible Markup Language) [9], because this format is also widely used in medical applications [10].

In this paper, we give an overview of our improved MIDI encoding algorithm, which has been implemented as Windows software and is now distributed for free. We expect this software tool to be used for producing interactive general audio contents controlled by MIDI.
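The paper gives no concrete decision rule for the two-group categorisation above; one plausible proxy, sketched here under our own assumptions, is spectral flatness, which is low for line (intermittent) spectra and high for continuous spectra. The threshold value is purely illustrative.

```python
import numpy as np

def spectral_flatness(frame: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum (0..1)."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    power = power[power > 0]  # drop zero bins to avoid log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def signal_group(frame: np.ndarray, threshold: float = 0.1) -> str:
    """Crude proxy for the paper's two-group categorisation.

    Low flatness  -> line (intermittent) spectrum -> 'musical'
    High flatness -> continuous spectrum          -> 'general'
    Both the measure and the threshold are illustrative assumptions.
    """
    return "musical" if spectral_flatness(frame) < threshold else "general"
```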

2. Encoding Method

2.1 Background and Concept of MIDI Coding

These days more and more MIDI applications are being created, such as karaoke, cell-phone pagers, music game contents, pet robots, automatic pianos and player-guide data for music keyboards. Providing existing or newly released music recordings to these applications requires sound-format conversion, and this conversion business is expanding, especially in Japan. However, these processes still depend entirely on human ears ("ear-copy") and skilled manual operation, and they require trained musical talent. As a partial solution, several MIDI direct-input devices imitating musical instruments have been devised by musical instrument makers, but such devices restrict musical performance. An automatic conversion tool, including automatic notation processing, is therefore awaited.

The general application of MIDI technique, called DTM (Desk Top Music), digitises a music score written by a composer into the MIDI format and produces the instrumental parts on a desktop, without musicians, musical instruments or a recording studio; this technique is widely applied in today's commercial music production. However, for non-musical acoustic material such as vocal parts, which is difficult to express with MIDI music notes, singers and a recording studio facility are still needed. We therefore proposed the opposite direction: converting existing audio waveform material into MIDI music notes. With this method, any kind of audio material can be controlled interactively through MIDI functions, and even vocal sounds can be reproduced by MIDI tone generators or electronic musical instruments. Moreover, by editing the converted MIDI codes, we can also reproduce music scores similar to those written by composers.

Our proposed MIDI coding is a kind of analytic-synthetic coding. Conventional approaches of this kind separate a given audio signal into a large number of sinusoidal waveforms and describe them by the frequency and intensity parameters of the separated sinusoids. Our method instead separates the signal into several predefined harmonic complex waveforms, which available MIDI tone generators can generate, and describes them by the frequency (the MIDI note-number) and the amplitude (the MIDI velocity) parameters of these harmonic complex waveforms. In general, the number of harmonic complex waveforms required for the description is much smaller than the number of analysed sinusoids, because each harmonic complex waveform is itself composed of several sinusoids. Therefore, the coded bit-rate of this MIDI method becomes about 1/10 of that of conventional analytic-synthetic coding approaches.
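The compactness argument can be made concrete with a small sketch of our own, not taken from the paper: if analysed partials whose frequencies fall near integer multiples of a common fundamental are absorbed into one harmonic stack, each stack needs only a single (frequency, amplitude) pair, where the actual encoder would match stacks against the tone generator's wave tables. The tolerance value is an illustrative assumption.

```python
import math

# Each analysed partial: (frequency in Hz, amplitude).
Partial = tuple[float, float]

def group_into_notes(partials: list[Partial],
                     rel_tol: float = 0.03) -> list[Partial]:
    """Greedily absorb partials sitting near integer multiples of the
    lowest remaining partial; each group becomes one (f0, energy) note.
    Illustrative only: the real encoder uses predefined harmonic
    complex waveforms from the MIDI tone generator, not ideal harmonics.
    """
    remaining = sorted(partials)
    notes = []
    while remaining:
        f0, a0 = remaining.pop(0)
        energy = a0 ** 2
        kept = []
        for f, a in remaining:
            ratio = f / f0
            if abs(ratio - round(ratio)) < rel_tol * round(ratio):
                energy += a ** 2      # partial joins this harmonic stack
            else:
                kept.append((f, a))   # belongs to some other note
        remaining = kept
        notes.append((f0, math.sqrt(energy)))
    return notes  # P notes, typically far fewer than the N input partials
```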

2.2 Two Types of Approaches for MIDI Coding

MIDI data are a collection of paired Note-On and Note-Off command strings called events, where each pair denotes one music note, and each event is composed of a relative time-stamp (the delta-time, in MIDI standard terms), a frequency (note-number) and an amplitude (velocity) parameter [3]. In this section we describe how these MIDI parameters are obtained numerically. As described above, we proposed two approaches, depending on whether the source acoustic signal is a musical acoustic signal or some other type of signal.

Using a frequency analysis technique such as the GHA (Generalized Harmonic Analysis) method [5], we decompose a frame g(t) (frame length T) extracted from the given acoustic signal. With the variable frame-length analysis technique [7], we obtain a set of N separated sinusoidal functions:

g(t) \approx \sum_{n=1}^{N} \{ A_n \sin(2\pi f_n t) + B_n \cos(2\pi f_n t) \},   (1)

where the coefficients A_n and B_n are defined by the following equations:

A_n = \frac{2}{T_n} \sum_{t=0}^{T_n - 1} g(t) \sin(2\pi f_n t),   (2)

B_n = \frac{2}{T_n} \sum_{t=0}^{T_n - 1} g(t) \cos(2\pi f_n t).   (3)

In these equations, T_n is the maximum value satisfying T_n = k / f_n < T (k: an appropriate positive integer), i.e. the longest whole number of periods of f_n that fits inside the frame, and f_n is given by f_n = 440 \cdot 2^{(n-69)/12}, which generates frequency values on the logarithmic MIDI note-number scale.

Defining harmonic complex functions u_i(t), we can rewrite equation (1) with a much smaller number of summation elements P << N:

g(t) \approx \sum_{i=1}^{P} a_i u_i(t).   (4)

We then define p(i) as the representative frequency identification number of u_i(t).
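As an illustration, here is a minimal numerical sketch of equations (2) and (3) over the MIDI-scale analysis frequencies f_n, assuming a discretely sampled frame; the sample-rate handling and the note range 36-96 are our own illustrative choices, not specified in the paper.

```python
import numpy as np

def gha_coefficients(g: np.ndarray, fs: float, notes=range(36, 97)):
    """Compute (A_n, B_n) of eqs. (2)-(3) for MIDI-scale frequencies.

    g  : one extracted frame of the signal, g[0..T-1]
    fs : sampling rate in Hz
    For each note n, the sum runs over T_n samples, the longest whole
    number of periods k/f_n that fits inside the frame."""
    T = len(g)
    coeffs = {}
    for n in notes:
        f = 440.0 * 2.0 ** ((n - 69) / 12.0)   # MIDI note frequency, Hz
        period = fs / f                         # samples per period
        k = int(T / period)                     # whole periods in the frame
        if k < 1:
            continue                            # frame too short for f_n
        Tn = min(int(round(k * period)), T)     # T_n in samples
        t = np.arange(Tn)
        A = (2.0 / Tn) * np.sum(g[:Tn] * np.sin(2 * np.pi * f * t / fs))
        B = (2.0 / Tn) * np.sum(g[:Tn] * np.cos(2 * np.pi * f * t / fs))
        coeffs[n] = (A, B)
    return coeffs
```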

In the first case, illustrated in Figure 4(A), u_i(t) can be expressed as the sum of a fundamental frequency f_{p(i)} and its harmonic components j f_{p(i)} (j: integer value 1, 2, 3, ...):

u_i(t) = \sum_{j=1}^{J} \{ A_i(j) \sin(2\pi j f_{p(i)} t) + B_i(j) \cos(2\pi j f_{p(i)} t) \}.   (5)

In the other case, illustrated in Figure 4(B), u_i(t) can be expressed as the sum of a formant local-peak frequency f_{p(i)} and the continuous frequency components distributed in its neighbourhood, with the integer offset j running around 0 (from f_{p(i)-3} to f_{p(i)+3}):

u_i(t) = \sum_{j=-3}^{3} \{ A_i(j) \sin(2\pi f_{p(i)+j} t) + B_i(j) \cos(2\pi f_{p(i)+j} t) \}.   (6)

If we choose the harmonic complex functions u_i(t) from the wave tables defined in the MIDI tone generator in use, we can reproduce g(t) with P notes, each given a note-number N_i and a velocity value V_i. These values are generated from the f_{p(i)} and a_i parameters respectively, as follows (these conversions are sketched in code below):

N_i = 40 \log_{10}(f_{p(i)} / 440) + 69,   (7)

V_i = 128 (C a_i)^{1/2}   (C: constant).   (8)

The Note-On time of each MIDI note event is the start position of the extracted frame within the source acoustic signal, and the duration (the Note-Off delta-time) is given by the frame-shift interval \tau of the analysis.

Figure 4. Two types of approaches for MIDI coding: the extracted time frame g(t) is separated into sinusoidal functions and regrouped either (A) as harmonic line spectra, for musical acoustic signals (musical instrument sounds), or (B) as local spectral peaks with neighbouring continuous components, for general acoustic signals (vocal, biomedical and natural sounds, noises).
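The parameter conversions of equations (7) and (8) are simple enough to sketch directly. The clipping to 7-bit MIDI ranges, the default C = 1.0 and the event-tuple format are our own additions; the exact velocity mapping should be checked against [5].

```python
import math

def note_number(f_peak: float) -> int:
    """Eq. (7): map a peak frequency (Hz) to a MIDI note-number.
    40*log10 is the paper's approximation of 12*log2."""
    n = 40.0 * math.log10(f_peak / 440.0) + 69.0
    return max(0, min(127, round(n)))           # clip to 7-bit MIDI range

def velocity(a_i: float, C: float = 1.0) -> int:
    """Eq. (8): map a note amplitude a_i to a MIDI velocity.
    C = 1.0 is an illustrative assumption."""
    v = 128.0 * math.sqrt(C * a_i)
    return max(1, min(127, round(v)))

def note_events(start_tick: int, tau_ticks: int, f_peak: float, a_i: float):
    """One encoded note becomes a Note-On/Note-Off pair; the on-time is
    the frame start and the duration is the frame-shift interval tau."""
    n = note_number(f_peak)
    return [(start_tick, "note_on", n, velocity(a_i)),
            (start_tick + tau_ticks, "note_off", n, 0)]
```

As a check on equation (7), f_peak = 440 Hz maps to note 69 (A4) and 880 Hz to note 81, so one octave spans the expected 12 semitones.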

2.3 Algorithm Design of MIDI Coding

Figure 5 shows the whole of our designed MIDI encoding process. The first stage, Frequency Analysis, separates a part of the source signal into N spectrum components and carries most of the calculation load. The next stage, Harmonic Grouping of Notes, integrates the selected multiple spectrum components into notes; in our current implementation this process simply selects the representative notes by their volume value.

Figure 5. Abstract flowchart of the MIDI encoding process. The input is a PCM sound file (Microsoft WAV format); the analysis stages are repeated until the end of the sound file; the outputs are MIDI note event data (Standard MIDI File format 0), a five-lined staff (document), sound signals (audio) and an XML-format document for network distribution.

The first two stages are repeated, shifting the extraction position, until the end of the sound file. If the source signal is stereo, two sets of notes are analysed at each extraction position and then integrated into a single set of notes with pan-pot parameters added. The fourth stage, Temporal Grouping of Notes, connects temporally adjacent notes that have similar frequency and volume parameters, producing single notes of longer duration; a sketch of this step is given below. The last two stages convert each integrated note into the MIDI event data format. Before that, we regulate the number of output notes, and hence the output bit-rate, so that a standard GM or other type of MIDI tone generator can play back the encoded data.
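A minimal sketch of the Temporal Grouping stage follows; the paper does not specify the similarity thresholds or the merging policy, so the values and the max-velocity rule below are illustrative assumptions.

```python
def temporal_grouping(notes, max_note_gap=0, max_vel_diff=8):
    """Merge temporally adjacent notes with similar pitch and volume.

    notes: list of (start, duration, note_number, velocity) tuples,
    sorted by start time. A note is fused into its predecessor when it
    begins exactly where the predecessor ends, its note-number differs
    by at most max_note_gap semitones and its velocity by at most
    max_vel_diff. Thresholds are illustrative assumptions."""
    merged = []
    for start, dur, num, vel in notes:
        if merged:
            s0, d0, n0, v0 = merged[-1]
            adjacent = start == s0 + d0
            similar = (abs(num - n0) <= max_note_gap
                       and abs(vel - v0) <= max_vel_diff)
            if adjacent and similar:
                # extend the previous note instead of emitting a new one
                merged[-1] = (s0, (start + dur) - s0, n0, max(v0, vel))
                continue
        merged.append((start, dur, num, vel))
    return merged
```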

The right-hand flowcharts in Figure 5 show several utilisation processes available after the MIDI data are created. The top three functions (MIDI Data Editor, Common Music Notation Tool and MIDI Sequencer) are provided by commercially available, off-the-shelf DTM (Desk Top Music) composition tools such as Yamaha "XG-Works" or Steinberg "Cubasis", which we use. Although not specifically described in this paper, we are also considering structuring and symbolising the encoded MIDI data into an XML (eXtensible Markup Language) document format for network audio content distribution; a hypothetical sketch of such a description follows.
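The concrete schema is defined in [9] and is not reproduced here; purely as a hypothetical illustration, encoded note events could be serialised along the following lines (all element and attribute names are our own inventions, not the published format).

```python
import xml.etree.ElementTree as ET

def notes_to_xml(notes):
    """Serialise (start, duration, note_number, velocity) tuples into a
    hypothetical XML description; the real schema is defined in [9]."""
    root = ET.Element("audio-content", {"coding": "MIDI"})
    for start, dur, num, vel in notes:
        ET.SubElement(root, "note", {
            "start": str(start),        # in MIDI ticks
            "duration": str(dur),
            "number": str(num),         # MIDI note-number, eq. (7)
            "velocity": str(vel),       # MIDI velocity, eq. (8)
        })
    return ET.tostring(root, encoding="unicode")

# Example: two consecutive quarter notes (A4 then B4).
print(notes_to_xml([(0, 480, 69, 100), (480, 480, 71, 90)]))
```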

Figure 6. A snapshot of the MIDI encoder software tool

3. Conclusions

In this paper, we have described our MIDI encoding algorithm, which is based on constructing harmonic complex functions from the sinusoidal waveforms analysed by GHA. By extending this algorithm, although not specifically described in this paper, we can also separate vocal parts from recorded songs and encode both the vocal and instrumental parts into multi-channel MIDI data streams, and we can generate complete musical sounds, including vocal sounds, with a single GM-standard MIDI tone generator.

Figure 6 shows an encoding example with piano and vocal separation; the source audio material was the Irish folksong "Danny Boy" and its length was 20 seconds. The output bit-rate was 10 kbps, and the encoding took about one minute on a Pentium III 600 MHz Windows 98 PC.

As future work, we are considering more accurate sound source separation techniques, removal of harmonic overtone components for automatic music notation, support of pitch-bend functions to improve the decoded sounds, higher-performance structuring and symbolising techniques for generating XML data, and a redesign of the algorithm for real-time processing.

This research has been promoted by the Digital Content Association of Japan as a fiscal-2000 government project, "Development of Multimedia Content Creating Tools," and has also been financially supported by the Information-technology Promotion Agency, Japan and the Ministry of Economy, Trade and Industry, Japan. The developed software MIDI encoder tool (currently a Japanese MS-Windows edition only) is distributed for free at the following Web site: http://www.dcaj.or.jp

References

[1] M. Goto and Y. Muraoka, "A Beat Tracking System for Acoustic Signals of Music," Proceedings of the ACM International Conference on Multimedia, pp. 365-372, 1994.
[2] R. J. McNab, L. A. Smith, I. H. Witten, C. L. Henderson and S. J. Cunningham, "Towards the Digital Music Library: Tune Retrieval from Acoustic Input," Proceedings of the 1st ACM International Conference on Digital Libraries, pp. 11-18, 1996.
[3] T. Modegi and S. Iisaku, "Application of MIDI Technique for Medical Audio Signal Coding," Proceedings of the IEEE 19th International Conference of the Engineering in Medicine & Biology Society, Chicago, USA, pp. 1417-1420, Oct. 1997.
[4] T. Modegi and S. Iisaku, "Proposals of MIDI Coding and its Application for Audio Authoring," Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Austin, USA, pp. 305-314, Jun. 1998.
[5] T. Modegi, "Multi-track MIDI Encoding Algorithm Based on GHA for Synthesizing Vocal Sounds," Journal of the Acoustical Society of Japan (E), Vol. 20, No. 4, pp. 319-324, 1999.
[6] T. Modegi, "High-precision MIDI Encoding Method Including Decoder Control for Synthesizing Vocal Sounds," Proceedings of the 7th ACM International Conference on Multimedia, Part 2, Orlando, USA, pp. 45-48, Nov. 1999.
[7] T. Modegi, "MIDI Encoding Method Based on Variable Frame-length Analysis and its Evaluation of Coding Precision," Proceedings of the IEEE International Conference on Multimedia & Expo, New York, USA, pp. 1043-1046, Aug. 2000.
[8] T. Modegi, "Very Low Bit-rate Audio Coding Technique Using MIDI Representation," Proceedings of the 11th ACM NOSSDAV Workshop, New York, USA, pp. 167-176, Jun. 2001.
[9] T. Modegi, "Structured Description Method for General Acoustic Signals Using XML Format," Proceedings of the IEEE International Conference on Multimedia & Expo, Tokyo, Japan, pp. 932-935, Aug. 2001.
[10] T. Modegi, "XML Transcription Method for Biomedical Acoustic Signals," Proceedings of the 10th World Congress on Health and Medical Informatics (Medinfo 2001), London, UK, pp. 366-370, Sep. 2001.