Development of an Optical Music Recognizer (O.M.R.).

Xulio Fernández Hermida, Carlos Sánchez-Barbudo y Vargas.
Departamento de Tecnologías de las Comunicaciones. E.T.S.I.T. de Vigo. Universidad de Vigo.
E.T.S.I.T., Ciudad Universitaria S/N. 36200 Vigo. Phone: (986) 812131. Fax: (986) 812121. e-mail: xfernand@tsc.uvigo.es.

Abstract: This communication describes a system able to recognize printed music and convert it into a Standard MIDI File [1] that can be played back with any common sound card. The recognizer was developed to work with machine-printed music notation because it relies on some rules of music writing. The final application has a user-friendly interface running in the Windows environment.

1. Introduction:

Our recognizer is designed to work with machine-printed music notation (not with handwritten music), because there are rules [2], [3] in music writing that are followed by music typesetters but not always applied in handwritten music. It is important to note that music symbols vary in orientation, positioning and appearance. Typical music symbols are much less regular in appearance and positioning than the characters of printed text, and adjacent and overlapping symbol placements are common, which makes the recognition process harder.

Our system works with bilevel (black and white) images and is completely resolution independent. The minimum resolution is 200 dpi; higher resolutions involve greater computational cost but do not yield better performance. The application runs on Windows 3.x or Windows 95 and is very easy to use. In our first version the images are obtained from graphics files, but soon it will be possible to read images directly from scanners using the TWAIN protocol.

Our application can locate and recognize the following symbols: all notes and rests (whole, half, quarter, eighth, sixteenth...) and their components (flags or hooks, noteheads...), accidentals (flat, natural and sharp), clefs (treble, alto and bass), key signatures, and some more. It also locates staff lines and possible systems. This printed music recognizer uses a wide variety of image processing methods, such as thinning, erosion, segmentation, mask matching, thresholding and projections onto the X and Y axes. The applications of this system range from music teaching to the massive storage of printed music.

2. General Overview:

All processing is structured in a layer-based architecture. This means that we work with different processed images that we can consult at any moment; in that way, we can try to find symbols in one layer and check them in another one. We always keep the original image as a layer, and we sometimes have to copy windows between different layers. Our recognizer does not find the symbols in the same order in which a musician would read them, that is, left to right. Instead, we work bar by bar and try to extract one particular kind of symbol at a time (e.g. black-headed notes, rests...).
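
As a rough illustration of the layer-based organization described in Section 2 (this is not the authors' code, only a minimal sketch assuming 8-bit NumPy arrays as bilevel layers and invented layer names), several processed versions of the page can be kept side by side and rectangular windows copied between them:

```python
import numpy as np

class LayerStack:
    """Keeps several processed versions (layers) of the same page image."""

    def __init__(self, original):
        # The original scan is always preserved as its own layer.
        self.layers = {"original": original.copy()}

    def add(self, name, image):
        self.layers[name] = image

    def copy_window(self, src, dst, top, left, height, width):
        """Copy a rectangular window from one layer into another."""
        window = self.layers[src][top:top + height, left:left + width]
        self.layers[dst][top:top + height, left:left + width] = window.copy()


# Hypothetical usage: the page is a bilevel image (0 = white, 1 = black).
page = (np.random.rand(400, 300) > 0.95).astype(np.uint8)
stack = LayerStack(page)
stack.add("no_staff_lines", page.copy())   # would hold the staff-removed image
stack.copy_window("original", "no_staff_lines", top=50, left=40, height=30, width=60)
```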

3. First Processing Stage:

There are three important tasks that our recognizer performs in the first processing stage: page geometry computation, staff line removal, and segmentation of the image zones (bars) where we will try to locate musical symbols.

By geometry we mean how staffs are grouped into systems (or whether there is no system at all); in a system, some of the voices involved are played simultaneously. To find out whether there are systems we do as a musician would, that is, we look for long bar lines joining different staffs. To implement this we first apply vertical erosions in a certain region of the image and then follow the vertical lines. In this step we also estimate the staff line spacing and the staff line thickness: sizes and distances obtained throughout the recognition process are measured in units relative to the spacing and thickness estimated at this point. The staff line spacing is estimated by scanning nine evenly spaced columns of the image and building a histogram [6] of the distances between opposite transitions; we choose a balanced average among the most repeated value (D), (D+1) and (D-1).

Staff lines define the vertical coordinate system for pitches and provide a horizontal direction for the temporal coordinate system. The five staff lines found on a piece of printed music are not exactly parallel, not exactly horizontal, not exactly equidistant, their thickness is not exactly constant, and they are not even exactly straight; scanning and quantization noise are the reasons for these problems. Using our estimate of the staff line spacing and a non-fixed staff template, we can find the positions of the staffs in the image. This task is very important because later we will search for symbols over or near these located staffs, not all over the image. Next, we find the first staffs in both the left and right regions of the image and compute the image skew. Once we know where the staffs are, we can estimate the staff line thickness more precisely: we build a new histogram of the thickness of black lines in some columns of the image. Before finding the staffs we did not know which of those black lines were true staff lines, but at this point we can discard the false ones and obtain a better histogram.

We then remove the staff lines. Since we know some points belonging to the staffs and the slope (skew) of the staff lines, we can remove them easily by following those lines and deleting every vertical black run with a thickness lower than 1.5 times the computed thickness. With the staff lines removed, we hold in a layer (an image in memory) the same printed music without horizontal staff lines. The image is now much cleaner, with many isolated symbols, and it is easier to locate them.

The next step, as mentioned before, is to find out in which regions of the image there are music symbols. This involves finding the bars: we need the positions of the vertical bar lines, which divide the sheet music into intervals of the same temporal duration. That is very important for our purposes because it helps us to correct possible errors. Our application can locate simple bar lines, double bar lines and repetition bar lines. We do this by computing X projections of different parts of the located staffs: once the staff lines have been removed, bar lines produce high peaks in the X projection, which we examine to confirm that they were really caused by bar lines. Then we remove those vertical lines too. From now on, all processing is done at the bar level: we take every bar and study it in depth. The position of the bar determines how we process it; for example, if it is the first bar of a staff, we also look for the clef and key signature.

Figure 1: Original image.
Figure 2: Original image with the staff and vertical lines removed.
Figure 3: Division into bars.
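
As an illustration of the spacing estimate described above, the following sketch (a reconstruction, not the original implementation) scans nine evenly spaced columns of a bilevel image, builds a histogram of the white run lengths between opposite transitions, and balances the most frequent value D against D-1 and D+1; the frequency-weighted average used here is an assumption, since the exact balancing rule is not given.

```python
import numpy as np
from collections import Counter

def estimate_staff_spacing(img, n_columns=9):
    """Estimate the staff line spacing of a bilevel image (0 = white, 1 = black).

    Scans a few evenly spaced columns, histograms the white run lengths
    between a black-to-white transition and the next white-to-black one,
    and balances the most frequent value D with D - 1 and D + 1.
    """
    height, width = img.shape
    columns = np.linspace(width * 0.1, width * 0.9, n_columns).astype(int)
    distances = Counter()
    for x in columns:
        col = img[:, x]
        offs = np.flatnonzero((col[:-1] == 1) & (col[1:] == 0)) + 1  # black -> white
        ons = np.flatnonzero((col[:-1] == 0) & (col[1:] == 1)) + 1   # white -> black
        for off in offs:
            following = ons[ons > off]
            if following.size:
                distances[int(following[0] - off)] += 1
    if not distances:
        return None
    d = distances.most_common(1)[0][0]
    # Frequency-weighted balance among D - 1, D and D + 1 (an assumption).
    bins = [d - 1, d, d + 1]
    weights = np.array([distances.get(b, 0) for b in bins], dtype=float)
    return float(np.dot(bins, weights) / weights.sum())
```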

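The staff-removal rule above (delete vertical black runs thinner than 1.5 times the estimated line thickness while following each staff line) could look roughly like the sketch below; it assumes the line's row position is already known for every column, for instance from the located staff positions and the computed skew.

```python
import numpy as np

def remove_staff_line(img, line_rows, thickness):
    """Erase one staff line from a bilevel image (0 = white, 1 = black).

    line_rows[x] gives the estimated row of the staff line at column x
    (it may vary slowly with x because of skew).  At every column, the
    vertical black run crossing that row is removed only if it is thinner
    than 1.5 times the estimated staff line thickness, so stems, noteheads
    and other symbols that overlap the line are left untouched.
    """
    height, width = img.shape
    limit = 1.5 * thickness
    for x in range(width):
        y = int(round(line_rows[x]))
        if not (0 <= y < height) or img[y, x] == 0:
            continue
        # Extend the black run up and down from the staff line row.
        top = y
        while top > 0 and img[top - 1, x] == 1:
            top -= 1
        bottom = y
        while bottom < height - 1 and img[bottom + 1, x] == 1:
            bottom += 1
        if (bottom - top + 1) < limit:
            img[top:bottom + 1, x] = 0
    return img
```
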
4. Locating Symbols:

Our system seldom uses template-matching methods; we locate different parts of symbols in different ways, depending on the symbol. Since there are usually more black-headed notes than white-headed ones, we locate them first, using erosion methods (the erosion depth always depends on the previously computed thickness). Here we are not really searching for the notes but for their black heads: we take a bar and erode it by twice the computed thickness.

Figure 4: Heads of black-headed notes.

Then we compute the bounding box of each group of black pixels and filter the boxes using their horizontal and vertical dimensions, which leaves a set of bounding boxes that could be real black heads. Next, we examine the appropriate layer (the image without staff and vertical lines) to locate the stem of each possible black head and, following that line, we count the hooks; this gives us the duration of the note. For each note we process a window containing both the stem and the hooks (if any): we apply a vertical erosion, again depending on the thickness, and obtain a second window containing only horizontal white/black/white transitions. Those transitions are the hooks, and we only have to count them.

We could now continue searching for other kinds of symbols all over the image, but instead we use the information gathered so far to decide where to search. We know the time signature (e.g. 3/4), so we have a measure of the total duration that must be contained in a bar (between two vertical bar lines). Since we have already found all the black-headed notes, we can add up their durations and decide in which bars it is worth continuing with the next recognition steps: those bars will have an accumulated duration (computed by adding note durations) lower than the duration fixed by the time signature. In Figure 5 the time signature is 3/4, and we have shaded the bars where we will continue searching for symbols. Note that when we start the next recognition steps in those bars, we work on an image layer very different from the original one; our layer now looks like Figure 6. From this layer we have deleted almost every symbol found so far: staff lines, bar lines, black-headed notes (with their stems and hooks), the accidentals associated with those notes, and some more. We found the clef and key signature in the first steps of the process, but we have not deleted them because it is not necessary. We use mask matching to decide the clef.

Figure 5: Original image; we have marked the bars where the search must continue after finding the black-headed notes.
Figure 6: Image layer where we will search for white-headed notes.
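
The erosion-and-filter step for black noteheads might be sketched as follows. This is an illustrative reconstruction rather than the original code: it uses SciPy's binary erosion with a depth tied to the estimated staff line thickness and filters connected-component bounding boxes against the expected notehead size; the exact erosion element and size limits are assumptions.

```python
import numpy as np
from scipy import ndimage

def find_black_noteheads(bar_img, thickness, spacing):
    """Return candidate bounding boxes (top, left, bottom, right) of black noteheads.

    bar_img is a bilevel image of one bar (0 = white, 1 = black) from which
    staff and bar lines have already been removed.  The erosion depth and the
    accepted box dimensions are tied to the estimated staff line thickness
    and spacing; the exact limits below are illustrative assumptions.
    """
    depth = max(1, int(round(2 * thickness)))
    # Erode with a square structuring element; thin structures (stems, hooks)
    # disappear, while the solid noteheads survive as small blobs.
    eroded = ndimage.binary_erosion(bar_img.astype(bool),
                                    structure=np.ones((depth, depth), dtype=bool))
    labels, n = ndimage.label(eroded)
    boxes = []
    for sl in ndimage.find_objects(labels):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        # Keep blobs whose size is compatible with a notehead (assumed limits).
        if 0.2 * spacing <= h <= 1.5 * spacing and 0.2 * spacing <= w <= 2.0 * spacing:
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes
```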

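The duration bookkeeping that decides where to keep searching amounts to a few lines; the sketch below assumes a simplified representation (one list of already recognized note durations per bar, expressed as fractions of a whole note) and flags the bars whose accumulated duration falls short of the total fixed by the time signature.

```python
from fractions import Fraction

def bars_needing_more_search(bars, beats, beat_unit):
    """Return indices of bars whose accumulated duration is still incomplete.

    bars      -- list of bars; each bar is a list of note durations expressed
                 as fractions of a whole note, e.g. Fraction(1, 4) for a
                 quarter note (assumed representation).
    beats     -- upper number of the time signature (e.g. 3 for 3/4).
    beat_unit -- lower number of the time signature (e.g. 4 for 3/4).
    """
    target = Fraction(beats, beat_unit)          # total duration of a full bar
    incomplete = []
    for i, durations in enumerate(bars):
        accumulated = sum(durations, Fraction(0))
        if accumulated < target:                 # something is still missing here
            incomplete.append(i)
    return incomplete


# Example with a 3/4 time signature: the second bar only accounts for a half
# note so far, so it is flagged for further searching.
bars = [[Fraction(1, 4)] * 3, [Fraction(1, 2)]]
print(bars_needing_more_search(bars, beats=3, beat_unit=4))   # -> [1]
```
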
The key signature is obtained by looking for accidentals in the sequence of positions where we know they should appear. There can be from one to seven accidentals, or none at all, but if there are any, their sequence of positions is fixed. To recognize those accidentals, and everything else in the image, we use knowledge about their position, their size along the X and Y axes, and their projections onto the X axis; with this we decide between flats, naturals and sharps. To locate whole and half noteheads, we first search for the stem (if it exists) and then follow the contour of the notehead to find its dimensions.

The application proceeds by extracting or deleting the symbols it recognizes from the working layer, so this layer becomes cleaner for later searches. We use the time signature to know where there may still be unrecognized symbols, comparing the fixed total duration with the accumulated duration of the localized symbols. In this way we obtain a lower processing time, because we only look for symbols where they may be and we almost never search the whole image.

When we have all the results, we generate an ASCII file in MEL format (Musical Events Listing) that contains all the information about the recognized music symbols. This file does not contain position information: it is just an easy-to-read format containing the recognized music. We could define an Extended MEL format with position information, but this was not our purpose.

5. MEL to MIDI Conversion:

We have developed a MEL to MIDI file converter. This application takes the MEL (ASCII) file produced by the recognizer and converts it into a Standard MIDI File, so the recognized music can be heard on any PC with a common sound card. This system is not a general text-to-MIDI converter: it only converts MEL 1.0 files into MIDI. We are working on better specifications of the MEL format that will cover newly recognized symbols.
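
Since the MEL 1.0 syntax is not reproduced here, the following sketch only illustrates the final conversion step: it assumes a hypothetical, already-parsed event list (MIDI note number plus duration in quarter notes) standing in for the MEL contents, and writes a Standard MIDI File with the third-party mido library.

```python
import mido

def events_to_midi(events, path, bpm=120, ticks_per_beat=480):
    """Write a Standard MIDI File from a list of (midi_note, quarter_lengths) pairs.

    'events' is a hypothetical stand-in for the parsed contents of a MEL file;
    the real MEL 1.0 format is not reproduced here.
    """
    mid = mido.MidiFile(ticks_per_beat=ticks_per_beat)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(bpm)))
    for note, quarters in events:
        ticks = int(round(quarters * ticks_per_beat))
        track.append(mido.Message('note_on', note=note, velocity=64, time=0))
        track.append(mido.Message('note_off', note=note, velocity=64, time=ticks))
    mid.save(path)


# Example: C4, D4, E4 as quarter notes followed by a half-note G4.
events_to_midi([(60, 1), (62, 1), (64, 1), (67, 2)], 'recognized.mid')
```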

6. Future Lines and Conclusions:

We can mention the following future developments (we are already working on some of them): scanner control using the TWAIN standard; extension of the MEL format to include newly recognized symbols; and the study of the much more difficult problem of recognizing handwritten music.

As the main conclusion, we have designed a system able to read printed music. The algorithms we use are fast and simple, and our preferred methods are the morphological ones.

7. References:

[1] The International MIDI Association. Standard MIDI-File Format Spec. 1.1.
[2] J. Chailley, H. Challan. Teoría completa de la Música. Vol. I. Ed. Alphonse Leduc.
[3] J. Zamacois. Teoría de la Música (Libro II). Labor.
[4] H. S. Baird, D. Blostein. A Critical Survey of Music Image Analysis. Structured Document Image Analysis, pp. 405-434. Springer-Verlag.
[5] T. Kientzle. Scaling Bitmaps with Bresenham. C/C++ Users Journal, pp. 51-53, October 1995.
[6] H. Kato, S. Inokuchi. A Recognition System for Printed Piano Music Using Musical Knowledge and Constraints. Structured Document Image Analysis, pp. 435-457. Springer-Verlag.
[7] N. P. Carter, R. A. Bacon. Automatic Recognition of Printed Music. Structured Document Image Analysis, pp. 458-465. Springer-Verlag.
[8] D. Phillips. Image Processing in C. Chapter: Analyzing and Enhancing Digital Images. R&D Technical Books.