Ph.D Research Proposal: Coordinating Knowledge Within an Optical Music Recognition System


J. R. McPherson
March, 2001

1 Introduction to Optical Music Recognition

Optical Music Recognition (OMR), sometimes also called musical score recognition or simply score recognition, is the process of automatically extracting musical meaning from a printed musical score. Music notation provides a rich description of the composer's ideas, but ultimately sheet music is open to some degree of interpretation by performers. Performance considerations aside, the advantages of a computerised representation of a musical score are numerous. These include:

- the ability to automatically transpose a particular instrument's part;
- conversion to other musical formats or notations, such as for Braille-reading machines or for various software packages, or re-typesetting a score published in an outdated fashion;
- allowing musicians to read the music from a computer display, for example to eliminate the need for page turns [GWMD96, McP99];
- a form of compression, resulting in smaller data sizes [BI98];
- ease of sharing and archiving;
- increased ease of editing (using appropriate software), aiding in composition; and
- automatic indexing and retrieval of information [MSBW97].

1.1 General framework for OMR

The automated process of extracting musical meaning from sheet music normally follows a number of specialised steps, performed in a fixed order. The first step is to acquire a digital form of the sheet music that a computer can access. Today, this step is fairly easy, given the widespread availability of cheap scanner hardware that can create both colour and monochrome digital images at a resolution of three hundred dots per inch or higher, which is more than adequate for our processing purposes.
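As a concrete, much-simplified illustration of this first step, a scanned greyscale page can be thresholded into the monochrome bitmap that later stages work on. The fixed threshold and the plain nested-list representation below are assumptions for illustration only; a real system would choose its threshold adaptively and use a proper image format.

```python
# Minimal sketch of acquisition: map a greyscale scan (0-255 per pixel)
# to a monochrome bitmap, where 1 = ink and 0 = background.
# The fixed threshold is an illustrative assumption.

def binarise(page, threshold=128):
    """Threshold each greyscale pixel into ink (1) or background (0)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in page]

# A toy 3x4 "scan": the dark pixels in the middle represent ink.
scan = [
    [250, 30, 40, 245],
    [240, 25, 35, 250],
    [255, 255, 255, 255],
]
bitmap = binarise(scan)
```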

The second step is to apply various image processing techniques to the acquired image. This is necessary to recognise the symbols that make up the page, for example lines and note heads. This step is the hardest, and is often broken up into two or more separate steps.

The final step is to determine the musical meaning (also called the musical semantics) of the image, based on the objects found in the previous step. In CMN, for example, objects like notes and rests have musical qualities such as pitch, volume and duration; objects such as slurs, accents and trills affect individual notes; and objects such as tempo markings, key signatures and time signatures affect the notes that follow.

1.2 Background and Starting Base

Common Music Notation (CMN), also called Western staff notation or Western music notation, is the notation most widely used today; an example of CMN is shown in Figure 1. Other music notations include guitar tablature, plainsong notation, sacred harp notation, and various Asian, African and Indian musical notations.

Figure 1: A sample of Common Music Notation: Handel's Sonata V for flute and piano.

Ideally, an OMR system should not be limited to any particular set of symbols. It should be possible to add rules that allow the system to understand a new notation without making significant internal changes to the system. This is referred to as extensibility. Bainbridge's CANTOR system [Bai97] was one of the first fully extensible optical music recognition systems developed. Most prior work was limited to small subsets of CMN, and often made assumptions about staff lines, such as assuming there are always five lines per staff. While CANTOR still has the restriction that the music must be stave-based, there can be an arbitrary number of lines per staff. Here, extensible refers to the fact that one of the design goals was to research and design a system that did not have hard-coded shapes built into it.
This research led to the formation of Primela, a Primitive Expression Language for describing specific musical shapes. A set of Primela descriptions can be written to describe a particular music notation, then loaded and used at run-time to process an image.
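The essential idea, that shape descriptions are data loaded at run time rather than code compiled into the recogniser, can be sketched as follows. The feature set (a single aspect-ratio test) and the matching rule are invented purely for illustration; Primela's actual language is far richer than this.

```python
# Sketch of loadable shape descriptions, in the spirit of Primela.
# A notation is described by data, so supporting a new notation means
# writing new descriptions, not changing the recogniser's code.
# The aspect-ratio feature and the bounds below are illustrative only.

DESCRIPTIONS = {
    # name: (min_aspect, max_aspect), where aspect = height / width
    "vertical_line": (4.0, 100.0),
    "notehead": (0.5, 1.5),
}

def classify(width, height, descriptions):
    """Return the names of all descriptions matched by a bounding box."""
    aspect = height / width
    return [name for name, (lo, hi) in descriptions.items()
            if lo <= aspect <= hi]

# A tall, thin object matches the vertical-line description.
candidates = classify(2, 20, DESCRIPTIONS)
```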

CANTOR consists of four main steps:

- Staff line identification, which locates staves, removes staff lines and locates objects in the bitmap;
- Primitive recognition, which identifies basic shapes, such as (for the CMN Primela descriptions) slurs, noteheads, tails, accidentals, and lines;
- Primitive assembly, which joins the basic primitives found into musical objects, for example combining noteheads, stems and tails into a note; and
- Musical semantics, which determines musical qualities such as the pitch and duration of the musical objects found, and can output various musical file formats.

2 Areas of Research

Most current projects in the field of OMR are concerned with improving the accuracy of the various components, particularly the pattern recognition stages. Instead of focusing solely on the individual components, I wish to research and create methods that improve the overall system, not merely by improving components in isolation, but by improving how they interact with each other, so as to maximise the amount of musical information gained from the image. Part of my research will involve determining and evaluating appropriate methods for the process controlling the interaction, known as the coordinator.

2.1 Coordinating interaction between components

Determining how best to coordinate the information received from the OMR components will be the main area of focus for the thesis. Figure 2 shows how most current systems operate. The different phases of the OMR system are performed in a linear sequence, and each phase's output becomes the next phase's input. This also means that each phase is tightly coupled to both the previous and following one, as they must share common data structures and formats.

Figure 2: The current pipeline approach.

However, this model has some limitations.
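This tight coupling can be made concrete with a sketch: each phase consumes exactly the previous phase's output, so a mistake made early (here, a mislabelled shape) has nowhere to go but forward. The phase names follow Figure 2; the bodies are illustrative stubs, not CANTOR's actual implementation.

```python
# The linear pipeline of Figure 2 as a chain of functions. No phase can
# ask an earlier one to look again, so an early error simply propagates.

def staff_line_identification(data):
    data["staves"] = ["staff 1"]
    return data

def musical_object_location(data):
    data["objects"] = ["shape A"]
    return data

def musical_feature_classification(data):
    # Suppose the classifier mislabels a natural as a flat...
    data["features"] = [("shape A", "flat")]
    return data

def musical_semantics(data):
    # ...the final phase has no choice but to encode what it was given.
    data["music"] = [label for _, label in data["features"]]
    return data

def pipeline(image):
    data = {"image": image}
    for phase in (staff_line_identification, musical_object_location,
                  musical_feature_classification, musical_semantics):
        data = phase(data)  # each output is the next phase's input
    return data
```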
Most seriously, errors made in an early step will propagate through the following steps. For example, when performing musical semantics analysis on the recognised components, an error may be detected, such as a bar of music not having enough (or too many)

notes in it. Because this type of error cannot be corrected within the current context, the system is forced to output something that it knows is not quite right. (Some errors, however, such as a missing or mis-detected accidental in a key signature, could conceivably be corrected in this context.) What would improve the system's overall accuracy would be to use this newly-gained context to re-perform a previous stage, and hopefully correct the error given this new information.

Figure 3: The proposed coordinated approach.

Figure 3 shows a possible revised framework that allows feedback to earlier stages. All execution is controlled by a coordinating process; the modules cannot communicate directly. The idea here is that the top-level process controls the flow of execution, based on a number of variables. Part of the research is to determine the choice of variables used to control program flow, and what effect these variables have on both the performance and the run-time behaviour of the system.

This type of framework would also encourage looser integration between the various components. Loosely integrated components would allow, for example, the addition of several competing components capable of doing the same or similar steps, whose results could be compared for discrepancies by the coordinator. This would provide either more confidence that the results are right, if the different components agree, or particular areas that should be further examined, if the results conflict. Another advantage is that this framework allows modules that do not directly perform any music processing but still provide additional context.
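A minimal sketch of such a coordinator follows. Modules never call each other; they exchange requests only through the coordinator, which can route a re-classification request back to an earlier stage or reject it, in which case the requesting module simply carries on. All names and message shapes here are hypothetical; the prototype's real message formats are not shown in this proposal.

```python
# Sketch of the coordinated approach in Figure 3: a central coordinator
# routes requests between otherwise-decoupled modules.
# Module names, request kinds and payloads are illustrative assumptions.

class Coordinator:
    def __init__(self):
        self.handlers = {}  # request kind -> callback of the module able to serve it

    def register(self, kind, handler):
        self.handlers[kind] = handler

    def request(self, kind, payload):
        handler = self.handlers.get(kind)
        if handler is None:
            return None  # request rejected; the caller can continue regardless
        return handler(payload)

coord = Coordinator()

# The pattern recognition module offers to re-test a primitive's
# classification, here with a stricter acceptance criterion.
coord.register("reclassify",
               lambda prim: {"shape": prim["shape"], "accepted": False})

# Primitive assembly finds a "flat" with no notehead to its right, so it
# asks the coordinator to have the classification checked again.
verdict = coord.request("reclassify", {"shape": "flat"})
```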
An example of such a context-providing module is a component that could detect the scan quality (perhaps from the level of noise in the bitmap); if the quality is low, then tolerances could be lowered, or a set of descriptions specifically designed for noisy data could be used.

2.2 Page Layout

I would like to invest some effort in investigating and/or designing algorithms that use a priori knowledge to determine possible object types before invoking the lower-level recognition subsystems, such as staff location or character/text recognition. This more general area of research is known as document image analysis, and there are techniques that might be researched and improved with respect to the OMR domain. This could involve the system keeping a history of processed documents, to aid in predicting the layout of future documents, and using prior knowledge to decide that there may be a title and author somewhere

near the top of the page. The proposed coordinated approach for the OMR system could then decide whether or not to test this hypothesis, given knowledge gained about this area of the page from other sources.

2.3 Classification Algorithms for feature extraction

One of the more recent developments in the field of OMR is the use of machine-learning techniques to develop shape descriptions, given a set of training data [Ala95, BAD99, SD98]. These techniques could be investigated to design feature sets for classification of musical primitives, for either the current Primela framework or some new, replacement method for differentiating objects.

2.4 Illustration of the Concept

There is an existing prototype, based on the CANTOR code and still a work in progress, that is capable of using message passing to provide feedback from a particular phase to earlier phases. While not yet very advanced, the following example demonstrates the potential improvement that the methods under investigation may offer.

Figure 4(a) shows a small extract from the Clarinet Concerto by Mozart. This extract is from the pianist's part, and also has the clarinetist's part displayed above the piano stave. (This incidentally also demonstrates how OMR must be able to deal with symbols at different scales within the same piece.) Figures 4(b) and 4(c) show the vertical lines and the flats, respectively, that were found by CANTOR in the pattern recognition stage. There are some errors in both of these classifications.

There are a few mis-identified vertical lines: the time signature (6/8) was just broken enough to pass as two vertical lines. The musical semantics module could notice that there was no time signature, yet there were extra vertical lines where a time signature might be expected, and allow the system to re-examine this area. Also, the two letter l's of the word "Allegro" were not unreasonably determined to be vertical lines, as they were close enough to the staff to be checked.
However, they are unlikely to have any musical meaning in CMN, and are also close to other textual characters.

There are four naturals in the extract that were determined to be flats, due to the default descriptions used. This could be solved by writing Primela descriptions that correctly differentiate between flats and naturals for the particular fonts used in this piece of music, but it would be more elegant to correct these automatically with semantic analysis, by noticing that accidentals rarely appear that have no effect on the note, due either to the last occurring key signature or to an accidental on the same note earlier in the same bar.

Unfortunately, in this particular case there are also missing flats in the key signatures of two of the staves. These could also be picked up using semantic analysis, by noticing that one staff did have a key signature, so the others probably will as well. This, coupled with the fact that there will be unrecognised objects in the position where a key signature could be expected, should provide enough context that the recognition stage should look there again for a key signature.

Lastly, for whatever reason the first chord in the second bar did not have a note stem recognised as a vertical line; see the circled area within each figure

to locate this object. (CANTOR currently checks for vertical lines before checking for accidentals, although this ordering is user-defined in the Primela descriptions.) Because of this, the shape passed the tests as possibly being a flat. This is as far as CANTOR goes. However, when the prototype system assembles the primitives together, it notices that this particular flat does not have a notehead in the appropriate position to its immediate right. The primitive assembly module then issues a request to the coordinator to check this primitive's classification again. Note that if the request is rejected, the primitive assembly stage has already been completed, and processing can continue regardless. The coordinator determines that the pattern recognition module is capable of fulfilling this request, so it passes the request on. This stage now takes account of the new context, and subsequently rejects the shape as possibly being a flat (Figure 4(d)). Currently this context (that is, the fact that the primitive could not be assembled) is accounted for by re-testing the object for the same classification, but with a higher threshold for passing. While this may seem like a small step, it can have an impact on the final output: this is the difference between the music as written and an incorrect note resulting in a discord. Unfortunately, the prototype does not yet use this new context to correctly identify the shape, in this case as a vertical line. The prototype system does not currently perform semantic analysis.

As the above discussion shows, there are plenty of opportunities to use musical context for improvement in the recognition stages. The key will be finding a generalised approach for this task.

3 Intended Schedule and Requirements

This research will be carried out using existing equipment within the department. No extra computing (or other) resources are expected to be required. The following is an estimate of the work likely to be completed.
Depending on the progress made during these tasks, other work, such as that mentioned in Sections 2.2 and 2.3, might be undertaken. Also, new developments by other researchers may cause a change in direction or scope for this research.

Task                                                      Months
Continue research, complete first prototype                    6
Experimentation with prototype                                 2
Write up methods, ideas and findings                           1
Investigate and create other coordinators                     12
Comparisons between coordinators and other OMR systems         3
Completion of write-up                                         5
Total:                                                        29

Note that some work has previously been done during enrolment for a Masters degree since July 2000. There are currently no foreseen ethical issues arising from this research. If at a later date it is necessary to perform evaluation studies on various methods and/or software, then ethical approval from the school's Ethics Committee will be sought.

(a) The starting image. (b) The vertical lines found by CANTOR. (c) The flats found by CANTOR. (d) The flats found by CANTOR with coordination.

Figure 4: Part of the first line of the Rondo from Mozart's Clarinet Concerto, with area of interest circled.

References

[Ala95] Jarmo T. Alander. Indexed bibliography of genetic algorithms in optics and image processing. Report 94-1-OPTICS, University of Vaasa, Department of Information Technology and Production Economics, 1995. ftp.uwasa.fi/cs/report94-1/gaopticsbib.ps.z.

[BAD99] Bruce A. Draper, Jose Bins, and Kyungim Baek. ADORE: Adaptive object recognition. In Proceedings of the International Conference on Vision Systems, pages 522-537, Las Palmas de Gran Canaria, Spain, January 1999.

[Bai97] David Bainbridge. Extensible Optical Music Recognition. PhD thesis, University of Canterbury, Christchurch, New Zealand, 1997.

[BI98] David Bainbridge and Stuart Inglis. Musical image compression. In Proceedings of the IEEE Data Compression Conference, pages 209-218, Snowbird, Utah, 1998. IEEE.

[GWMD96] Christopher Graefe, Derek Wahila, Justin Maguire, and Orya Dasna. Designing the muse: A digital music stand for the symphony musician. In Proceedings of the CHI '96 Conference on Human Factors in Computing Systems, page 436, Vancouver, Canada, 1996. ACM.

[McP99] J. R. McPherson. Page turning score automation for musicians. B.Sc. Honours thesis, University of Canterbury, New Zealand, 1999.

[MSBW97] Rodger J. McNab, Lloyd A. Smith, David Bainbridge, and Ian H. Witten. The New Zealand Digital Library MELody index, May 1997.

[SD98] Marc Vuilleumier Stückelberg and David Doermann. On musical score recognition using probabilistic reasoning. In Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR '98. IEEE, 1998.