Methodologies for Creating Symbolic Early Music Corpora for Musicological Research

Methodologies for Creating Symbolic Early Music Corpora for Musicological Research
Cory McKay (Marianopolis College), Julie Cumming (McGill University), Jonathan Stuchbery (McGill University), Ichiro Fujinaga (McGill University)
With lots of help from Nathaniel Condit-Schultz, Néstor Nápoles López and Ian Lorenz

2 / 22 Motivation
- Scores are increasingly being made available in machine-readable symbolic formats
  - MusicXML, MEI, MIDI, Sibelius, Finale, etc.
- Software is increasingly used to carry out studies spanning hundreds of pieces (or more)
  - jSymbolic, music21, Humdrum, MIDI Toolbox, etc.
- Naïve approaches to constructing corpora can limit or bias studies performed on them
  - This can lead to erroneous results and conclusions
  - Worse, these problems may not be apparent to those conducting the studies

3 / 22 Goals of this work
- Propose a robust methodology for creating early music computational research corpora
  - Identification of pitfalls
  - Creation of a model workflow and templates
- Create a sample corpus using this methodology
  - Duos from Josquin and La Rue Masses
- Perform experiments to validate and learn from the sample corpus
  - Using jSymbolic features, statistical analysis and machine learning

4 / 22 Big problem areas
- Interpreting the original notation
  - There are many ways to represent and interpret early music in modern notation
  - It is essential to have all works in the corpus transcribed using a consistent methodology
- Encoding the music in a computer-readable file
  - Inconsistent encoding can have unexpected consequences
  - Especially when machine learning is used

5 / 22 Problems with inconsistency and incompleteness
- Computers will be confused if different encoders adopt different standards or make different assumptions
  - Computers will interpret these subjective differences as real differences intrinsic to the music
- Data to be processed by a computer should explicitly specify all necessary information
  - We cannot expect computers to have the same implicit musical knowledge that human experts do
  - Many automated algorithms require that information be complete and unambiguous
  - If these decisions are not made explicit in encodings, then algorithms may make their own inappropriate assumptions, or may be unable to process the music at all

6 / 22 Sample interpretation problems (1/2)
- Editors sometimes transpose works to different keys
  - When arranging for specific ensembles
  - Because they believe that the original proper pitch was higher or lower than specified in the source
- Performers can be expected to add accidentals without explicit instructions in the score
  - e.g. musica ficta
  - Different performers may make different decisions

7 / 22 Sample interpretation problems (2/2)
- Mensuration signs indicate metrical organization
  - But they are not quite the same as time signatures
  - The original parts have no barlines, and ties are never used
  - Some editions use barlines, some do not
- Note values are larger than those of common Western notation
  - The beat generally falls on the semibreve (whole note)
  - Different editions may use the original, halved, quartered or smaller note values

8 / 22

9 / 22 Overview of our approach (1/2)
- Use modern notation
  - This permits the use of established computational tools that can only process modern notation
- Make as few editorial decisions as possible
  - Encoders thus avoid imposing their subjective interpretations on others
  - e.g. do not add accidentals not specified in the source
  - If a given researcher wishes to add accidentals in a particular way, they can reprocess the files to be consistent in the way they feel is best

10 / 22 Overview of our approach (2/2)
- If an editorial decision must be made, be unwaveringly consistent
  - e.g. use barlines and time signatures, as required by modern notation, but always use the whole note as the beat if this is what is in the source
- If an editorial decision must be made, document it precisely and completely
  - And distribute the resultant workflow with the corpus
  - Those using the corpus will then be explicitly aware of what decisions were made
  - And can reprocess the corpus to incorporate different editorial decisions if they wish

11 / 22 Sample encoding problems (1/2)
- Some encoding formats do not allow all information of interest to be encoded
  - e.g. MIDI cannot distinguish between a C# and a Db
- Any given piece of analysis software is only compatible with a limited number of encoding formats
  - But one wants researchers to be able to use the software of their choice
- MIDI is by far the closest thing to a universal format
  - But MIDI is a deeply flawed format
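The enharmonic loss mentioned above is easy to see in code. A minimal Python sketch (the pitch-spelling convention here is illustrative, not tied to any particular library) maps spelled pitches to MIDI note numbers: both spellings of the same sounding pitch collapse to one number, so the original spelling cannot be recovered from a MIDI file.

```python
# Semitone offsets of the natural note letters within an octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(step, alter, octave):
    """Map a spelled pitch, e.g. C#4 -> ('C', +1, 4), to its MIDI note number.
    MIDI convention: C4 (middle C) = 60."""
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter

c_sharp_4 = midi_number("C", +1, 4)  # C#4
d_flat_4 = midi_number("D", -1, 4)   # Db4
print(c_sharp_4, d_flat_4)  # both 61: the C#/Db distinction is gone
```

MusicXML and MEI, by contrast, store the step, alteration and octave separately, which is why the corpus keeps those formats alongside MIDI.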

12 / 22 Sample encoding problems (2/2)
- Encoding software may make editorial decisions of its own, especially under default settings
  - These can vary across software packages
  - Or even across different versions of the same software
  - e.g. Finale and Sibelius may incorporate rubato into saved files if not explicitly told to quantize rhythm
- Unless care is taken, the encoding software may do this without the knowledge of the encoders operating it

13 / 22 Overview of our encoding approach (1/3)
- Create a detailed workflow and follow it
  - Without exception!
- Use precisely the same software for all encodings (Sibelius)
  - Under the same operating system and settings
- Use pre-constructed templates
  - To maximize consistency and avoid human error
- Use automated scripts
  - To speed the process up
  - e.g. ManuScript, the Sibelius scripting language

14 / 22 Overview of our encoding approach (2/3)
- Avoid encoding methodologies that throw out information (when possible)
- Follow consistent labelling standards
  - e.g. if a piece is to be played by viola, always label it exclusively as viola, not as a mix of viola and alto
- Encode provenance in the files
  - In case a file becomes separated from its encapsulating dataset

15 / 22 Overview of our encoding approach (3/3)
- Publish the corpus in multiple file formats
  - e.g. MIDI, MusicXML, Sibelius, etc.
  - Be sure to include MIDI as one of these because of its universality (and despite its flaws)
  - This offers researchers choice
- Generate all versions from a single original master file
- Verify all final files
  - Manually: labour-intensive, but necessary to avoid unforeseen problems (of which there can be many)
  - Automatically: to detect things that were missed manually
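One simple automated check of the kind described above is verifying that every piece has a complete set of derived files. This sketch assumes a hypothetical flat corpus directory in which all exports of a piece share a basename (e.g. `kyrie_duo.sib`, `kyrie_duo.mid`, ...); both the layout and the function name are illustrative.

```python
from pathlib import Path

# Formats expected for every piece, all generated from the Sibelius master.
EXPECTED = {".sib", ".musicxml", ".mid", ".mei", ".pdf"}

def missing_exports(corpus_dir):
    """Report, per piece, which derived formats are absent from corpus_dir."""
    by_piece = {}
    for f in Path(corpus_dir).iterdir():
        by_piece.setdefault(f.stem, set()).add(f.suffix)
    return {stem: sorted(EXPECTED - found)
            for stem, found in sorted(by_piece.items())
            if EXPECTED - found}
```

A check like this catches whole-file omissions; content-level consistency (e.g. matching note counts across formats) still needs format-aware tools.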

16 / 22 Our corpus (1/3)
- Duos (surrounded by double bars) from Masses composed by two contemporaries:
- Josquin Desprez
  - 33 Duos from 11 secure Masses
  - c. 1450-55 to 1521
  - Varied career in France and Italy
- Pierre de la Rue
  - 44 Duos from 26 secure Masses
  - c. 1452 to 1518
  - Habsburg-Burgundian chapel, Low Countries and Spain
- Meconi (Grove): "Despite differences in style, La Rue's music was probably most strongly influenced by that of Josquin. There are curious parallels between the works of the two."

17 / 22 Our corpus (2/3)
- Began with MusicXML Masses downloaded from the Josquin Research Project (JRP)
  - Used Sibelius to extract the Duos
- Added additional Duos by transcribing them directly in Sibelius
- Processed, cleaned and verified all Duos from all sources using the workflow described earlier
  - e.g. restoring original note values
  - To ensure consistency, among other things
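Restoring original note values, as mentioned above, amounts to a uniform, reversible rescaling of durations. A minimal sketch (durations given in quarter notes; the scale factor comes from the edition's documented reduction, not from anything inferable from the file itself):

```python
from fractions import Fraction

def restore_note_values(durations, scale=2):
    """Undo an edition's uniform reduction of note values; scale=2 reverses
    halving, scale=4 reverses quartering. Exact Fractions keep dotted and
    tuplet durations free of floating-point drift."""
    return [Fraction(d) * scale for d in durations]

# A halved edition's quarter, dotted-quarter and half notes...
halved = [Fraction(1, 1), Fraction(3, 2), Fraction(2, 1)]
# ...become the source's half, dotted-half and whole notes:
print(restore_note_values(halved))
```

Because the operation is a pure rescaling, users of the corpus can equally apply the inverse to get back to a reduced-value edition if they prefer it.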

18 / 22 Our corpus (3/3)
- The final version will be posted publicly once the paper is accepted
  - Including Sibelius, MusicXML, MIDI, MEI and PDF versions of the Duos
  - Including the detailed workflow and templates

19 / 22 Experiments
- We conducted a series of experiments with our Duos corpus
  - To quantitatively explore the effects of using different encoding methodologies
- Trained machine learning models to distinguish the Josquin Duos from the La Rue Duos
  - Used three different versions of the corpus, each encoded a different way
- I will only summarize the results here
  - Detailed results and analysis are available in the written paper

20 / 22 Experimental conclusions
- The cleaned, consistent version of the dataset produced better results than the original files before cleaning
  - Because inconsistent encoding practices create obscuring noise
- Combining Josquin pieces consistently encoded one way with La Rue pieces consistently encoded another way resulted in grossly inflated performance
  - Because the system "cheated" by basing its classifications on encoding practice rather than the underlying music
  - An important warning not to blindly combine data from different sources
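The "cheating" effect is easy to reproduce with synthetic data. This is purely an illustration, not the paper's features or results: a toy duration feature overlaps heavily between two composer groups, but giving one group a systematic encoding artifact (halved note values) lets even a trivial midpoint-threshold classifier score far above the honest baseline.

```python
import random
import statistics

random.seed(0)

# Toy feature (say, mean note duration in quarters) for two overlapping groups:
josquin = [random.gauss(1.0, 0.3) for _ in range(200)]
la_rue = [random.gauss(1.1, 0.3) for _ in range(200)]

def threshold_accuracy(a, b):
    """Classify each value against the midpoint of the two group means
    (direction-agnostic, so label orientation does not matter)."""
    cut = (statistics.mean(a) + statistics.mean(b)) / 2
    correct = sum(x < cut for x in a) + sum(x >= cut for x in b)
    acc = correct / (len(a) + len(b))
    return max(acc, 1 - acc)

honest = threshold_accuracy(josquin, la_rue)
# Re-encode only the La Rue group with halved note values -- a pure artifact:
inflated = threshold_accuracy(josquin, [x / 2 for x in la_rue])
print(f"honest: {honest:.2f}, with encoding artifact: {inflated:.2f}")
```

The gap between the two accuracies is driven entirely by the encoding, which mirrors the warning above: any systematic difference between data sources becomes a confound the model can latch onto.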

21 / 22 Conclusions and contributions
- Provided a set of principles and a workflow for constructing proper early music research corpora
- Constructed a sample corpus of Duos from Masses using this workflow
- Showed experimentally that consistently and systematically encoded music produces better and safer results

Thanks for your attention
E-mail: julie.cumming@mcgill.ca
E-mail: cory.mckay@mail.mcgill.ca