SIMSSA DB: A Database for Computational Musicological Research


Cory McKay, Marianopolis College
2018 International Association of Music Libraries, Archives and Documentation Centres International Congress, SIMSSA Workshop, Leipzig, Germany

Topics
- Currently available musicological research databases and repositories
- Data needs of computational musicology and MIR
- The SIMSSA DB
- Features and jSymbolic
- Archiving research
- Design priorities
- Data model
- Prototype interface

Existing music research databases
There are several excellent online databases available that provide researchers with access to:
- Musical metadata, e.g. Bach Digital
- Images of scores and manuscripts, e.g. Musiclibs
- Audio recordings, e.g. Naxos Digital (paid service)

Symbolic music repositories (1/2)
However, there are relatively few research-grade online repositories of symbolic music
- i.e. Finale, Sibelius, MusicXML, MEI, MIDI, etc. files
Most symbolic music repositories that do exist tend to:
- Have unreliable data and metadata (intended for non-specialist use rather than rigorous musicological research), e.g. Classical Archives or MuseScore
- Be limited in scope, e.g. the SEILS dataset
- Have relatively limited metadata structuring and only basic search functionality, e.g. KernScores

Symbolic music repositories (2/2)
- Those few research-grade symbolic music repositories that do exist are used heavily by musicologists and MIR researchers, e.g. the Josquin Research Project
- This makes it clear how much such resources are needed by the research community

Computational musicology and MIR
Automated data extraction software, statistical analysis techniques and machine learning now allow us to:
- Study huge quantities of music very quickly
  - More than any human could reasonably look at
- Empirically validate (or repudiate) our theoretical predictions
- Do purely exploratory studies of music
- See music from fresh perspectives

We need symbolic data
But to take full advantage of these techniques, researchers need symbolic music files
- Lots of symbolic music files
- Varied symbolic music files
- High-quality symbolic music files
- Consistently encoded symbolic music files
So where can researchers get these?
<pause type="dramatic">1 sec</pause>...

Introducing the SIMSSA DB
- Emphasizes research-grade symbolic music files
- Permits flexible, high-quality searchable metadata
  - Of the kinds specifically needed by musicologists and MIR researchers
- Allows modelling of complex relationships
  - Provenance is given particular centrality
- Allows records to be kept of the specific files (and other related information) used in individual research studies
- Permits content-based (as well as metadata-based) search and analysis
  - Let's expand on this for a moment...

The notion of a feature
- A feature is a piece of information that characterizes something (e.g. a piece of music) in a simple way
  - Usually a simple numerical value
- A feature can be a single value, or it can be a set of related values (e.g. a histogram)
- Features can be extracted from pieces in their entirety, or from segments of pieces
- Features can be used to compare different music and look for patterns in it at a macro level
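
As an illustration of a feature made up of a set of related values, here is a minimal sketch of a pitch-class histogram. It assumes pitches are given as MIDI note numbers, which the slides do not specify, and the function name `pitch_class_histogram` is hypothetical rather than part of any jSymbolic API.

```python
from collections import Counter


def pitch_class_histogram(midi_pitches):
    """Return a 12-bin normalized histogram of pitch classes (C=0 ... B=11).

    Assumption: each pitch is a MIDI note number, so pitch % 12 is its pitch class.
    """
    total = len(midi_pitches)
    if total == 0:
        return [0.0] * 12
    counts = Counter(pitch % 12 for pitch in midi_pitches)
    return [counts.get(pitch_class, 0) / total for pitch_class in range(12)]


# Illustrative input: C4, E4, G4, C5
hist = pitch_class_histogram([60, 64, 67, 72])
```

Because the bins are normalized, histograms of pieces with different lengths can be compared directly.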

Example: A basic feature
- Range (1-D): Difference in semitones between the highest and lowest pitches
- Value of this feature: 7
  - G - C = 7 semitones
- In practice, of course, we want many features, not just one...
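
The range feature described above can be sketched in a few lines. This is an illustration only, assuming pitches are encoded as MIDI note numbers (C4 = 60, G4 = 67); the function name `pitch_range` and the example melody are hypothetical and do not come from the slide.

```python
def pitch_range(midi_pitches):
    """Return the difference in semitones between the highest and lowest pitches."""
    if not midi_pitches:
        raise ValueError("at least one pitch is needed to compute a range")
    return max(midi_pitches) - min(midi_pitches)


# Illustrative melody whose lowest note is C4 (60) and highest is G4 (67)
melody = [60, 62, 64, 67, 65, 64]
range_in_semitones = pitch_range(melody)
```

With this input the result is 67 - 60 = 7 semitones, matching the G - C example.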

jSymbolic (1/2)
- jSymbolic is our software platform for automatically extracting features from symbolic music (ISMIR 2018)
- Extracts 246 unique features (version 2.2)
  - Some of these are multi-dimensional, including histograms
- Extracts a total of 1497 separate values (version 2.2) per symbolic music file

jSymbolic (2/2)
Types of information accessed by jSymbolic features:
- Pitch statistics
- Melody / horizontal intervals
- Chords / vertical intervals
- Texture
- Rhythm
- Instrumentation
- Dynamics

SIMSSA DB and jSymbolic features
- jSymbolic is being integrated into the SIMSSA DB
  - Whenever a file is added to the DB, features are automatically extracted and used to index the file
- Users can use these features to search the DB based on musical content as well as metadata
  - e.g. retrieve all pieces composed by J. S. Bach in Leipzig that contain vertical tritones or parallel fifths
- Researchers can also download features and use them directly as input to statistical analysis and machine learning tools (or use manual analysis) to study things such as:
  - Composer attribution (MedRen 2017, ISMIR 2017)
  - Genre (MedRen 2018, ISMIR 2010)
  - Regional styles (APM 2018)
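
A query that combines metadata with extracted features, like the Bach example above, might look roughly like the following sketch. Everything here is an assumption made for illustration: the `IndexedFile` record, the `search` function, the `vertical_tritones` feature key and the three sample records are hypothetical and do not reflect the actual SIMSSA DB schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical record: metadata plus feature values extracted at upload time."""
    title: str
    composer: str
    place: str
    features: dict = field(default_factory=dict)


def search(files, composer=None, place=None, above=None):
    """Return files matching the metadata filters whose named features exceed a threshold."""
    above = above or {}
    results = []
    for f in files:
        if composer is not None and f.composer != composer:
            continue
        if place is not None and f.place != place:
            continue
        # A missing feature is treated as 0.0, so it never passes a "> 0" test.
        if all(f.features.get(name, 0.0) > limit for name, limit in above.items()):
            results.append(f)
    return results


# Made-up records used only to exercise the function
files = [
    IndexedFile("Piece A", "J. S. Bach", "Leipzig", {"vertical_tritones": 0.02}),
    IndexedFile("Piece B", "J. S. Bach", "Leipzig", {"vertical_tritones": 0.0}),
    IndexedFile("Piece C", "J. S. Bach", "Weimar", {"vertical_tritones": 0.05}),
]
hits = search(files, composer="J. S. Bach", place="Leipzig", above={"vertical_tritones": 0.0})
```

Only "Piece A" satisfies both the metadata filters and the content-based condition.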

Archiving research
- Researchers can submit information on particular studies they performed
  - Specifically which symbolic music files were used
  - Specifically which features (if any) were used
  - Workflows, results, analysis, conclusions, publications and other related data
- Essential for repeatability, direct comparison of approaches, iterative refinements, etc.
- jSymbolic configuration files can be auto-generated for each study in order to facilitate this
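
One way to picture an archived study is as a small serializable record listing the files and features used. The sketch below is illustrative only: the `StudyRecord` fields, the identifiers and the JSON layout are assumptions, not the SIMSSA DB's actual format and not the jSymbolic configuration file format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StudyRecord:
    """Hypothetical record of what a study used, kept for repeatability."""
    title: str
    file_ids: list
    feature_names: list = field(default_factory=list)
    notes: str = ""


def to_json(record):
    """Serialize a study record to a stable, human-readable JSON string."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


def from_json(text):
    """Rebuild a study record from its JSON form."""
    return StudyRecord(**json.loads(text))


# Made-up identifiers used only for illustration
study = StudyRecord(
    title="Example study",
    file_ids=["file-001", "file-002"],
    feature_names=["Range"],
)
restored = from_json(to_json(study))
```

Round-tripping through JSON gives back an equal record, which is the property a later researcher would rely on when re-running a study.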

Design priorities (1/8)
- Make the repository as accessible as possible to all music researchers, regardless of technological training
  - As users
  - As data (and metadata) contributors
  - As editors / validators
- This requires a front-end that is easy to use
  - And that hides from users any details of the data model that they do not need to be aware of

Design priorities (2/8)
- Use authority control and cataloguing standards to reduce ambiguity and redundancy (and increase consistency) as much as possible
- Initial focus on VIAF authority files, but also looking at:
  - FRBR
  - Wikidata
  - RISM's Muscat and authority files
  - RDA
  - Library of Congress
- Populate fields with URIs and use linked open data practices when possible
  - But also allow contributors to enter raw text into fields (to meet the realistic needs of and constraints faced by musicologists)
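
The idea of a field that holds an authority URI when one is available, but falls back to raw text otherwise, can be sketched as follows. The `FieldValue` class, the `same_entity` helper and the `example.org` URIs are hypothetical placeholders, not real authority identifiers or SIMSSA DB code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldValue:
    """A metadata value: always a text label, optionally backed by an authority URI."""
    label: str
    authority_uri: Optional[str] = None

    @property
    def is_controlled(self):
        return self.authority_uri is not None


def same_entity(a, b):
    """Compare by URI when both values are controlled, else by normalized label."""
    if a.is_controlled and b.is_controlled:
        return a.authority_uri == b.authority_uri
    return a.label.strip().casefold() == b.label.strip().casefold()


# Placeholder URI; a real record would point at an authority file entry
controlled_a = FieldValue("Bach, Johann Sebastian", "https://example.org/authority/1")
controlled_b = FieldValue("J. S. Bach", "https://example.org/authority/1")
raw_text = FieldValue("j. s. bach ")
```

Two controlled values with different spellings still match through their shared URI, while raw text can only be matched by its label, which is why URIs reduce ambiguity.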

Design priorities (3/8)
- Information relating to quality control and file encoding methodology must be kept
  - Who submitted data or metadata
  - Who verified or edited data or metadata
  - Who (or what software) encoded a symbolic music file, and using what settings
- Encoding methodologies can significantly influence results if one is not careful (ISMIR 2018)

Design priorities (4/8)
- Keeping a record of provenance is musicologically essential
- Each symbolic music file is linked to a specific source (digital or physical)
- Each source can be linked to its parent source(s) through chains of provenance
  - e.g. an MEI file is derived from a printed J. S. Bach score, which is derived from a handwritten copyist's manuscript, which is derived from a (potentially lost) original manuscript handwritten by Bach
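
A chain of provenance like the one in the example above can be modelled as sources that point to their parents. The sketch below is a simplification under stated assumptions: each source has at most one parent (the slide allows several), and the `Source` class and `provenance_chain` function are hypothetical names, not the SIMSSA DB data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical source record; 'parent' is the source it was derived from."""
    description: str
    parent: Optional["Source"] = None


def provenance_chain(source):
    """Walk from a source back to its earliest known ancestor."""
    chain = []
    seen = set()
    current = source
    while current is not None:
        if id(current) in seen:
            raise ValueError("provenance chain contains a cycle")
        seen.add(id(current))
        chain.append(current.description)
        current = current.parent
    return chain


# The chain from the slide, oldest source first
original = Source("original manuscript handwritten by Bach (potentially lost)")
copy = Source("handwritten copyist's manuscript", parent=original)
printed = Source("printed score", parent=copy)
mei_file = Source("MEI file", parent=printed)
chain = provenance_chain(mei_file)
```

Supporting several parents per source would turn the chain into a directed acyclic graph, and the walk would then need to visit every parent rather than a single one.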

Design priorities (5/8)
- Maintain a conceptual separation between abstract musical works and particular instantiations of them (as expressed by symbolic files and sources)
- Multiple versions of the same abstract work can exist, and these should be both associated with and differentiated from one another
  - e.g. different symbolic encodings
  - e.g. different editions, arrangements, etc. of a work

Design priorities (6/8)
- Make it possible to divide abstract musical works into abstract sections and parts
  - Symbolic files sometimes contain whole pieces, and sometimes only parts of pieces
- Make it possible to keep track of complex relationships between works, sections and parts
  - e.g. a movement of one mass might be reused in another mass
  - e.g. an orchestral score and a piano reduction of it have different parts, but they are the same work and have the same sections
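
The separation between abstract works, their sections and the files that instantiate them might be sketched as below. This is only an illustration: the `Work`, `Section` and `SymbolicFile` classes, their fields and the sample titles are hypothetical, and parts are left out for brevity; the real data model is the one shown in the project's ERD.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """An abstract section (e.g. a movement), independent of any file."""
    title: str


@dataclass
class Work:
    """An abstract musical work made up of abstract sections."""
    title: str
    sections: list = field(default_factory=list)


@dataclass
class SymbolicFile:
    """A concrete instantiation of a work, or of only some of its sections."""
    filename: str
    work: Work
    sections: list = field(default_factory=list)

    def is_complete(self):
        # Assumption: a file is complete when it covers every section of its work.
        return all(any(s is own for own in self.sections) for s in self.work.sections)


def works_containing(section, works):
    """Return the titles of all works that include the given section object."""
    return [w.title for w in works if any(s is section for s in w.sections)]


# Made-up example: one movement reused in two masses
shared = Section("Reused movement")
mass_a = Work("Mass A", [Section("Kyrie"), shared])
mass_b = Work("Mass B", [shared])
excerpt = SymbolicFile("mass_a_excerpt.mei", mass_a, [shared])
```

Because the reused movement is a single object referenced by both works, a query for it finds both masses without duplicating the section.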

Design priorities (7/8)
- Make it possible to link an abstract musical work (and its sections and parts) to instantiations in multiple formats
  - Symbolic music files
  - Musical texts
  - Images of scores or manuscripts
  - Audio files
- Although our primary focus is on symbolic music, this data is ultimately all related...

Design priorities (8/8)
Long-term, we want to:
- Link our data to the contents of other repositories, e.g. DOREMUS, Josquin Research Project, etc.
  - We are putting a design emphasis on making it possible to import or export information using linked open data frameworks
  - IIIF compatibility will certainly help with respect to images
- Take as input symbolic files auto-generated from images using OMR
  - As the technology improves
- Take as input symbolic files auto-generated from audio files using automatic transcription algorithms
  - As the technology improves

[Figure: Overview ERD of our data model]

[Figure: Prototype interface (1/3)]

[Figure: Prototype interface (2/3)]

[Figure: Prototype interface (3/3)]

Highlights of the SIMSSA DB
- Designed to meet the specific needs of researchers wishing to engage in large-scale computational musicological and MIR research
- Focus on symbolic music files
  - But also permits links with images, audio files and texts
- Emphasis on accessibility to researchers
- Emphasis on quality and consistency of both metadata and data
  - Authority control and cataloguing standards
- Modelling of complex musical relationships
  - Relationships between (abstract) works, sections and parts
  - Mapping musical instantiations (e.g. files) to abstract musical entities
- Emphasis on provenance
- Archiving of experiments
- Content-based search and analysis based on features
  - As well as metadata-based searches, of course

Thanks for your attention
E-mail: cory.mckay@mail.mcgill.ca
The SIMSSA DB team: Julie E. Cumming, Ichiro Fujinaga, Andrew Hankinson, Emily Hopkins, Yaolong Ju, Andrew Kam, Gustavo Polins Pedro