Linking subject labels in Cultural Heritage Metadata to MIMO vocabulary using CultuurLink

Hugo Manguinhas, Valentine Charles, Antoine Isaac (Europeana Foundation)
Tom Miles (The British Library)
Aude Lima (The Centre de Recherche en Ethnomusicologie)
Ariane Néroulidis, Véronique Ginouvès (The Maison Méditerranéenne des Sciences de l'homme)
Dimitra Atsidis, Maarten Brinkerink (Netherlands Institute for Sound and Vision)
Michiel Hildebrand (Spinque B.V.)
Sergiu Gordea (Austrian Institute of Technology)

What is Europeana?
The Platform for Europe's Digital Cultural Heritage
We aggregate metadata:
- From all EU countries
- ~3,500 galleries, libraries, archives and museums
- More than 52M objects
- In about 50 languages
- Huge amount of references to places, agents, concepts, time
[Figure: Europeana aggregation infrastructure]

The Europeana Sounds project
- Europeana Sounds aims to increase the amount of audio content available via Europeana, while also improving geographical and thematic coverage
- Apart from aggregation, it improves discovery and use of audio content by enriching metadata through innovative methods

The scope of the experiment
- Evaluate the use of a semi-automatic tool like CultuurLink for a concrete vocabulary alignment case
- Assess the coverage of the MIMO vocabulary for enriching Europeana Sounds datasets

About the MIMO vocabulary
- A multilingual controlled vocabulary of musical instruments
- Developed within the Musical Instruments Museums Online project, which gathered some of Europe's most important musical instrument museums

Why MIMO?
- A significant part of the subjects present in Europeana Sounds collections refer to musical instruments
- Good coverage:
  - gathers a total of 3,121 musical instruments
  - includes classifications used by professionals, such as Hornbostel-Sachs (641)
  - contains terms in 8 different languages (English, French, Polish, Catalan, Dutch, Italian, Swedish, German)
- Technically available on the Web: follows the Linked Data best practices and recipes (RDF, SKOS, content negotiation)
- Openly available (CC0)

Overview of MIMO language coverage

What is CultuurLink?
- Semi-automatic vocabulary alignment tool
- Successor to EuropeanaConnect's Amalgame
- http://cultuurlink.beeldengeluid.nl

Why CultuurLink?
- Freely available as an online open service that any user can use
- Users can design and experiment with different alignment strategies:
  - it helps with the task of discovering new alignments between two vocabularies
  - users can define and combine strategies that apply different techniques or parameterizations
- Manual control:
  - alignments are identified through automatic means, but strategies are designed by users
  - users can decide which alignments are correct and can assign a specific meaning (e.g. skos:exactMatch, skos:related, skos:broadMatch)
- User friendly: allows users who are not technically savvy to easily perform fairly complex tasks

The participants and their collections
- The British Library (BL) participated with 3 collections:
  - a selection of Asian instruments (1,099 records) from the "Colin Huehns Asia Collection"
  - a selection from the Peter Cooke Uganda Collection (1,312 records)
  - the Keith Summers English Folk Music Collection (1,326 records)
- The Centre de Recherche en Ethnomusicologie (CREM) participated with a test collection of 36 records published in the CD "Musical Instruments of the World"
- The Maison Méditerranéenne des Sciences de l'homme (MMSH) participated with a collection of 25 records about folk music
- The Netherlands Institute for Sound and Vision (NISV) participated with a collection of 6,608 records containing commercial 78 rpm records (Handelsplaten) from different genres like light music, classical music and opera

Alignment of vocabulary terms
We decided to focus on the vocabulary terms within the subject fields of the metadata, as opposed to aligning the full vocabulary used by the providing institution, because:
- the full vocabulary was not available for use outside the organization and/or in a data structure that suits a vocabulary alignment tool
- we preferred to report on alignments for the subjects actually used in the source datasets and not on all possible subjects

What have we done?
For each collection we:
- Extracted a SKOS vocabulary out of the subject terms found in the object metadata
- Set up a permanent session on CultuurLink
- Asked providers to perform the alignments
- Collected and assessed the alignments and feedback obtained from the Data Providers
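The extraction step can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the record structure (a dict with `uri` and `subjects` keys), the function name `extract_vocabulary`, the lowercase normalization and the example.org URIs are all assumptions made for the example.

```python
from collections import defaultdict


def extract_vocabulary(records):
    """Group record URIs by normalized subject term.

    Each distinct term becomes one candidate concept; the URIs of the
    records that use it are kept so alignments can be traced back.
    """
    vocabulary = defaultdict(list)
    for record in records:
        for subject in record["subjects"]:
            # Assumed normalization: trim whitespace and lowercase.
            term = subject.strip().lower()
            if term and record["uri"] not in vocabulary[term]:
                vocabulary[term].append(record["uri"])
    return dict(vocabulary)


# Hypothetical records, for illustration only.
records = [
    {"uri": "http://example.org/record/1", "subjects": ["grelot", "violon"]},
    {"uri": "http://example.org/record/2", "subjects": ["Grelot "]},
]
vocabulary = extract_vocabulary(records)
```

Each key of `vocabulary` would then be serialized as one skos:Concept, with the term as its preferred label and the record URIs attached as notes.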

Concept definition obtained from the MMSH dataset
Slide callouts: the skos:prefLabel holds the text found in dc:subject; the URIs of the records are kept as skos:note elements.

<skos:ConceptScheme rdf:about="http://www.europeanasounds.eu/data/mmsh/concepts#conceptscheme">
</skos:ConceptScheme>
<skos:Concept rdf:ID="grelot">
  <skos:inScheme rdf:resource="#conceptscheme"/>
  <skos:prefLabel>grelot</skos:prefLabel>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9800"/>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9775"/>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9801"/>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9768"/>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9798"/>
  <skos:note rdf:resource="http://mintprojects.image.ntua.gr/data/sounds/http://phonotheque.mmsh.humanum.fr/dyn/portal/index.seam?page=alo&amp;aloid=9788"/>
</skos:Concept>

The alignments obtained from CultuurLink
Slide callouts: each rdf:about identifies a subject term; each rdf:resource identifies a MIMO concept; the properties inside an rdf:Description are the alignments identified by the data provider for that subject.

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#guitare">
    <skos:exactMatch rdf:resource="http://www.mimo-db.eu/instrumentskeywords/3237"/>
    <owl:differentFrom rdf:resource="http://www.mimo-db.eu/instrumentskeywords/5137"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#flûte">
    <skos:exactMatch rdf:resource="http://www.mimo-db.eu/instrumentskeywords/3955"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#grelot">
    <skos:exactMatch rdf:resource="http://www.mimo-db.eu/instrumentskeywords/2873"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#ban">
    <owl:differentFrom rdf:resource="http://www.mimo-db.eu/instrumentskeywords/2498"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#violon">
    <skos:exactMatch rdf:resource="http://www.mimo-db.eu/instrumentskeywords/3573"/>
  </rdf:Description>
</rdf:RDF>
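An export of this shape can be read back with Python's standard library to separate accepted alignments from rejected ones. The sketch below is illustrative only: the function name `read_alignments` is hypothetical, the sample is a shortened copy of the export above, and the code assumes the flat rdf:Description layout shown on the slide. A general-purpose RDF parser would be the more robust choice for arbitrary RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the alignment export, with standard element casing.
ALIGNMENTS = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#guitare">
    <skos:exactMatch rdf:resource="http://www.mimo-db.eu/instrumentskeywords/3237"/>
    <owl:differentFrom rdf:resource="http://www.mimo-db.eu/instrumentskeywords/5137"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.europeanasounds.eu/data/concepts#ban">
    <owl:differentFrom rdf:resource="http://www.mimo-db.eu/instrumentskeywords/2498"/>
  </rdf:Description>
</rdf:RDF>"""


def read_alignments(xml_text):
    """Return (subject, relation, target) triples from a flat RDF/XML export."""
    triples = []
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for child in description:
            # ElementTree tags look like "{namespace}localname".
            _, _, relation = child.tag[1:].partition("}")
            target = child.get(f"{{{RDF}}}resource")
            triples.append((subject, relation, target))
    return triples


triples = read_alignments(ALIGNMENTS)
accepted = [t for t in triples if t[1] == "exactMatch"]
rejected = [t for t in triples if t[1] == "differentFrom"]
```

Here `accepted` holds the alignments the data provider confirmed, while `rejected` holds candidate matches explicitly marked as different, such as the one for "ban".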

The quantitative results of the experiment

Findings identified when aligning with CultuurLink (1/2)
- Applying exact string matching of preferred labels is sufficient to align ~50% of the terms
  - Incorrect alignments were also identified because of polysemy, e.g. "ban", or "zang", which means singing or song but matched the instrument zang, a sort of cymbals or clapper bells
- Matching against labels in any language turned out to be very successful at finding matches based on vernacular terms
  - But it also increased the number of irrelevant alignments
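The trade-off between restricting the match to one language and matching against any language can be illustrated with a small sketch. It does not reproduce CultuurLink's implementation: the function `exact_matches`, the concept identifiers and the language tags on the sample labels are hypothetical.

```python
def exact_matches(source_terms, target_labels, languages=None):
    """Align source terms to target concepts by exact, case-insensitive label match.

    target_labels maps a concept id to {language code: label}.
    languages restricts which labels are compared; None means any language.
    """
    matches = []
    for term in source_terms:
        for concept_id, labels in target_labels.items():
            for language, label in labels.items():
                if languages is not None and language not in languages:
                    continue
                if term.lower() == label.lower():
                    matches.append((term, concept_id, language))
    return matches


# Hypothetical target vocabulary; identifiers and labels are illustrative only.
target = {
    "concept:trumpet": {"en": "trumpet", "nl": "trompet"},
    "concept:zang": {"en": "zang"},  # the instrument, a sort of cymbals
}
source = ["trompet", "zang"]  # Dutch subject terms; "zang" means singing

dutch_only = exact_matches(source, target, languages={"nl"})
any_language = exact_matches(source, target)
```

With these sample labels, matching in any language adds a match for "zang" that the Dutch-only run does not produce. Since the Dutch subject term means singing, that extra match is the kind of polysemy-driven false positive a reviewer would need to reject.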

Findings identified when aligning with CultuurLink (2/2)
- More elaborate strategies were found very helpful to discover more alignments:
  - using a less restrictive string matching function like contains or startswith, to surface broader or narrower relations
  - activating stemming, e.g. Trompet was aligned with Trompetten and Accordeon with Accordeons, both in Dutch
  - applying fuzzy matching, both with a max distance of 1 and of 2
- The NOT A functionality was found crucial to iteratively refine the strategy
- Using such strategies also revealed some quality issues in the source metadata, such as misspellings and unrecognizable or doubtful terms
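The looser strategies listed above can be sketched as follows. This is an illustration under stated assumptions and not CultuurLink's code: `naive_dutch_stem` is a toy suffix stripper standing in for a real stemmer, the edit distance used for fuzzy matching is assumed to be Levenshtein distance, and the function and strategy names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def naive_dutch_stem(word):
    """Toy plural stripper; a stand-in for a real stemmer, for illustration only."""
    word = word.lower()
    for suffix in ("ten", "en", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def matches(term, label, strategy, max_distance=1):
    """Check whether a source term matches a target label under one strategy."""
    term, label = term.lower(), label.lower()
    if strategy == "startswith":
        return label.startswith(term)
    if strategy == "contains":
        return term in label
    if strategy == "stem":
        return naive_dutch_stem(term) == naive_dutch_stem(label)
    if strategy == "fuzzy":
        return levenshtein(term, label) <= max_distance
    raise ValueError(f"unknown strategy: {strategy}")


stem_hit = matches("Trompet", "Trompetten", "stem")
fuzzy_hit = matches("accordeon", "accordion", "fuzzy", max_distance=1)
broader_hit = matches("flute", "bass flute", "contains")
```

Raising `max_distance` from 1 to 2 widens the net in the same way the slide describes, at the cost of more candidates to review by hand.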

What about MIMO?
In general, the Data Providers found that MIMO has:
- Good coverage of musical instruments
- Good language coverage compared to their local vocabulary
- A simplified hierarchy, making it understandable and practical for non-musicologists
- Updated families covering both electronic instruments and tools that appear in contemporary music
- Helpful concept definitions
However, MIMO:
- Lacks concepts to describe voice (texture, mechanism, etc.)
  - But it may be enriched by the DOREMUS project with vocal terms from the IAML mediums of performance thesaurus
- Is centred on the structure of occidental classical music

Quick demo http://cultuurlink.beeldengeluid.nl

Thank you!