CLARIN AAI Vision. Daan Broeder, Max-Planck Institute for Psycholinguistics. DFN meeting, June 7th, Berlin

Contents
- What is the CLARIN project?
- What are language resources?
- A "Holy Grail" CLARIN user scenario
- The AAI vision and what needs to be solved to achieve it

What is CLARIN?
CLARIN (Common Language Resources and Technology Infrastructure) is a large-scale pan-European collaborative effort to coordinate language resources and technology and to make them available and readily usable for language and SSH (Social Sciences and Humanities) researchers.

Language Resources
Any resource used to study language:
- Text corpora: newspapers, email, SMS messages
- Multimedia corpora: audio recordings to study phonetics or train speech recognizers; video recordings for sign-language studies
- Language documentation (language use in its cultural context)
- Multimedia lexica: lexical entries linked with pictures and sound

Sign-Language Example

Multimedia lexicon example
Lexical entries link directly into archived corpora, e.g. via Annex.

What is CLARIN?
CLARIN is an EU infrastructure project with €4.2M of funding for a three-year preparatory phase (ending in 2010).
- Additional funding comes from national governments (at this moment at least €14M).
- The CLARIN consortium now has 36 partners from 26 EU countries.
- The CLARIN community has more than 180 member organisations in 32 countries (mostly NLP organisations).
- CLARIN builds on many earlier initiatives with many participants: LangWeb, EARL, TELRI, LIRICS and, more recently, DAM-LR.
- The MPI for Psycholinguistics is responsible for WP2, working on the technical infrastructure.

CLARIN Time Plan
- 2008-2010, preparatory phase: a limited set of federated CLARIN centers (10+), showcases and demonstrators; WP8 investigates national funding for the construction and maintenance phase.
- 2011-2016, construction phase: no direct European funding, but assisting EU projects; progress depends on national project commitments. The Netherlands is already committed until 2014, and intensive preparations are currently under way for CLARIN-D (until 2016).
- 2016?, operational phase: this has to be cost-efficient; we have to compete!
CLARIN's EU-level continuation after the preparatory phase will likely take the form of an ERIC (European Research Infrastructure Consortium). This is important, if only to provide a legal entity that can make contracts with outside parties on behalf of the CLARIN community.

A Backbone of CLARIN Centers
Together, these centers uphold the infrastructure, maintain it, and offer guidance and expertise for its use. They:
- have stable repositories for resources and services;
- need strong national support for many years;
- need good teams with a long-term perspective that can provide persistence and continuity of knowledge.
This is still far from reality. The current situation is one of accidental and temporary collaborations and obligations, and only a limited number of centers can probably fulfil the criteria of sufficient stability, funding and technological strength. There are currently 25 candidate centers.

CLARIN "Holy Grail" User Scenario
A researcher authenticates at his own organization and creates a virtual collection of resources from different repositories, based on browsing a catalogue, searching through metadata, or searching in resource content. To be granted access to this distributed dataset, he signs the appropriate licenses. He can then use a workflow specification tool to process the virtual collection with LT (language technology) tools, offered as reliable distributed web services that he is authorized to use. (Intermediate) results are stored in a user-specific workspace. After evaluation, the resulting data (including metadata) can be added to a repository, and the virtual collection specification can be stored for future reference.
For our domain this is ambitious and challenging, but even a partial realization is worthwhile.
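The scenario implies a simple data structure: a virtual collection is a named list of references to resources held at different centers, each identified by a persistent identifier and governed by a license. The following is a minimal sketch under that assumption; the class and field names, centers and Handles are hypothetical and not part of any CLARIN specification. It shows how such a collection tells us which centers (SPs) the user must reach and which EULAs must be accepted before processing.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceRef:
    """One member of a virtual collection (hypothetical model)."""
    pid: str         # persistent identifier of the resource, e.g. a Handle
    center: str      # CLARIN center (SP) that hosts the resource
    license_id: str  # EULA that governs access to the resource


@dataclass
class VirtualCollection:
    name: str
    owner: str
    members: list[ResourceRef] = field(default_factory=list)

    def add(self, ref: ResourceRef) -> None:
        # Keep each resource only once, even if found via several searches.
        if ref not in self.members:
            self.members.append(ref)

    def by_center(self) -> dict[str, list[str]]:
        """Group member PIDs by hosting center: the SPs the user has to reach."""
        groups = defaultdict(list)
        for ref in self.members:
            groups[ref.center].append(ref.pid)
        return dict(groups)

    def required_licenses(self) -> set[str]:
        """Distinct EULAs to accept before the collection can be processed."""
        return {ref.license_id for ref in self.members}


# Example with made-up identifiers, centers and license names.
vc = VirtualCollection("sign-language-study", "researcher@uni-a.example")
vc.add(ResourceRef("hdl:12345/corpus-1", "center-a", "academic-use-only"))
vc.add(ResourceRef("hdl:12345/corpus-2", "center-a", "academic-use-only"))
vc.add(ResourceRef("hdl:67890/lexicon-7", "center-b", "research-license-b"))
vc.add(ResourceRef("hdl:12345/corpus-1", "center-a", "academic-use-only"))  # duplicate, ignored
print(sorted(vc.required_licenses()))  # ['academic-use-only', 'research-license-b']
```

Grouping by center is what turns a single user action into several SP visits, which is why the AAI questions below matter for this scenario.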

CLARIN Infrastructure Components
In the previous scenario we find the following components and functionality:
- Metadata catalogue
- Virtual collection registries
- Persistent identification of resources (EPIC, the European PID Consortium: GWDG, CSC, SARA)
- AAI infrastructure, with technical, organizational and legal issues
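EPIC issues Handle-based persistent identifiers, so resolving a resource's PID comes down to a Handle lookup. A minimal sketch, assuming the public Handle.net proxy (`hdl.handle.net`) and its JSON API under `/api/handles/`; the handle and the response body below are invented, and a real client would fetch the API URL over HTTP instead of using a canned string.

```python
import json
from typing import Optional
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net"


def resolver_urls(handle: str) -> tuple[str, str]:
    """Browser-facing proxy URL and JSON API URL for a Handle such as '12345/abc'."""
    h = quote(handle, safe="/")
    return f"{HANDLE_PROXY}/{h}", f"{HANDLE_PROXY}/api/handles/{h}"


def target_url(api_response: str) -> Optional[str]:
    """Return the first URL value in a Handle JSON API response, or None."""
    record = json.loads(api_response)
    if record.get("responseCode") != 1:  # 1 signals a successful lookup
        return None
    for value in record.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Offline example: the handle and the response body are made up.
proxy, api = resolver_urls("12345/corpus-1")
sample = json.dumps({
    "responseCode": 1,
    "handle": "12345/corpus-1",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://repo.example.org/corpus-1"}},
    ],
})
print(proxy)               # https://hdl.handle.net/12345/corpus-1
print(target_url(sample))  # https://repo.example.org/corpus-1
```

Because the virtual collection stores PIDs rather than URLs, a resource can move between servers without breaking stored collection specifications.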

Virtual Language Observatory (catalogue)
Metadata from many sources is gathered by OAI-PMH harvesting and transformation, indexed, and presented in a faceted browser with a GIS overlay.
Sources shown (with record counts): CGN (12,000), IMDI domain, End.Lang. (35,000), MPI (33,000), BAS (7,400), AILLA (1,800), OLAC (40,000), LRT Inventory (800/137), DFKI Tool Registry (292), ELDA (60), and others.
Hard problems: mapping, granularity, curation.
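The harvesting step can be illustrated with the OAI-PMH `ListRecords` verb and its resumption tokens. This is a minimal sketch, not the VLO's actual harvester: the endpoint URL and the two response pages are made up, and the HTTP call is abstracted as a `fetch` function so the paging logic runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch: Callable[[str], str], base_url: str,
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield identifiers of all non-deleted records, following resumption tokens.

    `fetch` maps a request URL to the response body; injecting it keeps the
    paging logic independent of any HTTP library.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI):
            if header.get("status") == "deleted":
                continue  # deleted records have nothing left to index
            yield header.findtext("oai:identifier", namespaces=OAI)
        token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI)
        if not token:
            return  # absent or empty token: this was the last page
        # resumptionToken is exclusive: it replaces metadataPrefix on follow-ups.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Two made-up response pages from a hypothetical endpoint.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example.org:1</identifier></header></record>
<record><header status="deleted"><identifier>oai:example.org:2</identifier></header></record>
<resumptionToken>next-1</resumptionToken></ListRecords></OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example.org:3</identifier></header></record>
<resumptionToken/></ListRecords></OAI-PMH>"""
BASE = "https://repo.example.org/oai"
PAGES = {
    BASE + "?verb=ListRecords&metadataPrefix=oai_dc": PAGE_1,
    BASE + "?verb=ListRecords&resumptionToken=next-1": PAGE_2,
}

print(list(harvest_identifiers(PAGES.__getitem__, BASE)))
# ['oai:example.org:1', 'oai:example.org:3']
```

Harvesting only moves records; the "hard problems" listed above (mapping, granularity, curation) arise in the transformation step that follows.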

CLARIN AAI
The purpose is to create one single domain of CLARIN resources and services for our users, in which users have only one identity, preferably maintained at their home institute (since we hope to have very many users), and can use SSO (single sign-on) between the centers.
Our users are linguists and SSH academics spread out over Europe; CLARIN cannot hope to influence the way their user accounts are set up. But CLARIN can profit from existing AAI systems in the research and education domain.
CLARIN centers are part of the CLARIN organization, and they can be asked to conform to specific standards with respect to AAI.

Federated Authentication
Many countries have a national identity federation (IDF), set up by the NREN (national research and education network). Such a federation is a collection of IdPs (identity providers) and SPs (service providers): users have an account at their institute (IdP) and can use resources or services from centers (SPs). When a user accesses a resource at an SP, he authenticates at his own IdP.
[Figure: a user authenticates at his IdP and then uses resources at SPa and processing services at SPb, with user information passed from the IdP to the SPs (numbered steps 1 to 6).]
Purpose: provide SSO and a single user identity, with limited exposure of user information.
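The three purposes (SSO, a single identity, limited information exposure) can be made concrete with a toy model. This is a deliberately simplified sketch, not SAML: browser redirects and signed assertions are abstracted into direct method calls, the username is passed in for brevity, and the entity IDs and release policy are hypothetical. The attribute names are standard eduPerson/inetOrgPerson names.

```python
from dataclasses import dataclass, field


@dataclass
class IdP:
    """Toy identity provider: one login, per-SP attribute release."""
    entity_id: str
    users: dict[str, dict[str, str]]      # username -> all attributes held
    release_policy: dict[str, set[str]]   # SP entityID -> attributes it may receive
    sessions: set[str] = field(default_factory=set)
    logins: int = 0                       # how often a password was requested

    def authenticate(self, username: str, sp_id: str) -> dict[str, str]:
        if username not in self.sessions:
            self.logins += 1              # first visit: user logs in at home IdP
            self.sessions.add(username)   # later visits reuse the session (SSO)
        allowed = self.release_policy.get(sp_id, set())
        # Limited exposure: each SP only sees the attributes released to it.
        return {k: v for k, v in self.users[username].items() if k in allowed}


@dataclass
class SP:
    """Toy service provider that trusts the IdPs listed in federation metadata."""
    entity_id: str
    trusted_idps: set[str]

    def access(self, idp: IdP, username: str) -> dict[str, str]:
        if idp.entity_id not in self.trusted_idps:
            raise PermissionError(f"{self.entity_id} does not trust {idp.entity_id}")
        return idp.authenticate(username, self.entity_id)


# Hypothetical entities: SPa hosts resources, SPb offers processing services.
idp = IdP(
    "https://idp.uni-a.example",
    users={"alice": {"eduPersonPrincipalName": "alice@uni-a.example",
                     "mail": "alice@uni-a.example", "displayName": "Alice"}},
    release_policy={"https://spa.example": {"eduPersonPrincipalName"},
                    "https://spb.example": {"eduPersonPrincipalName", "mail"}},
)
spa = SP("https://spa.example", {"https://idp.uni-a.example"})
spb = SP("https://spb.example", {"https://idp.uni-a.example"})
print(spa.access(idp, "alice"))  # {'eduPersonPrincipalName': 'alice@uni-a.example'}
spb.access(idp, "alice")
print(idp.logins)                # 1
```

The user logs in once, both SPs see the same identifier, and neither SP ever receives `displayName`.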

CLARIN-wide AAI (1)
The CLARIN SPs become members of their national IDFs, and we rely on the eduGAIN confederation (GÉANT3 project) to provide the trust between the national IDFs.
[Figure: SPs (SP1, SP2, SP3) in national federations (IDF a, IDF b, IDF c), linked through eduGAIN metadata and trust.]
Issues:
- eduGAIN is not yet functional.
- Attribute harmonization.
- Privacy: disclosing attributes when crossing national frontiers.
- Homeless users (users without a home IdP)?
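The trust model of this option can be sketched as a path check: an SP accepts an IdP if both are listed in the same national federation, or if both federations take part in eduGAIN. The federation names, entity IDs and eduGAIN membership below are hypothetical, and real deployments express this through signed SAML metadata aggregates rather than Python sets. Attribute harmonization and privacy are outside this sketch.

```python
from typing import Optional


def federations_of(entity: str, federations: dict[str, set[str]]) -> set[str]:
    """Names of the federations whose metadata lists this entity."""
    return {name for name, members in federations.items() if entity in members}


def trust_path(sp: str, idp: str, federations: dict[str, set[str]],
               edugain_members: set[str]) -> Optional[str]:
    """How (if at all) an SP can trust an IdP in the eduGAIN model."""
    sp_feds = federations_of(sp, federations)
    idp_feds = federations_of(idp, federations)
    shared = sp_feds & idp_feds
    if shared:
        return "national:" + min(shared)        # same national federation
    if sp_feds & edugain_members and idp_feds & edugain_members:
        return "eduGAIN"                         # interfederation via eduGAIN
    return None                                  # no trust path


# Hypothetical setup; assumption: IDF-c has not joined eduGAIN.
FEDERATIONS = {
    "IDF-a": {"sp1.example", "idp.uni-a.example"},
    "IDF-b": {"sp2.example", "idp.uni-b.example"},
    "IDF-c": {"sp3.example", "idp.uni-c.example"},
}
EDUGAIN = {"IDF-a", "IDF-b"}

print(trust_path("sp1.example", "idp.uni-a.example", FEDERATIONS, EDUGAIN))  # national:IDF-a
print(trust_path("sp1.example", "idp.uni-b.example", FEDERATIONS, EDUGAIN))  # eduGAIN
print(trust_path("sp3.example", "idp.uni-a.example", FEDERATIONS, EDUGAIN))  # None
```

A homeless user has no federated IdP at all, so no path exists for him in this model, whatever eduGAIN's status.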

CLARIN-wide AAI (2)
Establish a CLARIN SP organization as a legal entity able to sign contracts with the national IDFs where needed. The CLARIN SP organization takes care of exchanging the SP specifications with the national IDFs.
[Figure: SP1, SP2 and SP3 exchange metadata and establish trust with IDF a, IDF b and IDF c through the CLARIN SP organization.]
Open question: homeless users?
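In SAML-based federations, "exchanging the SP specifications" usually means handing over SP metadata. A minimal sketch, assuming the SP organization bundles its members into one SAML 2.0 metadata aggregate (`EntitiesDescriptor`) for each IDF; the aggregate name and entity IDs are hypothetical, and real federation metadata additionally needs certificates (`KeyDescriptor`), contact data and an XML signature.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
ET.register_namespace("md", MD)


def sp_aggregate(name: str, sps: dict[str, str]) -> str:
    """Build a simplified SAML 2.0 metadata aggregate for a group of SPs.

    `sps` maps each SP entityID to its assertion consumer service URL.
    Certificates, organisation/contact data and the signature are omitted.
    """
    root = ET.Element(f"{{{MD}}}EntitiesDescriptor", {"Name": name})
    for entity_id, acs_url in sorted(sps.items()):
        entity = ET.SubElement(root, f"{{{MD}}}EntityDescriptor", {"entityID": entity_id})
        sp_sso = ET.SubElement(entity, f"{{{MD}}}SPSSODescriptor", {
            "protocolSupportEnumeration": "urn:oasis:names:tc:SAML:2.0:protocol"})
        ET.SubElement(sp_sso, f"{{{MD}}}AssertionConsumerService", {
            "Binding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST",
            "Location": acs_url,
            "index": "0",
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical entity IDs for three member SPs of the CLARIN SP federation.
xml_text = sp_aggregate("urn:example:clarin-sp-federation", {
    "https://sp1.example/shibboleth": "https://sp1.example/Shibboleth.sso/SAML2/POST",
    "https://sp2.example/shibboleth": "https://sp2.example/Shibboleth.sso/SAML2/POST",
    "https://sp3.example/shibboleth": "https://sp3.example/Shibboleth.sso/SAML2/POST",
})
print(xml_text.count("<md:EntityDescriptor "))  # 3
```

With one aggregate, each IDF imports a single document maintained by the SP organization instead of tracking every CLARIN center separately.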

How about licenses?
Many resources are available under a special license (EULA), e.g. "academic use only". CLARIN WP7 investigated possible harmonization.
Should a user have to repeatedly sign the same EULA at different data providers when processing a distributed dataset? This would break SSO!
Can we store the signed-EULA information at the user's IdP as an attribute? CLARIN has no way of influencing the IdP organizations, so a CLARIN registry for this would be needed.

Virtual Organization Platform
[Figure: the user's browser accesses SPa and SPb; in addition to the user's IdP, the SPs consult a VO platform acting as an external user attribute authority, backed by a EULA database.]
Create a special EULA service. It is part of the CLARIN organization and independent of the IDFs. A proof-of-concept (PoC) implementation of the VO platform is available and is suitable as a basis for a CLARIN EULA service; developing it further will (probably) be part of CLARIN-NL.
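The point of the EULA service is that a license is signed once and then seen by every SP. A minimal sketch under stated assumptions: the service is modelled as an in-memory attribute authority, the attribute name `signedEULAs` is hypothetical, and `eduPersonPrincipalName` is assumed to be the identifier that links the IdP login to the EULA record (a deployment might use a persistent targeted ID instead).

```python
from dataclasses import dataclass, field


@dataclass
class EulaService:
    """Hypothetical CLARIN EULA service acting as an external attribute authority."""
    signed: dict[str, set[str]] = field(default_factory=dict)  # user id -> EULA ids

    def sign(self, user_id: str, eula_id: str) -> None:
        self.signed.setdefault(user_id, set()).add(eula_id)

    def query(self, user_id: str) -> dict[str, set[str]]:
        # Returned as an extra attribute, next to what the user's IdP released.
        return {"signedEULAs": set(self.signed.get(user_id, set()))}


def may_access(idp_attributes: dict[str, str], eulas: EulaService, required: str) -> bool:
    """SP-side decision combining IdP attributes with the EULA service."""
    user_id = idp_attributes.get("eduPersonPrincipalName")
    if user_id is None:
        return False  # no stable identifier: the EULA record cannot be linked
    return required in eulas.query(user_id)["signedEULAs"]


eulas = EulaService()
alice = {"eduPersonPrincipalName": "alice@uni-a.example"}  # as released by her IdP
print(may_access(alice, eulas, "academic-use-only"))  # False
eulas.sign("alice@uni-a.example", "academic-use-only")  # signed once, centrally
# Every SP holding resources under this EULA now reaches the same answer.
print(may_access(alice, eulas, "academic-use-only"))  # True
```

Keeping the registry outside the IdPs reflects the constraint on the previous slide: CLARIN cannot make IdPs store extra attributes, but it can run its own attribute authority.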

CLARIN SP Test Federation
The national identity federations (IDFs) will come together in a single confederation, eduGAIN. This way, users associated with any IdP can use resources from any SP in the confederation. This is not operational yet. Therefore CLARIN created an SP federation that can sign contracts with the individual IDFs. This is an administrative burden, but it works, it is extensible, and it is independent of eduGAIN's progress.
Current status:
- Initial service provider federation: MPI-Psyl, BBAW, IDS, CSC
- Contracts made with HAKA (Finland), DFN-AAI (Germany) and SURFfed (Netherlands)
- SSO successfully demonstrated with a few SPs

Problems encountered
- Federation fees for SPs: SURFfed and HAKA require payment from external SPs to enter the IDF, and all foreign SPs could be considered external.
- Particular IDF requirements: specific X.509 certificate issuer(s) (HAKA); IdP-initiated SP connection request (SURFfed).
- Explaining the SP federation model to all participants: SPs, IDF management and legal people.
- Scalability of the contracts: it is important to be able to add new SPs or national identity federations without too much overhead. One representative for the SPs, with power of attorney, deals with the national identity federation agreements (1×N instead of N×N signatures). This is currently a CLARIN centre, and in the future will be the CLARIN ERIC.
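The scalability argument is simple arithmetic. Writing S for the number of SPs and F for the number of national identity federations (the slide's "N×N" is the case S = F), and assuming that without a representative every SP would need its own agreement with every IDF, the initial test federation (S = 4: MPI-Psyl, BBAW, IDS, CSC; F = 3: HAKA, DFN-AAI, SURFfed) gives:

```latex
\begin{align*}
  \text{bilateral agreements} &= S \cdot F = 4 \cdot 3 = 12 \\
  \text{agreements via one representative} &= 1 \cdot F = 1 \cdot 3 = 3 \\
  \text{new IDF agreements when adding an SP:}\quad
    &F = 3 \ \text{(bilateral)} \quad\text{vs.}\quad 0 \ \text{(representative)}
\end{align*}
```

The gap widens with every new center, since the representative model grows only with the number of federations.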

National IDF policy
What can national IDFs do to make (CLARIN) life easy?
- Facilitate/push eduGAIN; that would solve most of our problems.
- Think of harmonizing your contracts (this reduces the number of annexes in the CLARIN SP contract).
- Be flexible and aware of the different situations of SPs from other countries, e.g. the certificate issuer requirement.
- Don't start asking money for connecting the CLARIN SP federation. We are not commercial publishers.
- Keep cooperating with us; it is going well!

Non-EU collaborations
How will we accommodate users and SPs from non-EU countries? Will we have to wait for a "super eduGAIN", or can we introduce non-EU IdPs and SPs into the CLARIN federation?
Regional Archives Initiative: cooperation of MPI-Psyl with other organizations interested in EL archiving.
- They use MPI's LAT archiving software.
- The initiative encourages local resource collecting and archiving.
- A network of South American archives has been established, and contacts with CLARA have been made.

Collaborations/interactions: concrete plans, joint projects, cooperations, contribution, PARADE discussions.

Thank you for your attention.
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement no. 212230.