Identifiers and GLIMIR
18 March 2009
OCLC Symposium for Publishers and Librarians
Janifer Gatenby, Research Integration and Standards, OCLC
Identifiers Resource Identifiers Creator Identifiers Institution Identifiers
Importance of identifiers
- Identifiers seal uniqueness: n number of other elements are necessary for uniqueness
- Commerce: distribution, promotion, rights management, copyright protection, royalty payments
- On the web: key to navigation among sites for resources & information about resources
GLIMIR
GLIMIR (Global Library Manifestation Identifier) is a project to connect all metadata records for the same resource or publication, starting with WorldCat.
Importance of RANK
Ultimate goal of GLIMIR: to link resources in different sites with a single agreed canonical identifier, to cluster hits, and thereby to maximize the rank of library resources in the web sphere
Resource Identifiers: GLIMIR
- Global Library Manifestation Identifier (Manifestation = Resource)
- No one single manifestation identifier: ISBN, ISSN, ISMN (music), ISRC (sound recordings), V-ISAN (audio-visual)
- Only 30% of WorldCat resources have an international identifier
- DOI
Outside use of OCLC numbers. A subsidiary of the US ISBN Agency
Linking inwards: OCLC Permalinks
- Simple URLs permitting direct access into WorldCat
- Want to use them (or equivalent) for accessing the same resource on other databases
- Example: www.worldcat.org/oclc/225507364
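A minimal sketch of how such a permalink could be assembled from an OCLC number, following the pattern of the example URL on the slide. The `http://` scheme and the prefix handling ("(OCoLC)", "ocm", "ocn", "on") are assumptions added for illustration, and the function names are hypothetical.

```python
import re

# Assumption: the slide gives "www.worldcat.org/oclc/225507364" without a
# scheme; "http://" is added here for illustration only.
PERMALINK_BASE = "http://www.worldcat.org/oclc/"


def normalize_oclc_number(raw: str) -> str:
    """Reduce an OCLC number as found in a record to its bare digits.

    Assumption: the input may carry a "(OCoLC)" label and/or an
    "ocm"/"ocn"/"on" prefix and leading zeros.
    """
    text = raw.strip()
    text = re.sub(r"^\(OCoLC\)\s*", "", text, flags=re.IGNORECASE)
    text = re.sub(r"^(ocm|ocn|on)\s*", "", text, flags=re.IGNORECASE)
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not an OCLC number: {raw!r}")
    return text.lstrip("0") or "0"


def oclc_permalink(raw: str) -> str:
    """Build a permalink in the style of the slide's example."""
    return PERMALINK_BASE + normalize_oclc_number(raw)


print(oclc_permalink("(OCoLC)ocn225507364"))
```

Normalizing first means that the same resource yields the same link regardless of how the number was stored locally, which is the property a shared identifier needs.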
Linking inwards: WorldCat API
- Identifiers [SRU or OpenSearch]
- Direct access to single metadata record, thence to full text, thence to enriched content
- Citations, holdings, OPAC links
- Potentially audience level, copyright
- http://worldcat.org/devnet/index.php/main_page
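As an illustration of identifier-based access over SRU, the sketch below builds a generic SRU `searchRetrieve` request URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU 1.1 protocol; the base URL and the CQL index name are placeholders, not the actual WorldCat API endpoint, which typically also requires an access key.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only to show the shape of the request.
EXAMPLE_SRU_BASE = "https://sru.example.org/catalog"


def build_sru_url(base_url: str, cql_query: str, max_records: int = 1) -> str:
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Assumption: the server exposes an identifier index named "dc.identifier";
# real index names vary from server to server.
url = build_sru_url(EXAMPLE_SRU_BASE, 'dc.identifier = "225507364"')
print(url)
print(parse_qs(urlsplit(url).query)["query"][0])
```

Asking for a single record (`maximumRecords=1`) mirrors the "direct access to single metadata record" use described on the slide.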
Linking outwards: To:
xisbn, xissn, xoclcnum
- Web services to find all related editions of a resource
- Easily incorporated into library catalogs, Web sites, and other library applications
- See http://worldcat.org/devnet for more info
Example: 100+ ISBNs for Sorcerer's Stone
- 32: English (US and UK)
- 9: Spanish
- 3 each: Russian, German, Finnish, Latin
- 2 each: Chinese, Czech, French, Korean, Norwegian, Persian, Polish, Portuguese, Romanian, Turkish, Welsh
- 1 each: Afrikaans, Albanian, Armenian, Basque, Bengali, Georgian, Galician, Gaelic, Ancient Greek, Greek, Gujarati, Hindi, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Malayalam, Sherpa, Slovenian, Swedish, Thai, Ukrainian, Urdu
- 16 Audio, 59 Book
ISSN History Tool http://worldcat.org/xissn/titlehistory?issn=0888-5885
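A small sketch of how a client might validate an ISSN before building a title-history link of the form shown above. The check-digit rule is the standard ISSN modulus-11 scheme; the function names are hypothetical, and the URL pattern is copied from the slide rather than from any service documentation.

```python
import re

# URL pattern as shown on the slide.
TITLE_HISTORY_BASE = "http://worldcat.org/xissn/titlehistory"


def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits.

    Weights 8 down to 2 are applied, and the weighted sum is taken modulo 11.
    """
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """True if the string has the form NNNN-NNNC with a correct check character."""
    m = re.fullmatch(r"([0-9]{4})-([0-9]{3})([0-9Xx])", issn.strip())
    if not m:
        return False
    return issn_check_digit(m.group(1) + m.group(2)) == m.group(3).upper()


def title_history_url(issn: str) -> str:
    """Build the title-history link for a valid ISSN."""
    if not is_valid_issn(issn):
        raise ValueError(f"invalid ISSN: {issn!r}")
    return f"{TITLE_HISTORY_BASE}?issn={issn.strip().upper()}"


print(title_history_url("0888-5885"))
```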
SRU record update: Near Real Time
- Machine & QA corrections and merges
- Identifiers, inserts and corrections
- Inserts, updates, deletions
- Near Real Time Update
- Dutch union catalogue (GGC) in 12 months: 560,000 records, 2 million holdings
- Libraries Australia went live on 16 January 2009
Data identifier export service
- Provision of work level identifiers associated with a member's subset of WorldCat
- Permitting clustering of result sets (without any retrospective data conversion)
Family of Identifiers
[Diagram: levels Work, Authors, Expression, Subjects (Dewey +) and Manifestation, annotated with the identifiers ISCI; WCat Identities, VIAF; FRBR clusters; ISNI, ISIL; ISTC, ISWC, ISAN; GLIMIR; ISBN, ISSN, ISMN, V-ISAN]
Identifiers and Required Groupings
[Diagram labels: IR, Lang, Phase 1, Class, Subject, Author]
Linked content at the right level
- Work: reviews, evaluation, lists, prizes
- Authors: biographies, affiliations
- Manifestation: full text links, usage statistics, cover art, holdings
[Diagram also shows the Expression and Subjects (Dewey +) levels]
Tidy identifiers
- All resources identified in a global scheme
- Possible to do cross database links more reliably; mashups
- Improved quality of WorldCat
- Statistical data is consolidated
- Important for copyright registry
Step 1 - OCLC WorldCat Quality
- DDR project: Duplicate Detection and Resolution (Q1 2009)
- Will not eliminate intentional duplicates: different language of cataloguing, different schemes of transliteration, institutional records
- Require mechanism to cluster these variants & be able to select the most appropriate for display depending on the user
- Impact on quality: manifestation count, in addition to record count
- Impact on all products and services
- Part of identifiers architecture
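The clustering and display-selection mechanism described on this slide could look roughly like the sketch below. It is not the DDR or GLIMIR implementation: the `Record` fields, the manifestation key, and the selection rule (prefer the user's language of cataloguing, else fall back to the first record) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    manifestation_id: str      # hypothetical cluster key (GLIMIR-like)
    cataloguing_language: str  # e.g. "eng", "fre"


def cluster_by_manifestation(records):
    """Group intentional duplicates under their shared manifestation key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record.manifestation_id].append(record)
    return dict(clusters)


def pick_for_display(cluster, preferred_language):
    """Return the record catalogued in the user's language, else the first one."""
    for record in cluster:
        if record.cataloguing_language == preferred_language:
            return record
    return cluster[0]


# Invented sample data: two records describe the same manifestation.
records = [
    Record("r1", "m1", "eng"),
    Record("r2", "m1", "fre"),
    Record("r3", "m2", "eng"),
]
clusters = cluster_by_manifestation(records)

print("record count:", len(records))
print("manifestation count:", len(clusters))
print("display for a French-language user:", pick_for_display(clusters["m1"], "fre").record_id)
```

Keeping every variant in the cluster, rather than merging, is what allows a manifestation count to be reported alongside the record count.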
Further Steps
- Optimise OCLC products and services
  - Identifiers at work and contributor level
  - Display most appropriate record
  - e.g. Collection analysis, WCRS, xisbn, xissn
- Towards a global identifier
  - Diffusion of identifiers
  - Increased coverage
  - Manifestation resolution services
Weibel Lines
Read more at: http://weibel-lines.typepad.com/weibelines/2008/02/a-glimir-of-the.html
To the extent that such identifiers are canonical, that is, become the dominant identifier for a given asset, they increase the URI equity for library assets and will strengthen the library presence on the Web. Interesting and challenging issues arise in the design of such identifiers and their supporting infrastructure. Broad adoption will require a careful balance of use-cases, business issues, and community participation in meeting the need. All of this in an environment already crowded with myriad special purpose identifiers.
Author identification
- OCLC & libraries: potential core part of ISNI consortium
  - BnF, BL, CISAC, ALCS, Adami, Bowker, IFRRO, ProLitteris
- ISNI cannot be managed in the same way as ISBN etc.
  - Authors do not stay with one publisher, so blocks cannot be handed to publishers
  - Need database for management & ideal to start with an existing database
- ISNI proof of concept: VIAF & WorldCat identities + ALCS, UK, Adami & CISAC
ISNI identifiers
- Seal uniqueness of an identity
- Permit the liaison of the same identity in different databases
- Significant redundancies can be eliminated by sharing data
VIAF (www.viaf.org)

Source   Enhanced records   Records with links   Total links
LC              4,788,892            1,072,159     1,488,970
BnF             1,053,729              593,484       917,209
DNB             3,093,385              721,477     1,097,228
Sweden            116,599               51,606       122,439
Totals          9,053,605            2,438,726     3,625,846

And expanding: Czech Republic, Israel, Italy, Japan, Portugal, Slovenia and Spain
VIAF Swedish
WorldCat Identities
http://www.worldcat.org/identities/
- Dependence on VIAF
- Sources: authority records in WorldCat; FRBR comparison algorithm (work level); WorldCat; VIAF
- 126 million bibliographic records
- Each name in WorldCat
  - Personal names: 24,669,126
  - Corporate names: 7,029,257
  - Subject names: 14,445 (e.g. animals, gods, imaginary characters)
- Wikipedia
WorldCat Identities VIAF LINKS
ISNI Proof of Concept
For VIAF & libraries
- Enables significant honing of author / work links
- Enriched data (biographic)
For ALCS & industry
- Matching techniques & expertise
- Freely available data (9 million VIAF, 31+ million WCat identities): direct search and SRU API
- Wiki interface for corrections and enhancements (trial)
- Biographical links (Wikipedia ++)
- Enriching links to holdings, translations, non commercial works (unreported royalties?)
ALCS data
- 311,046 line entries (author / work)
- Semicolon-delimited: contactid; nameid; IPINameNumber; prefix; forename; middlename; surname; suffix; dateofbirth; dateofdeath; nationality; contribution; ISBN; title; subtitle
- 35,088 unique contact identifiers (parties), about 40,000 public identities
- 27,319 matches (78%)
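To make the file layout concrete, here is a minimal sketch of reading such semicolon-delimited entries with Python's `csv` module and counting unique contact identifiers. The field names are taken from the slide; the two sample lines are invented, and the absence of a header row is an assumption. The last lines reproduce the match rate quoted on the slide from its own figures.

```python
import csv
import io

# Field order as listed on the slide.
FIELDS = [
    "contactid", "nameid", "IPINameNumber", "prefix", "forename",
    "middlename", "surname", "suffix", "dateofbirth", "dateofdeath",
    "nationality", "contribution", "ISBN", "title", "subtitle",
]


def read_entries(text):
    """Parse semicolon-delimited author/work lines into dictionaries.

    Assumption: the file has no header row.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter=";")
    return list(reader)


def unique_contacts(entries):
    """Contact identifiers (parties) that appear in the entries."""
    return {entry["contactid"] for entry in entries}


# Invented sample: one party (C001) writing under two names.
SAMPLE = (
    "C001;N001;00000000000;;Jane;;Example;;1950;;GB;author;0000000000;Example title;\n"
    "C001;N002;00000000001;;J.;;Pseudonym;;1950;;GB;author;0000000001;Another title;A subtitle\n"
)

entries = read_entries(SAMPLE)
print(len(entries), "line entries,", len(unique_contacts(entries)), "unique contact")

# Figures from the slide: 27,319 matches out of 35,088 contact identifiers.
match_rate = 27319 / 35088
print(f"match rate: {match_rate:.0%}")
```

Counting by `contactid` rather than `nameid` is what distinguishes parties from public identities: one contact may carry several names.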
Jack Curtis / David Harsent (& Oliver Dalton, Francis Greig, David Lawrence, David Pascoe)
- A bird's idea of flight: ALCS, OCLC
- After dark: LC, OCLC
- Another round at the pillars: ALCS, LC, OCLC
- Confessor: ALCS, OCLC
- Conjure me: ALCS, OCLC
- Crows' parliament: ALCS, LC, DNB, OCLC (Sam Lawrence)
- Der blick des magiers: DNB, OCLC
- Der schrei der schwalbe: ALCS, DNB, OCLC
- Die spur der krahe: ALCS, DNB, OCLC
- Dreams of the dead: ALCS, LC, OCLC
- From an inland sea: ALCS, OCLC
- Gawain libretto: ALCS, LC, BNF, OCLC
- Glory: ALCS, LC, DNB, OCLC
- Le parlement des corbeaux: ALCS, BNF, OCLC
- Legion: ALCS, LC, BNF, OCLC
- Les enfants du matin: ALCS, BNF, OCLC
- Livewire chillers down came a spider: ALCS, OCLC (no author attribution)
Jack Curtis / David Harsent
- Marriage: ALCS, BNF, OCLC
- Mirrors kill: ALCS, LC, OCLC
- Mort ou vif: BNF
- Mr. punch: ALCS, LC, OCLC
- News from the front: LC, OCLC
- Point of impact: LC, OCLC
- Potted priest: OCLC
- Ricordati di me: ALCS
- Ruchlos: DNB, OCLC
- Selected poems 1969-2005: ALCS, LC, OCLC
- Sons of the morning: ALCS, OCLC
- Sorrow of Sarajevo: OCLC
- Sprinting from the graveyard: LC, OCLC
- Storybook hero: OCLC
- Terrahawks: ALCS, OCLC
- Tonight's lover: OCLC
- Truce: LC, OCLC
- Violent entry: ALCS, LC, OCLC
David Harsent
Jack Curtis
Jack Curtis BNF This record would match with the LC / DNB cluster if the ALCS data were in VIAF
Jack Curtis
David Lawrence
- Cold kill: ALCS, LC, DNB, OCLC
- Dead sit round in a ring: ALCS, LC, DNB, OCLC
- Der kreis der toten: DNB, OCLC
- Down into darkness: ALCS, LC, OCLC
- Geruch des todes: ALCS, DNB, OCLC
- Nothing like the night: ALCS, LC, DNB, OCLC
- Quatre morts assis en rond: ALCS, BNF
- Vier doden in een kring: ALCS, OCLC
David Lawrence
David Lawrence BNF
Metadata requirements
- Essential for matching process
  - Contact, name and IPI identifiers
  - Titles of works and their ISBNs
  - Name: prefix, forename, surname, suffix, birth and death dates
- It would be nice to keep the contact id in the database without the identity of the party (non displayable)
- Possibility of engaging authors in their own data maintenance is appealing (WIKI)
Institution Identifiers
- WorldCat Registry of Libraries
- Research is working on a publisher registry by mining WorldCat
- Currently 1,750 publishers mapped to 8.5 million resources
- Only another 120 million resources to go!!
NISO I2 Committee http://www.niso.org/workrooms/i2 Common identifier for all Institutions in the journal supply chain
Discussion Points
Potential use via identifiers of:
- Existing WorldCat APIs: Permalinks, XID, WorldCat API
- Work data service
- VIAF, WorldCat identities APIs
- Registries: WorldCat Institutions, Publishers, Copyright evidence (BRR)
Further cooperation / collaboration?
Action items
Thank You! http://community.oclc.org/metalogue/