
Catalogue & Index
Periodical of the Chartered Institute of Library and Information Professionals (CILIP) Cataloguing & Indexing Group
October 2013, Issue 172

Editorial

Is authority control something we do only because we have always done it? Is it symptomatic of control freaks obsessing over trivial details? Can we justify spending time on it when resources are short? The articles in this issue take a practical look at managing authority control work in the most effective ways and explain how it not only helps our users now but is also fundamental to linking data and developing new ways of searching and finding. We hope you enjoy reading them.

Contents

2-8 In-house authority control: a Cambridge perspective, by Fiona Grant
9-13 Authority Control at LSE: the continuing story, by Helen Williams
14-26 Authority Control and Changing Library Systems at the University of Oxford, by Alasdair MacDonald and Nathalie Schulz
27-29 ORCID: A Research Support Perspective, by Natalia Madjarevic
30-35 Transforming Authority Control: designing a scalable and sustainable approach for the University of Kent, by Clair Waller, Josie Caplehorne, Robin Armstrong Viner and Tony Whitehurst
36-39 Outsourcing Authority Control at UCL, by Thomas Meehan
40 Calling all cataloguers: the UK NACO funnel seeks your opinions and views, by Deborah Lee

Call for contributions: issue 173, a theme issue on RDA, will be published in December and will be about your experiences of implementing RDA. We would like to hear from anyone with a triumph to report, a struggle to share or a problem to air, whether you are a practical cataloguer, a cataloguing manager, an LMS supplier, a bibliographic record vendor or anyone else affected by the adoption of RDA. Please contact the editors with your ideas: Helen Garner (h.j.garner@shu.ac.uk) or Heather Jardine (heatherjardine402@hotmail.com). Copy date is 30 November 2013.

In-house authority control: a Cambridge perspective

Fiona Grant, Authority Control and English Cataloguing Specialist

Barbara Tillett rather succinctly summarises the purpose of authority control when she states that "Authority control brings precision to searches. It enables navigation and provides the end user with explanations for variations and inconsistencies. The controlled forms of names, titles and subjects help collocate works in displays" (1). Surely this is great news! Authority control ensures the user successfully finds exactly what he or she is looking for within a given bibliographic database.

Mention of authority control in your library in the current climate, however, is unlikely to be met with a positive and welcome response. Authority control work has long been regarded as the most expensive aspect of cataloguing and is not universally applied. Some libraries do not use authority control in their catalogue at all, and where it is used, authority records have almost always been created to varying standards over time. Cambridge is no exception. Not all University of Cambridge library databases use authority control, and though the NACO name authority file, consisting of some 8.3 million records (2), is now used within the main University Library database, even this sits alongside a local authority file of approximately 35,000 records created before the library became part of a cooperative programme. Access points have, over time, emanated from various sources such as the former British Library Anglo-American Authority File, items in hand, publisher or vendor records and other library catalogues. They have, therefore, either never been subject to authority control or been subject to authority control using a different authority file.

Faced with these challenges some libraries have outsourced, or are considering outsourcing, their authority work. This involves sending bibliographic records for cleaning up by a provider who automatically matches access points within those records to authorised access points, most usually within the LC/NACO authority file. The cleaned-up versions of these records are then returned to the library for reloading. As Helen Williams demonstrates in her article on retrospective authority work, however, outsourcing is not without problems. Expectations are often higher than what service providers are actually able to achieve: many of the things the LSE library had hoped would be corrected through automated processing appeared in accompanying error reports (3). Furthermore, access points are only cleaned up to authorised access points which exist at a given moment in time; subsequent amendments have to be dealt with separately through a subscription service and, presumably, further toing and froing of data.

The adoption of RDA itself results, and will continue to result, in more authority work. In the short term, large numbers of changes to existing authority records are required to make them RDA compliant. In the longer term, the fact that the rule of three no longer exists, and that RDA gives greater scope for recording relationships, particularly between named persons and a work, is already resulting in more work. Furthermore, as Alan Danskin points out in issue 158 of this publication (4), the need to accommodate the hybrid library or web will mean the scope of authority control will have to be extended to a much wider range of resources.
What has RDA meant for authority control: the LC/NACO authority file context

As noted by Hugh Taylor, as far as possible the NACO programme (Name Authority Cooperative Program) of the PCC (the Library of Congress's Program for Cooperative Cataloging) has always had the aim of limiting changes to authorised forms of headings to those considered important (5). Consequently unnecessary work for any of the 785 NACO members (6) is avoided: as far as possible, correct and unique authorised access points are left alone. Acceptable reasons for making changes number only three: conflict, author preference, and record created in error. Thankfully, this philosophy prevailed when the summary of programmatic changes to the LC/NACO authority file, which prepared the file for use under RDA, was produced (7).

Programmatic changes took place in two stages. The first made no changes to headings (1xx or 5xx fields) or cross-references (4xx fields) in any authority record. It simply identified records whose 1xx fields were, or were likely to be, incompatible with RDA, adding to them a 667 field stating, in capital letters, "This 1xx field cannot be used under RDA until this record has been reviewed and/or updated" (Screenshot 1). Only records to be dealt with directly by phase two did not have this field added.

Screenshot 1

Phase two made mechanical changes which included:

- adding, revising, recoding or deleting 4xx references where appropriate
- adding new RDA elements such as 046, 378, 382, 383 and 384 fields where possible and necessary
- expanding abbreviations such as arr., acc. and unacc.
- replacing ca. with approximately and fl. with active
- modifying Bible headings: O.T. and N.T. became Old Testament and New Testament, or were removed from headings in the case of individual books of the Bible
- replacing violoncello with cello
- expanding Dept. to Department
- changing the heading Koran to Qur'an
- replacing Selections (which is no longer valid as a conventional collective title) with Works. Selections.

A mere 5% of records within the LC/NACO authority file were affected. When we consider this is 5% of over 8 million, however, the figures are still huge. Phase 2, which began at LC on 7 March and was completed on 27 March this year, changed 371,942 records. These were distributed, either daily or weekly depending on subscription type, to each of the NACO nodes (8) together with the regular LC/NACO transactions, which usually amount to approximately 10,000 a week. Following receipt they had to be loaded into the library management system, and bibliographic file maintenance began. The Library of Congress themselves acknowledge that considerable bibliographic file maintenance will be required.
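To give a flavour of what "mechanical" means here, the following is a minimal Python sketch of the kind of string substitutions phase two applied. The replacement table samples only a few of the published changes, and the real processing operated on parsed MARC21 authority records rather than on bare heading strings, so this is illustrative only.

    import re

    # A sample of the phase two substitutions (illustrative, not exhaustive).
    SUBSTITUTIONS = [
        (re.compile(r"\bca\. ?"), "approximately "),
        (re.compile(r"\bfl\. ?"), "active "),
        (re.compile(r"\bDept\."), "Department"),
        (re.compile(r"\bvioloncello\b"), "cello"),
        (re.compile(r"\bO\.T\."), "Old Testament"),
        (re.compile(r"\bN\.T\."), "New Testament"),
    ]

    def apply_rda_substitutions(heading: str) -> str:
        """Apply the sampled RDA-style substitutions to one heading string."""
        for pattern, replacement in SUBSTITUTIONS:
            heading = pattern.sub(replacement, heading)
        return heading

    # A pre-RDA heading with an abbreviated "flourished" date qualifier:
    print(apply_rda_substitutions("Smith, John, fl. 1650-1680"))
    # -> Smith, John, active 1650-1680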

Processing changes to authority records: the Cambridge context

In the workflow for the main Cambridge database, which contains some 4.5 million bibliographic records, all aspects of file maintenance are managed in-house, though external resources are extensively utilised. We receive weekly FTP files in MARC21 from the Library of Congress Cataloging Distribution Service. These consist of a name authority file, containing records for personal, corporate, conference and geographic names together with those for uniform titles and series, and a subject authority file. The files include new, corrected and deleted authorities. Naturally, during RDA phase 2 these files consisted primarily of huge numbers of authorities which had been amended to make them RDA compliant.

As soon as possible after receipt, these files are loaded into the Voyager database using software written by Gary Strawn of Northwestern University, who quickly recognised the inadequacies of Voyager's Global Headings Change facility, and generously provided free of charge to any library. The programme designed for this purpose is called Authority Loader (9). It examines the content of each record and generates three things: approved batch corrections, pending batch corrections and a series of numbered reports. In theory, approved batch corrections can be carried out automatically, using Gary Strawn's Correction Receiver programme, without further review, whilst pending corrections require manual approval and can be viewed using his Correction Reviewer programme. Both approved and pending batch corrections are further subdivided into those which will affect all bibliographic records at the heading to be changed and those which will affect only a selection, or even a single record. Those affecting only one or a selection of records are accompanied by a text file giving the numbers of the bibliographic records which will be affected. They are easily identified by a tick on the "Files, etc. (please check)" tab within the Correction Reviewer software, from which the user can see the title and bibliographic record number of affected records (Screenshot 2).

The series of reports produced by the programme are numbered and detect conditions which make a heading unsuitable for batch correction, thus flagging the need for further investigation by a manual operator. In practice, here at Cambridge, all headings are checked by a manual operator to one degree or another.

Screenshot 2
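Stepping back to the start of that workflow, a weekly distribution file can be pulled apart by record status before anything else happens: in MARC 21 the status is held at Leader position 05 ('n' new, 'c' corrected or revised, 'd' deleted). The following minimal Python sketch uses the open-source pymarc library and an invented file name; it is an illustration only, not the Authority Loader software described above.

    from collections import defaultdict
    from pymarc import MARCReader

    def partition_by_status(path):
        """Split a MARC21 authority file on Leader/05 record status:
        'n' = new, 'c' = corrected or revised, 'd' = deleted."""
        batches = defaultdict(list)
        with open(path, "rb") as fh:
            for record in MARCReader(fh):
                if record is None:  # unreadable record: skipped in this sketch
                    continue
                batches[record.leader[5]].append(record)
        return batches

    batches = partition_by_status("lcna_weekly_update.mrc")  # invented file name
    for status in ("n", "c", "d"):
        print(status, len(batches[status]))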

Approved batch corrections can be described as strong matches and include corrections such as the addition of death dates, the re-arrangement of double-barrelled names, the removal of titles such as Sir, etc. However, this is not always straightforward. Concern arises when the new or amended heading provides less information than the original, or when headings have text files associated with them (see above). In addition, not all necessary changes are always picked up: for example, only the bibliographic record mentioned in the 670 of the authority record it is matched against may be suggested as requiring change, yet all or some of the other bibliographic records associated with that heading also require change. Very occasionally suggested changes are simply wrong. In other examples, name/title changes sometimes include further subdivisions, such as Selections, or dates which do not apply to all bibliographic records associated with a heading.

Pending batch corrections are weaker or more complex matches which require greater caution, and human intervention is usually required. The success of keyword matching very much depends on the word being matched: if the word is "letters", "report" or something equally vague, the value of the match with a 670 field is greatly diminished, if not useless. Because the theoretical lines between approved and pending corrections can be blurred in practice, all corrections have a manual eye cast over them in Cambridge.

Investigative work can involve: carrying out both a staff name heading search and a staff subject heading search on Voyager; checking against the 670 fields in authority records to confirm or disprove a match; checking the British Library catalogue, which is more up to date than our own, not least because the BL receives daily rather than weekly feeds of Library of Congress authority data and creates a vast number of authority proposals itself; checking national library catalogues, particularly for foreign-language material; and searching Google for specialist pages, university or other corporate institution pages, or an individual's own page listing their works. It is often the case that in pending batch corrections where the old form is a common name without qualification, extra tidying up in relation to associated bibliographic records can be achieved.

The Authority Loader programme can also generate up to 47 numbered reports each week, though in practice the number regularly produced is thankfully much smaller. Some appear in connection with subject headings and others in connection with name headings. They deal with a whole manner of issues which have made the bibliographic records listed in them unsuitable for batch correction. The reports I regularly spend time on are:

- Report 12, which lists headings that have bibliographic records associated with them which are not listed in the 670 fields. This doesn't necessarily mean they are wrong; for example, not all of Shakespeare's plays are listed in the 670s of the authority record for Shakespeare.
- Report 11, which details corrections deemed too weak or complex to be dealt with at batch level.
- Report 4, which details headings that have received updates to their 670 fields, and includes authority records in which headings have become unique or non-unique as a result of a change.
- Report 9, which amongst other things details errors within headings and suggests possible duplication with other headings, though caution is required here as links can be tenuous.
- Report 18, which details headings deleted from the authority file which still have bibliographic records attached to them. These sometimes come with details of the reason for the deletion or of the heading replacing the deleted one, but this is not always the case. Deletions are usually the result of the invalid construction of the heading, or of the heading being deemed redundant because it is already covered by another heading or headings.

Reports are viewed using Gary Strawn's programme Authority Loader Report Reader. In this programme the top half of the screen shows the condition detected by the programme; bibliographic records detected as being associated with the condition are listed in the lower right of the screen, and possible existing headings to which these records could be attributed in the lower left. The red cross button and split heading button, familiar to users of Gary Strawn's toolkit software, can be found down the middle of the lower half of the screen and can be used to force batch corrections if deemed desirable (Screenshot 3).

Screenshot 3

Some corrections appear in multiple reports, so careful attention and diligent work early on can save work later. Having a fairly good memory helps in identifying headings which have already been dealt with in previous reports.

Though at Cambridge we are lucky to have the resources to deal with authority control in-house, there is only one member of staff dealing with the work emanating from the Library of Congress weekly loads. Progress is good, but there is recognition that there is a limit to what can be achieved. There simply isn't the time for some things: the chance of finding the correct headings for specialist collections, for example, is low; splitting an existing subject heading into two or more new headings can require looking at every bibliographic record, or even physical items, before a decision can be made; and headings in non-Roman script cannot generally be attempted without specialist knowledge.

The introduction of RDA has undoubtedly affected the scale of the workload, not least because of the work resulting from programmatic changes. All phase 2 changes have now been loaded and the resulting approved and pending batch correction work completed. Work on the numbered reports will begin once the backlog of regular loads is cleared. Meanwhile, work investigating the clashes between Library of Congress and local authority records has had to be, at least temporarily, suspended due to lack of resources.

The Cambridge way is not likely to suit every library. Gary Strawn's programmes, together with his selfless advice and assistance, have been invaluable to us, and one wonders where we would be without him. His programmes are easy to install, but configuration could be a problem for those who are not comfortable with what is going on behind the scenes in Voyager, in terms of the Oracle database etc. This is not a long-term answer, but for the time being it provides the best solution available.

The future of authority control

Over the years the many UK libraries which use the LC/NACO authority file have benefited extensively, and more importantly financially, from the work that goes into the creation and maintenance of a common authority file. Thus, when we look at how expensive authority control is today, we ought to take into account the untold savings already made. That this authority file has adapted to the new RDA environment is of immeasurable importance. It not only ensures we are better able to differentiate between bibliographic identities but, more than ever before, allows us, as noted by Tillett, to "envision using authority records to link to the authorized forms of name, titles and subjects beyond the catalog for which they were originally intended to various online-accessible reference tools and resources, like directories, biographical dictionaries, abstracting and indexing services, and so on" (1). New fields in the authority record which separate out elements such as associated place, field of activity, affiliation and occupation, together with the recommendation from the Library of Congress that a controlled vocabulary be used to populate many of these fields, support faceted discovery. The semantic web will, after all, be built on the accurate identification of persons, families, corporate bodies and the relationships between them, expressed as authoritative linked metadata, and libraries have much to offer in this respect. The main limitation is the MARC format itself, which is less than adequately structured for machine manipulation. Let's hope that the Library of Congress's BIBFRAME project is able to come up with a solution that helps remove this barrier.

References

1. TILLETT, Barbara (2004). Authority control: state of the art and new perspectives. Cataloging & Classification Quarterly, 38 (3/4), 23-41. http://polaris.gseis.ucla.edu/gleazer/461_readings/tillett_ac.pdf
2. Module 1, NACO training, based on information from the authority file conversion project.
3. WILLIAMS, Helen (2010). Retrospective authority control. Catalogue and Index, 158, 2-3.
4. DANSKIN, Alan (2010). Spelling it all out: FRAD, ISNI, RDA, VIAF: automation and the future of authority control. Catalogue and Index, 158, 17-20.
5. TAYLOR, Hugh (2010). Cooperative name authority data: the LC/NACO Authority File. Catalogue and Index, 158, 10-1.
6. Module 1, NACO training.
7. Summary of programmatic changes to the LC/NACO Authority File: what LC PCC RDA cataloguers need to know. http://www.loc.gov/aba/rda/pdf/lcnaf_rdaphase.pdf
8. PCC List, "Phase Two NACO changes completed" (email from Paul Frank, 28/03/2013).
9. For the full suite of software written by Gary Strawn which can be used in conjunction with the Voyager database, see http://www.library.northwestern.edu/public

Authority Control at LSE: the continuing story

Helen Williams, Assistant Librarian, Bibliographic Services, LSE Library Services

In 2010 I wrote for both Update and C&I about the authority control project we carried out at the London School of Economics, concluding that "although the project [had] been time-consuming, it [was] worthwhile. Our catalogue is now more consistent and has fewer errors, making retrieval more straightforward for users. In a library this size the catalogue is the primary way in which users identify our holdings. Our library catalogue continues to get a high score on the student satisfaction survey which shows that the hours put into this project have been fruitful." (1)

The importance of authority control is well recognised in the bibliographic community. As Michael Gorman says, "we cannot have real library service without a bibliographic architecture, and we cannot have that bibliographic architecture without authority control" (2), so with that in mind our efforts on authority control work at LSE did not cease with the end of our initial project. With an edition of C&I dedicated to the topic of authority control, it seemed timely to write about how we have incorporated the results of our initial data clean into our day-to-day workflows, and additionally how we are dealing with the 58,000 bibliographic records, and the many associated authority headings contained therein, imported into our existing library catalogue as a result of The Women's Library collections moving to LSE (3).

Back in 2007, at the point of our initial data clean, Marcive (the company to whom we outsourced our authority control project) kept a copy of our entire catalogue and all associated authority records, which enables them to provide us with two ongoing services. On the second Monday of every month we export a file to Marcive containing all the new bibliographic records which have been added in the preceding month. Marcive run their automated processes on this file to correct any unauthorized headings, and to provide us with any authority records we do not already have for name, subject or series headings in those records. This is called the Overnight Authorities service, and our export is dealt with promptly by Marcive so that when we arrive at work on Tuesday morning the cleaned files of records have been returned to us. It is important for us that this process is so efficient, because while the records are with Marcive we cannot make any edits to them, or they would get overwritten when the cleaned records are loaded back into our catalogue. We have set up a recurring meeting reminder, which all our cataloguers have in their Outlook calendars, to remind them when the export is occurring.

As well as the overnight service for newly added records, we wanted to ensure that authority file maintenance continued on the existing catalogue, and this is achieved through the Notification Service. This tells us whether any of the authority records already in our catalogue have been updated or deleted by the Library of Congress (LC). These authority records are loaded into Voyager (our Library Management System) by our IT team, and then, using the Global Headings Change Queue (GHCQ) in Voyager, we are able to manually approve each of these changed headings, which in turn automates the updating of all affected bibliographic records.
We choose to look at each heading manually not only so that we can be sure that each heading provided to overwrite one of our existing authority records is correct, but also so that we can see if there are any instances of the heading combined with subfields (Voyager cannot automatically update these, so we deal with them manually or force them into the GHCQ ourselves) or any records where there is a conflict which will require manual resolution.

1. Williams, Helen K. R., Cleaning up the catalogue. Library and Information Update, January/February 2010, 46-48.
2. Gorman, Michael, Authority control in the context of bibliographic control in the electronic environment. In: Authority Control in Organizing and Accessing Information: Definition and International Experience, eds. Arlene Taylor and Barbara B. Tillett. The Haworth Press, 2004, 21.
3. The Women's Library @ LSE: http://www2.lse.ac.uk/library/newsandinformation/newsarchive/2012/womens-Library.aspx

It is not only the Notification Service which generates in-house work. When the Overnight Authorities files are sent to us we also receive a report of unrecognized headings, which automated processing could not match with an authority record, and a multi-matches report, listing headings which could conceivably be linked with more than one LC authority record, where automated processing could not resolve the ambiguity. Both of these reports, as well as the GHCQ work, are dealt with by one of our Senior Library Assistants as part of their regular work.

Our authority control work hasn't been without challenges. We discovered, having established what we thought were successful procedures, that Voyager was not dealing properly with the delete reports sent to us by Marcive. The system automatically rejected any files containing a "d" for delete, assuming we would not want a deleted record loaded into Voyager, when it should instead have used those files to delete the corresponding authority record already stored in Voyager. This means we now have to go through the delete report manually, dealing with records ourselves and updating affected bibliographic records.

Earlier this year the extensive changes to the LC/NACO authority file as a result of RDA meant that we received an overwhelming number of new authority records into our GHCQ. The report was so large that for several days we were unable to open the GHCQ, until we found a work-around by extending the time-out function in our Voyager set-up files, which allowed longer for the report to process and open before the system gave us error messages and crashed.

Our most recent challenge has been the authority control issues surrounding the merging of two separate catalogues. On 1 January 2013, custodianship of The Women's Library collection transferred from London Metropolitan University (LMU) to LSE, and throughout 2013 LSE Library staff have been working with LMU staff to ensure the successful move of the collections to LSE, ready for the opening of The Women's Library @ LSE in August 2013. This included the migration of 58,000 catalogue records for The Women's Library print collections into Voyager by 1 September 2013. This has been an extensive project and much of the work is outside the scope of this article, but authority control has been just one of the areas we have worked on as part of this data migration.

LSE uses only LC authority headings in bibliographic records, but The Women's Library collection has a varied history, and consequently different vocabularies have been applied to the records over time. Knowing that the 58,000 records would sit in Voyager alongside our existing data, we felt it was important to use consistent authority headings across all records, because bringing in variant or uncontrolled data would affect the search and retrieval functions and the quality of the catalogue as a whole. We decided that the most efficient way to achieve this unity would be to outsource an authority control data clean of the incoming records to our existing provider, Marcive. Importantly, this also meant that the newly received Women's Library records would join the copy of our entire catalogue held by Marcive, enabling the authority file maintenance I described above to apply to The Women's Library records as well as to existing LSE records.
Without sending those 58,000 records to Marcive it would not be possible to extend authority file maintenance coverage across the entire newly combined catalogue, which would have serious consequences for ongoing authority control procedures for the catalogue as a whole. The cataloguing staff from LMU helpfully provided us with a raft of information relating to the cataloguing of The Women's Library materials. Particularly useful in the area of authority control was information about subject fields which, although no longer in use for current cataloguing, had been used to create records in the past. This meant a number of the records we received contained the following fields:

- 690: Precis descriptor string (Precis headings were formerly assigned to records created by the British Library between 1971 and 1990)
- 695: local subject terms which were not part of any named thesaurus
- 696: COMPASS topical descriptors, and
- 697: COMPASS geographical descriptors (COMPASS stands for COMPuter Aided Subject System; it replaced Precis at the British Library in 1991, was used until 1996, and was no longer used in British National Bibliography records from 1997)

In order to make the subject data provided in these fields more consistent we decided to swap each 690, 695, 696 and 697 entry for an appropriate LC authority heading. The 41,471 Precis headings in 690 fields could be tackled with the help of Marcive, who offer a facility to flip 690 headings to 65X fields before they begin processing, and then match the Precis string to LC headings where possible. This was very successful, and only 4,590 headings could not be changed using automated processing (which we will deal with manually in-house). It was not an option to have the 695, 696 and 697 fields dealt with in the same way, so instead we have internally generated reports of the 2,104 headings falling into those fields so that we can manually amend the relevant bibliographic records, replacing the data in those fields with appropriate Library of Congress Subject Headings (LCSH).

Another reason for the variety of different subject headings in the catalogue is that The Women's Library has followed standard British Library (BL) practice, which has changed over time. The British Library applied LCSH to records created for the British National Bibliography (BNB) between 1971 and 1987 and from 1995 onwards (4). Similarly, not all records created at The Women's Library have LCSH, so there were some uncontrolled 65X subject headings in the catalogue with a second indicator 4 (indicating that the source of the subject thesaurus is unspecified) rather than 0 (indicating an LC heading). However, a spot check indicated that a number of these headings were LC compliant, or could easily become so through automated processing. Something we learnt in our last data clean was that Marcive does not attempt to clean subject data with a second indicator of 4, so this time around we were prepared and able to ask our IT team to run a global change on all The Women's Library records, changing second indicator 4 to 0 on 6XX fields; a sketch of this kind of global change appears below. This meant all those headings would be checked by Marcive and matched with a correct authority heading where possible, and any which could not be cleaned through automation would be detailed for us in reports from Marcive which we could manually correct ourselves.

As well as the use of different subject thesauri over time, we were aware that not all names in 1XX fields were LC authority controlled. BL name authorities (standard before 1993, when the BL and LC agreed to establish a single source of name authorities by merging their two files (5)) had been consistently applied earlier in the history of The Women's Library collection, and so there was some variation between these and the LC headings that we have been familiar with using here at LSE in recent years. Additionally, the specialist material being collected meant that authority records may well not have existed for vast numbers of names at the point of record creation; many years later, as we work on it now, it is more likely that LC name authority records will have been created for some of the names in the collection.
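The indicator change lends itself to simple scripting. Below is a minimal Python sketch of that kind of edit, assuming the records have been exported in MarcEdit's mnemonic text (.mrk) format, in which a field appears as a line such as =650  \4$aSuffrage$zEngland. It also retags 690 fields to 650, which covers only the mechanical half of the Marcive flip: the semantic matching of Precis strings to LCSH is Marcive's service and is not shown. The file names are invented.

    import re

    RETAG = re.compile(r"^=690  ")        # Precis descriptor strings -> 650
    FLIP = re.compile(r"^(=6\d\d  .)4")   # 6XX second indicator 4 -> 0

    def clean_line(line: str) -> str:
        """Retag 690 to 650, then flip second indicator 4 (source not
        specified) to 0 (LCSH) on any 6XX field line."""
        line = RETAG.sub("=650  ", line)
        return FLIP.sub(r"\g<1>0", line)

    with open("womens_library.mrk", encoding="utf-8") as src, \
         open("womens_library_cleaned.mrk", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line))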
We contacted Marcive to request a one-off automated check of all name, subject and series title headings against standard LC authority reference files, correction of unauthorised headings in catalogue records, and loading of the corrected name authority records into our local name authority file. We supplied a test file of 1,000 records which Marcive returned to us, having cleaned it according to our specifications. As with our last authority control project, it was important to check this carefully and make sure we were happy with all the changes which would soon be applied to all the records being dealt with in this project. With this in mind, four of the Bibliographic Services team were involved in checking one in twenty records from the sample file.

Having sent the sample data using a MarcEdit file, we realised that we faced an issue we hadn't dealt with as part of our previous project. Last time we had exported records directly from Voyager, which meant that the file of cleaned data supplied by Marcive was loaded back into Voyager using the bibliographic record identification number (bib id) as a stable match point.

4. British Library, Metadata Services Standards: subject access in British Library bibliographic records. http://www.bl.uk/bibliographic/subject.html
5. British Library, Metadata Services Standards: authority control. http://www.bl.uk/bibliographic/authority.html

The Marcive reports of outstanding corrections contain just the bib id and the affected heading, so we rely on that Voyager bib id to retrieve the records which need attention in the subsequent in-house work on those reports. However, at this stage of the project The Women's Library records did not have a stable Voyager bib id. We were either accessing the data via a MarcEdit file (where the records had an LMU bib id) or on our Development Server (where the records had a temporary Voyager bib id which would not be carried across to the live system when the data was migrated). The data was in our Development Server while we experimented with various other issues, such as automatically generating holdings and items records out of bib record data, or linking parent and child records, and the initial plan was to load the data into the live server only shortly before the opening date for The Women's Library @ LSE, so that records were not visible to the public before the material was available for use at LSE. However, if we sent the data to Marcive from the MarcEdit file it would have no LSE bib id at all, and if we sent it from the Development Server it would not have the same bib id as it would once it was in our live system, meaning the bib ids in Marcive reports would not correspond with the records we wanted to edit on Voyager. After much discussion of different options, our IT team loaded all the records to the live system and then carried out a bulk suppression of The Women's Library records, having tested on the Development Server that we would be able to carry out a bulk unsuppression later in the project.

The full file was then sent to Marcive for the data clean, and towards the end of July we received 58,000 cleaned records, associated authority records and 11 files of reports. These consisted of unauthorized headings for corporate names, meeting names, personal names, series entries, subject headings and geographic headings, and then a large number of headings which had multiple potential matches with LC authority records. We carried out final checks on the data before IT loaded it into our live system. It is a busy time of year in the library, with our Summer School students here, which meant it was not appropriate to take the system offline, which would have allowed a quick load. However, loading such a large amount of data onto the live system can slow it down, which again was something we wanted to avoid for our Summer School students. Instead the file was split into four smaller loads, planned to be loaded overnight over a period of four days. We discovered during our initial authority control project in 2007 that loading the files and regenerating the keyword indexes at the same time practically ground the loading process to a halt, so the plan this time was to load the data without regenerating the indexes, with an awareness that this meant a short period in which the search facility used old indexes while bibliographic records contained new data. This would only affect staff working on the records, as they were still suppressed from public view at this point. The data was then re-indexed by Ex Libris, our system vendor, as part of our Voyager upgrade in August 2013.
While the data was loading into the system we began examining the files of reports which Marcive had supplied with the data, in order to decide on the priorities for this manual part of the process, and preparing instructions for the temporary member of staff who would be carrying out this work. Aware that accessing the material through subject searches was of high importance, we decided to deal with 65X fields first; these reports were also more manageable in size than the names-based reports. The unrecognised geographic headings report contained only 45 headings and could be dealt with in a few hours, so this was a quick win. This was followed by the unrecognised subject headings report, which we are currently working on at the time of writing and which, with 4,590 headings, we estimate will take 24 days to complete. In order to finish the subject-related work we then plan to deal with the outstanding 695, 696 and 697 headings before returning to the Marcive reports and beginning on the personal names report, containing 9,810 records. We have been able to employ a temporary cataloguer to work with us until the end of October, so we will continue to make as much progress through the subjects and personal names reports as we can, while keeping the unrecognised corporate names, meeting names, series names and multiple matches for future project work.

As I write (in August 2013) this project is still a work in progress. At the end of August all the records will be unsuppressed on our system, ready for The Women's Library @ LSE catalogue to go live on 2nd September. Although there will still be some ongoing manual work to do around authority control at this point, as outlined above, our work with Marcive means we are confident that the vast majority of authority records will have been cleaned through automated processing, and that future records added to The Women's Library @ LSE will be authority controlled through our regular and robust authority procedures.

Authority Control and Changing Library Systems at the University of Oxford

Alasdair MacDonald, Head of Bibliographic Maintenance & Authority Control, and Nathalie Schulz, Systems Analyst, Bodleian Libraries of the University of Oxford

Introduction

In the mid 1990s the decision was taken to purchase the full Library of Congress authority files and subscribe to the weekly update files, in conjunction with the implementation of the GEAC Advance library management system. The files comprise the Library of Congress Name Authorities (LCNA) and Library of Congress Subject Headings (LCSH). This was felt to be appropriate for a large union catalogue covering academic research collections and material acquired through Legal Deposit, providing an efficient means of authority control including automated updates to the bibliographic catalogue. The Advance authority loader was enhanced with enriched reports and custom programming, with Gary Strawn (Authorities Librarian, Northwestern University) assisting Frank Watson of GEAC with the design specifications.

The Bodleian Libraries is a member of the NACO (names) and SACO (subjects) authority record programmes overseen by the Program for Cooperative Cataloging, contributing 0.8% and 1.6% respectively to the total output of these programmes in 2012 (1). The LC authority file comprises ca. 9 million records, with names making up 96% of the total. Of these, 2.32 million map to the 7.86 million bibliographic records in Oxford's union catalogue.

1. Source: PCC statistics, 2012: http://www.loc.gov/aba/pcc/stats/fy2012/totals12.pdf

Authority record management in GEAC Advance

Authority-controlled bibliographic fields in the GEAC Advance system were underpinned by a database of authority records with different views: MAIN*AUTMAST, which mapped to the AACR2 records in the Oxford Libraries Information System (OLIS) database, and MAIN*AUTMAST2, which covered authority-controlled fields in the non-AACR2 Pre-1920 catalogue. All LC authority records were contained in MAIN*AUTMAST and could only be edited by cataloguers with maintenance-level permissions. Any authority-controlled field in the bibliographic catalogue which did not link to an LC authority record generated a local authority record. These records could be edited to include 4XX fields and behaved in the same way as LC authority records. They were typically generated by unauthorized names, name/title headings and subject strings with free-floating subdivisions. Linking to LC authority records required a match on tag, indicators, subfields and exact character string. As such, even a missed diacritic or incorrect punctuation in an authority-controlled field of a bibliographic record would generate a local authority record.

All authority records in MAIN*AUTMAST2 were local records with an 040 $f marker to identify them as Pre-1920 headings. Personal name forms in MAIN*AUTMAST2 often matched the NACO forms, but were not controlled through the NACO records, appearing in the online catalogue as parallel entries.

Managing authority loads and reports

Every week LCNA and LCSH files are made available by the Library of Congress; these were stored locally and loaded sequentially into GEAC Advance. In addition, authority records could be loaded to MAIN*AUTMAST individually if required, via the Z39.50 portal. Selected reports generated by the authority loads are discussed below (2).

2. See Appendix 1 for a full list of GEAC Advance authority loading reports.

Undifferentiated headings (Report 51 and Report F)

Undifferentiated personal name NACO records were used for cases of multiple authors with the same name but no further information available to qualify them under AACR2 rules. Differentiation status is recorded in the 008 field at position 32, with "a" recording a differentiated record and "b" an undifferentiated form. Report 51 listed any incoming records where the code in 008/32 had changed compared to the copy held in MAIN*AUTMAST. In addition, a manual keyword search (Report F) of the update file looked for new authority records that had budded off from undifferentiated forms. The search looked for keywords including "formerly" and "undif*" to pick up notes in the 667 fields that identified the new records as previously being covered by undifferentiated forms.
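A sketch of the kind of keyword search Report F performed, here scanning a weekly update file with the open-source pymarc library; the keyword list comes from the description above, and the file name is invented.

    from pymarc import MARCReader

    KEYWORDS = ("formerly", "undif")   # "formerly" and "undif*", as above

    def budded_off_candidates(path):
        """Yield authority records whose 667 (nonpublic general note) fields
        suggest the record has budded off from an undifferentiated name."""
        with open(path, "rb") as fh:
            for record in MARCReader(fh):
                if record is None:   # unreadable record: skipped in this sketch
                    continue
                notes = " ".join(
                    text
                    for field in record.get_fields("667")
                    for text in field.get_subfields("a")
                ).lower()
                if any(keyword in notes for keyword in KEYWORDS):
                    yield record

    for record in budded_off_candidates("lcna_weekly_update.mrc"):
        headings = record.get_fields("100", "110", "111", "151")
        print(headings[0].value() if headings else "(no 1XX)")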

There is no way of second-guessing the forms that may be created when authors bud off from undifferentiated records. Unqualified forms of names, as most undifferentiated records are, become points where multiple authors cluster in a large union catalogue. These two reports provided a useful stream of bibliographic re-alignment work and NACO submissions. The practice of creating undifferentiated forms was discontinued under RDA, with more flexible rules for using subfield $c of the 100 field to add qualifiers; but realignment work from this stream will continue as more authors are given their own NACO records.

Error file reports (Reports 9, 10, 11, 12 and 17)

Incoming records found to be in conflict with those already in MAIN*AUTMAST were quarantined in an error file, from which they could be assessed and loaded individually. These authority records represented NACO errors or highlighted inconsistencies in MAIN*AUTMAST. Conflicts triggering quarantine were matches of headings on incoming records with those already in the database (1XX=1XX, 1XX=4XX, 4XX=1XX and 5XX=4XX). Report 17 listed records where the 010 $a, containing the Library of Congress Control Number (LCCN), matched an existing 010 $z in the database. These were occasionally LCSH/NACO errors, but more frequently identified outdated forms of authorities present in MAIN*AUTMAST.

Database updates and near matches (Reports 4, 6, 7, 50 and the Heading Changes Report)

Updates to MAIN*AUTMAST and the bibliographic catalogue occurred in two ways. An incoming authority with a correction in the 1XX field would automatically update all associated bibliographic records, but would not update local authority records with free-floating subdivisions for subjects or author/title headings; these were documented in the Heading Changes Report. The local authority forms would then be updated manually, with MarcEdit introduced in 2009 to manage larger jobs such as prolific authors or frequently used subjects.

In addition, any local authority record where the 1XX field matched exactly the 1XX or 4XX of an incoming LC authority would be combined with the LC record, and all associated bibliographic records updated where necessary, with the matches and updated bibliographic records documented in Report 6. Many such matches were correct, but Report 6 also provided a steady stream of bibliographic re-alignment and follow-on NACO work.

A common situation was a local personal name authority with no qualifier, covering several authors, being overwritten by an incoming LC authority, from which maintenance staff could look for the correct NACO forms or create new ones as appropriate.

Due to the requirement for exact matching with local authority records, Report 50 looked for matches on normalised character strings, comparing the text following the $a subfield delimiter of local 1XX forms in both databases with the 1XX and 4XX fields of all incoming LC authorities. This comprehensive report meant that all cases where a match was missed due to a typo or an incorrect tag or indicator would be picked up, but it had the drawback of reporting a high level of false positives, which often made the report unmanageable. Changes to remove matches with MAIN*AUTMAST2 and matches on incoming 4XX fields helped reduce the number of false positives, but the report still took several hours to read each week.
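The normalised comparison at the heart of Report 50 can be sketched in Python as follows. The normalisation rules here (strip diacritics, punctuation and case) are an illustrative guess rather than GEAC's actual algorithm, and the record shape is invented; the point is that a crude normal form catches typos at the price of the false positives described above.

    import re
    import unicodedata

    def normalise(heading: str) -> str:
        """Reduce a heading to a crude normal form: no diacritics,
        punctuation or case, so near-identical strings compare equal."""
        decomposed = unicodedata.normalize("NFKD", heading)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return re.sub(r"[^a-z0-9 ]", "", stripped.lower()).strip()

    def report_50(local_1xx_forms, incoming_records):
        """Flag local 1XX forms matching the 1XX or any 4XX of an incoming
        LC authority on the normalised string (false positives expected)."""
        index = {}
        for form in local_1xx_forms:
            index.setdefault(normalise(form), []).append(form)
        for record in incoming_records:  # invented shape: {"1XX": ..., "4XX": [...]}
            for candidate in [record["1XX"]] + record.get("4XX", []):
                for local_form in index.get(normalise(candidate), []):
                    yield local_form, record["1XX"]

    local = ["Durand, Rene, 1930-"]                       # missing accent locally
    incoming = [{"1XX": "Durand, René, 1930-", "4XX": []}]
    print(list(report_50(local, incoming)))               # the two forms match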

Deleted authority records (Reports 14 and 15)

Report 14 covered deleted authorities that mapped to the bibliographic catalogue, with Report 15 covering deleted authorities where the LCCN could not be located in MAIN*AUTMAST. Authority records would not be deleted by Advance if they were used by bibliographic records; it was necessary for staff to re-assign the bibliographic records to other authorized headings and manually delete the authority record.

Moving to Aleph

Ex Libris Aleph was selected as the new integrated library system in 2009, and work on configuring the system began in 2010, with go-live in July 2011. Authority loading re-commenced in January 2012. The change to the new system offered an opportunity to review the authority load reports and focus staff time on the most important work. Although time spent on managing the index error file reports and the Heading Changes Report, which had grown during various automated update projects undertaken by LC (see fig. 7), had since decreased, there was still an opportunity to re-allocate staff time to work more relevant to the bibliographic catalogue. Prior to Aleph go-live the focus was on changes to the out-of-the-box configuration for authority control. Many of the changes were due to the need for two separate authority libraries (sections of the database).

One authority library (AUT10) holds LCNA and LCSH authority records. The second library (AUT11) stores 38,787 authority records with cross-references from the Bodleian Pre-1920 catalogue. A separate authority library is needed to ensure that Pre-1920 authority records do not appear when staff check the authority file while cataloguing. As well as the two separate authority libraries, it was necessary to define which bibliographic headings would link to which authority library, and the criteria for matching bibliographic headings with authority headings. The normalization done on headings means that Aleph is more forgiving in terms of matching to the authority file: for example, the bibliographic heading Dipaolo, Michael, 1956- will match the authority heading DiPaolo, Michael, 1956-. To avoid conflicts, 4XX forms that match 1XX forms have a qualifier added to them as part of the authority load.

The Advance system recognised that a name/title authority (e.g., 100$t) could link to headings in bibliographic records that came from either a 100/240 field combination or an X00$t. A change to the 100$t authority record would be reflected in both 100/240 and X00$t. Although it was possible to configure Aleph so that a bibliographic heading originating from 100/240 will link to the authority heading, the 100/240 fields do not change when the authority changes. These changes need to be made manually and are identified in one of the authority load reports. The Library of Congress Name Authority file contains a number of 151 authorities for jurisdictions which are intended to control X10 bibliographic headings with first indicator 1 (jurisdiction names, without subdivisions). In Aleph a link has been created between the authority 151 and the bibliographic X10, but update from the authority record has been prevented, as otherwise the fields in the bibliographic record would change to X51.

Authority loading process

Authority loading is only done in the AUT10 library, as the Pre-1920 authority library is static. Although Aleph offers a stand-alone service for loading authority records, this is not used as it only provides a limited number of reports. Each of the separate processes involved in the authority load (fixing the records, matching with existing records on LC Control Number, adding and updating records) is scripted, plus various processes necessary for reports. The names and subjects files are loaded separately and produce separate sets of reports.
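The match-on-LCCN step can be sketched as below, under the assumption that both sides are keyed on the control number in 010 $a; the record shapes and values are invented for illustration and are not Aleph's internal structures.

    def split_adds_and_updates(incoming, existing_by_lccn):
        """Decide which incoming authority records are new and which update
        a record already held, matching on the LCCN in 010 $a."""
        adds, updates = [], []
        for record in incoming:
            lccn = record.get("010a", "").strip()
            if lccn and lccn in existing_by_lccn:
                updates.append((existing_by_lccn[lccn], record))
            else:
                adds.append(record)
        return adds, updates

    existing = {"n00000001": {"010a": "n00000001", "1XX": "Example, Author"}}
    incoming = [
        {"010a": "n00000001", "1XX": "Example, Author, 1950-"},  # revised form
        {"010a": "no0000002", "1XX": "Newcomer, Writer"},        # brand new
    ]
    adds, updates = split_adds_and_updates(incoming, existing)
    print(len(adds), "to add;", len(updates), "to update")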

Authority reports

Moving from Advance to Aleph meant a conceptual shift: from a database structure where bibliographic records stored a pointer to authority records rather than the actual heading, to one where bibliographic headings are stored in the bibliographic records and linked to headings in authority records. Many of the Advance reports were based on tracking what had happened to local authority records. Because Aleph does not work this way, other methods were needed to identify where review and manual work are required. The reports detailed below are created for both names and subjects unless otherwise specified.3

Undifferentiated headings (Reports A and E)

The two reports produced to assist with the work on undifferentiated names are very similar to their Advance counterparts. The first contains the text of name authority records which contain one of the following strings of text: "Formerly", "undiff", or "unique". The report is produced by running a query on a text version of the LCNA file. The second report provides details of authority records where the 008 character position 32 (Undifferentiated personal name) has changed. During the authority loading process, after records have been matched, an extract is taken of the 008/32 value for all matched records. After the load, the extract is taken again and the two files are compared. A report is created containing the 008/32 value before and after the load, the full authority record, and the titles of up to five linked bibliographic records.

New authorities with linked bibliographic headings (Report D)

After the load of new authority records, the Oracle tables which hold details of bibliographic headings are queried to see whether any links are being made to the new authority records. This identifies cases where an authorized form appeared in the bibliographic catalogue before the authority record was loaded, or where a bibliographic heading matched a see-reference form on the new authority record. The report gives details of the heading, followed by a list of up to five titles (with year of publication) and then a copy of the full authority record. A refinement added after staff had been working with the report for a few months was to exclude from the names report cases where only one bibliographic record is linked and its title matches the content of the 670 field in the authority record.

Existing authorities with additional linked bibliographic records (Report F)

During the load process, before matched records are loaded, their authority system numbers are isolated and a query run on the Oracle table that holds details of bibliographic headings and their links to the authority file. This produces a list of identifiers for the bibliographic headings that link to the authority records. These identifiers are used to query another Oracle table to find out which bibliographic records are associated with those headings. The combined results of the two queries are stored in a file. After the authority load the Oracle tables are queried again and the results stored in a second file. The two files are compared and any differences are included in the report. The format of the report is the same as that for new authorities with linked bibliographic headings. The report picks up situations where a new see-reference in an authority record matches a bibliographic heading, and also where the 1XX authority heading has changed and bibliographic records already use the new form.

3 See Appendix 2 for a full list of Ex Libris Aleph authority loading reports.
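The before-and-after comparison behind Report E is straightforward to picture in code. The following is a minimal sketch of that diff, keyed on LCCN for illustration; file names are assumptions, not the names used in the actual scripts.

```python
# Compare 008/32 (undifferentiated personal name) for matched records
# before and after the load, reporting only those where it changed.
from pymarc import MARCReader

def snapshot(path):
    """Map LCCN -> 008/32 for every record in a MARC file."""
    values = {}
    with open(path, 'rb') as fh:
        for record in MARCReader(fh):
            key = None
            for f010 in record.get_fields('010'):
                subs = f010.get_subfields('a')
                if subs:
                    key = subs[0].strip()
                    break
            f008 = record.get_fields('008')
            if key and f008 and len(f008[0].data) > 32:
                values[key] = f008[0].data[32]
    return values

before = snapshot('matched_before.mrc')   # extract taken before the load
after = snapshot('matched_after.mrc')     # extract taken after the load

for key in sorted(before):
    if key in after and before[key] != after[key]:
        print(key, before[key], '->', after[key])
```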

Changes to authority headings (Reports G, K, L)

In Aleph, when an authority heading changes, the previous form is added to a "COR" field in the record (Aleph allows tags that are non-numeric). It would be possible to report on all COR fields after each authority load, but many would have no links to bibliographic headings and would therefore be of no interest to the authority control team. To identify just those of interest it is necessary to search the log of the process that builds links between authority headings and bibliographic headings for instances of the COR field. A truncated version of each log entry is put into a report, sorted into alphabetical order rather than order of processing. Aleph will change records with subdivisions when the main heading changes, so much less staff work is involved than with the similar reports in Advance. Changes to 151 headings are listed at the bottom of the report; these need special staff attention because they are loaded with a flag that prevents update of the linked bibliographic records (for the reason described above).

Towards the end of the authority loading process (once all related bibliographic indexing is complete) an Aleph service is run to delete COR fields. The fields are not deleted if they are used by a bibliographic heading. The output of the standard Aleph process is a list of authority record identifiers and whether or not the COR field has been deleted. This information is used to create a report that contains the full text of the authority record. This report is important for identifying cases where the authority record links to a 100/240 string in the bibliographic record: the COR field needs to be deleted manually and any bibliographic records changed.
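The log-scanning step is essentially a filter and an alphabetical sort. A minimal sketch follows; the log file name and line layout are invented for the example, since Aleph's actual log format is not reproduced in the article.

```python
# Pull COR-field hits out of the heading-link process log and sort them
# alphabetically for the report, rather than in order of processing.
hits = []
with open('link_process.log', encoding='utf-8', errors='replace') as log:
    for line in log:
        if 'COR' in line:
            hits.append(line.strip()[:120])   # truncated log entry

for entry in sorted(hits, key=str.lower):     # alphabetical order
    print(entry)
```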

Changed subject authority records with subdivisions also need staff review and are included in a separate report. At the start of the load process for subjects, all authority records with subdivisions ($v, $x, $y, $z) are extracted and placed in a separate file. These authorities are loaded with a flag to prevent update of the bibliographic headings, because a change to a subdivision, or the addition of one, can lead to corruption of the bibliographic headings. A report is produced containing the text of each authority record so that staff can deal manually with those that have a COR field.

Deleted authorities (Report I)

Unlike Advance, Aleph will allow deletion of an authority record even when there are linked bibliographic headings. Deleted authority records are separated out at the beginning of the loading script and matched against existing records. For matching records the relevant Oracle table for bibliographic records is queried to see whether any bibliographic headings link to these authority records. If they do, details of the authority record and the bibliographic titles are put into a report, which staff use to re-assign the bibliographic records to the correct authority records. Deleted authority records that do not match with the authority file are not loaded.

Other reports

As well as the reports above, statistical reports are produced of the number of new, matched, and multi-match records, for both new and corrected authorities and for deleted authorities. These are obtained by

counting the instances of different values in LDR position 5 in the files output from the matching process. Multi-match reports are created as part of the standard Aleph matching process and contain records where the 010 $a in the incoming record matches two or more 010 $a fields already in AUT10. These records need to be processed manually, by deleting one of the existing authority records and replacing the remaining authority record with one obtained from the Library of Congress using Z39.50. In the case of multi-match reports for deleted records, the duplicate records need to be deleted manually. The multi-match reports are usually empty.

Some reports produced from the Advance load are no longer required. Report 17 identified cases where the 010 $a matched a 010 $z. As part of the migration to Aleph, authority records containing only a 010 $z were removed from the authority file, and the latest versions of the authority records (obtained from the Library of Congress using Z39.50) were loaded.

Conclusion

The change of integrated library system from Geac Advance to Ex Libris Aleph provided an opportunity to re-examine the manual work done following the load of the Library of Congress Authority Files. The two systems work in very different ways and the customised load reports in each case reflect this. Many of the Aleph reports (such as the new D, F and I) are more effective in providing the information that staff need to maintain the local versions of the LCNA and LCSH files.

Appendix 1. Geac Advance in-house authority control reports

Report 1 - Incoming new authority records with no match on MAIN*AUTMAST
Report 2 - All incoming authority records written to the error file (see reports 5, 9, 10, 11, 12 and 17)
Report 3 - Incoming corrected authority records matching an existing record on LCCN
Report 4 - Incoming new authority records matching 1XX=1XX with a local authority record
Report 5 - Incoming new or corrected authority records matching 5XX=4XX on an existing authority record
Report 6 - Incoming authority records matching 4XX=1XX with a local authority record
Report 7 - Incoming corrected authority records matching 1XX=1XX with a local authority record
Report 8 - Incoming corrected authority records with no match on LCCN in MAIN*AUTMAST
Report 9 - Incoming corrected authorities which match more than one existing LC authority record on LCCN or 1XX
Report 10 - Incoming new or corrected authority records matching an existing record on 1XX=1XX or 1XX=4XX
Report 11 - Incoming corrected authority records matching an existing record on 1XX=4XX
Report 12 - Incoming corrected authority records matching an existing record on 4XX=1XX
Report 13 - Incoming deleted authority records matching an existing record on LCCN
Report 14 - Incoming deleted authority records where bibliographic records are linked to the record matching on LCCN in MAIN*AUTMAST, including 5XX references
Report 15 - Incoming deleted authority records with no match on LCCN in MAIN*AUTMAST
Report 17 - Incoming new and corrected authority records where the 010 $a matches an existing 010 $z in MAIN*AUTMAST
Report 18 - Incoming new and corrected authority records not applicable for subject use (008/11=n)
Report 50 - Report of all near matches of incoming records on existing LC and local authority records
Report 51 - Incoming corrected authority records with a change in 008/32
Report F - Manually generated report to identify personal name records formerly covered by undifferentiated forms
Heading Changes Report - Incoming new and corrected authority records with changed headings where local authority records exist for heading plus subdivision
See Also Tracing Changes Report - Incoming corrected authority records with changes to the 1XX forms which appear on MAIN*AUTMAST as 5XX forms in other records

Appendix 2. Ex Libris Aleph in-house authority loading reports

Report A - Authority records that contain "Formerly" and/or "undiff" and/or "unique" (LCNA only)
Report B - Matching statistics for new/corrected records
Report C - Multi-match new/corrected records
Report D - New authority records which have bibliographic records linked
Report E - Change in 008/32 (Undifferentiated personal name) (LCNA only)
Report F - Increase in the number of bibliographic records linked to an authority heading
Report G - Authorities with a COR field used in bibliographic records
Report H - Matching statistics for deleted records
Report I - Bibliographic records that have headings which match deleted authorities
Report J - Multi-match deleted records
Report K - COR fields that cannot be deleted
Report L - Updated subject authorities with subdivisions

ORCID: A Research Support Perspective

Natalia Madjarevic, Research Support Services Manager, LSE Library

Name authorities and research support

Name authority files have been used in libraries to enhance information retrieval for decades (see Younger, 1995), via library catalogues and, more recently, institutional repositories. Extending this expertise to support the uptake of name authority systems in the wider context of scholarly publications is an opportunity for academic libraries to enhance the support they offer to researchers. Part of my role in research support at LSE Library is overseeing our repository services, including our institutional repository LSE Research Online, and our bibliometrics initiatives. Both activities rely, in varying degrees, on effective name authorities and on connecting researchers to their research activities and outputs. Effective name authority identifiers ensure that researchers, particularly those with common names, are more likely to be cited accurately and attributed correctly in indexing databases, and are able to pull this data together for analysis. This is vital if researchers and institutions are to collect and analyse indicators of research impact in order to demonstrate research quality and impact on society, or to develop performance indicators and collaborations with other institutions.

A registry of unique persistent identifiers for researchers

Officially launched in October 2012, ORCID (Open Researcher and Contributor ID) is a community-driven registry of unique persistent identifiers for individual researchers that can be used across various scholarly publishing activities. An ORCID iD provides a unique ID for an individual in the same way a DOI provides a unique ID for a journal article. For example, a researcher registers for their ORCID iD, which can then be used during grant applications, submissions to publishers and when depositing papers in a repository. This aims to bring an individual's research activities and outputs together under one identifier to be used throughout their research career. Publications are added to a researcher's ORCID record, either manually or via harvesting from databases such as CrossRef (see Meyer, 2012), listing outputs that can then be exported via an API or integrated with other services. ORCID also enables linking to other name authority IDs such as ResearcherID and the Scopus Author Identifier.

ORCID statistics and membership model

At the time of writing, ORCID has issued over 228,930 identifiers, recording 1,691,577 works, 974,627 of which have unique DOIs. ORCID also has 71 members, including publishers (Nature, Springer, Wiley Blackwell), institutions (Glasgow University, Boston University, CERN, Harvard), funders (Wellcome Trust, NIH) and systems and services (CrossRef, AVEDAS, Altmetric, Symplectic). The scope of the ORCID membership demonstrates the potential uptake across the scholarly publishing process and the ways ORCID iDs could be integrated and used by researchers: from initial manuscript submission, with publishers, and during dissemination via deposit in an institutional repository.

Integrating ORCID

I attended the ORCID Outreach Meeting at the University of Cambridge in May 2013, and the most useful part of the event was hearing how ORCID can be used in an institutional setting, for example by integrating with repositories or research information systems.
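Much of that integration work comes down to calling ORCID's public API. As a flavour of how simple the read side is, here is a minimal Python sketch that lists the works attached to an iD. The endpoint version and response layout follow ORCID's current public documentation rather than the API as it stood in 2013, and the iD shown is ORCID's well-known test identifier, so treat this as illustrative only.

```python
# List the works attached to an ORCID iD via the public REST API.
import requests

orcid_id = '0000-0002-1825-0097'   # ORCID's example/test iD
resp = requests.get(
    f'https://pub.orcid.org/v3.0/{orcid_id}/works',
    headers={'Accept': 'application/json'},
    timeout=30,
)
resp.raise_for_status()

# Works are grouped; each group carries one or more summaries.
for group in resp.json().get('group', []):
    summary = group['work-summary'][0]
    title = summary['title']['title']['value']
    print(summary.get('put-code'), title)
```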

Potential ORCID integrations are:

Link with institutional repositories
Harvesting to university research information systems (CRIS)
Manuscript submission
Grant applications
Link to other name identifiers (such as ISNIs)

This demonstrates how ORCID iDs could be used across the lifecycle of a research paper. If integrated, it is also important that publishers and institutional repositories display ORCID iDs in their metadata in a consistent way, as discussed by Haak (2013). The Boston University presentation at the Outreach Meeting provided an insight into the work involved in integrating ORCID iDs at an institutional level. Boston explained that internal negotiations and discussions with legal teams, Deans, Council and relevant stakeholders took around six weeks, and were by far the most time-consuming part of the project. Boston decided on an opt-out process for faculty ORCID iDs, initially to be included in university profiles, and an opt-in option for students and postdocs. Although ORCID is a self-claim system, member institutions are able to create ORCID iDs on behalf of current employees. If a researcher has already independently set up an ORCID iD, this can be merged with the record created by the institution. Boston were at a relatively early stage in the roll-out process but plan to use ORCID iDs in the HR system, the institutional repository and during thesis submission.

I'm particularly interested in how the ORCID registry could be used in libraries to enhance institutional repositories and current research information systems. At LSE, we have a name authority file for all authors in our institutional repository and pay close attention, via unique identifiers, to ensuring academics are described in a consistent and accurate way. International name authority registries such as ORCID could therefore enhance our current processes and the metadata held in the repository. Name authority is integral to ensuring academics get maximum credit for their work, and libraries are central services well placed to support developments in this area.

ORCID and altmetrics

ORCID iDs can also be used to access altmetrics services, which track reaction to research papers on social platforms such as Twitter, Facebook, Mendeley, CiteULike, Delicious and Figshare. Researchers can enter their ORCID iD into a tool such as ImpactStory, which then displays social activity associated with their publications. This is a practical example of how ORCID iDs can be used to access tools which demonstrate research impact in a non-traditional way and, by authoritatively linking researchers to their publications, help expand the view of an individual's research activity.

Conclusion

The ORCID registry seeks to address the issue of author disambiguation by connecting researchers to their publications. There are clear opportunities for libraries to support ORCID, either via membership and integration with current systems or by training researchers to claim their ORCID iD and to use it as much as possible throughout the publication process.

References

Antman, Karen (2013). Launching the BU ORCID Initiative: Ensuring Credit for Your Work. http://ctsi.bu.edu/index.php/launching-the-bu-orcid-initiative-ensuring-credit-for-your-work/. Retrieved 30/08/2013.

Kelly, Brian (2013). Why every researcher should sign up for their ORCID iD. http://blogs.lse.ac.uk/impactofsocialsciences/2013/01/30/why-every-researcher-should-sign-up-for-their-orcid-id/. Retrieved 30/08/2013.

Haak, Laure (2013). Suggested Practices for Collection and Display of ORCID iDs in Publishing Workflows. http://orcid.org/blog/2013/08/08/suggested-practices-collection-and-display-orcid-ids-publishing-workflows. Retrieved 30/08/2013.

Meyer, Carol Anne (2012). News Release: ORCID and CrossRef Collaborate to Accurately Attribute Authorship of Scholarly Content. http://www.crossref.org/01company/pr/news111412.html. Retrieved 30/08/2013.

Priem, Jason, Taraborelli, Dario, Groth, Paul and Neylon, Cameron (2010). Altmetrics: a manifesto. http://altmetrics.org/manifesto/. Retrieved 30/08/2013.

Shema, Hadas (2013). Thoughts about altmetrics (an unorganized, overdue post). http://blogs.scientificamerican.com/information-culture/2013/08/03/thoughts-about-altmetrics-an-unorganized-overdue-post/. Retrieved 30/08/2013.

Younger, Jennifer A. (1995). "After Cutter: Authority Control in the Twenty-First Century". Library Resources & Technical Services (0024-2527), 39 (2), p. 133.

Transforming Authority Control: designing a scalable and sustainable approach for the University of Kent

Clair Waller, Josie Caplehorne, Robin Armstrong Viner and Tony Whitehurst

Introduction

In January 2012 University of Kent Information Services (IS) launched a shelf ready strategy. This strategy sought to implement the recommendations of the Universities UK Efficiency & Modernisation Task Group (2011, pp. 6-7). That report noted that:

Shared services are often held up as an off the shelf solution for efficiency, but if their potential is to be fully realised in higher education then simplifying, streamlining and improving internal processes needs to be a priority.

And that:

There is significant potential for outsourcing and the development of strategic relationships with the private sector to deliver services.

The strategy sought to simplify, streamline and improve the IS internal processes, and to develop strategic relationships with the private sector, for purchasing and providing access to library materials. The aims were to:

Improve service - making newly purchased items available to library users as soon as possible
Ensure sustainability - providing a flexible model that makes newly purchased items available within the same timescales regardless of the number of orders submitted or items delivered
Enhance data quality - ensuring that bibliographic data supports enrichment with cover images, previews and tables of contents, and searching and suggestions within the catalogue
Deliver cost and efficiency savings - allowing IS staff time used to catalogue, classify and process newly purchased items to focus on priority areas to improve the student experience and support research

A project, Develop Quality Control for Catalogue Data, was established to deliver the third of these aims. This article sets out the approach that IS has taken to that project and how this will deliver a scalable and sustainable approach to authority control.

Developing quality control for cataloguing

Background

Prior to the implementation of the shelf ready strategy, authority control was undertaken by the IS Metadata & Processing team as part of the cataloguing workflow. Although some headings in bibliographic records matched those in the Library of Congress Authorities, this matching had not been done in a consistent way and in most cases no authority record had been created. Of the 10,732 authority records that had been created, many contained only the authorized form of the name. Authority records were not created for series or subjects. There was no programme, and no time available, to retrospectively create authority headings for names, series or subjects. IS needed a solution that would address these issues if the quality of the library catalogue records was to be improved.

The introduction of the shelf ready process for library materials would also multiply the challenges around the limited quality of the library catalogue records and the small number of authority records. An aim of that project was that no member of the IS Metadata & Processing team would need to check or edit the library catalogue records supplied with those materials, so that they could be made available to library users as quickly as possible. The solution identified needed to allow authority control to be undertaken remotely and retrospectively, to avoid any delay to the receipting of new materials.

Aims

The aims of the project are to:

1. Present, both internally and externally, consistent and high quality bibliographic records that reflect well on the University of Kent and give library users confidence in the ability of IS to deliver catalogue content that compares well with competitor institutions and offers the best student experience in terms of resource discovery
2. Maintain high quality data that makes it far easier to integrate the Library Management System (LMS) with other information systems such as discovery layers. This would not only reduce time, effort and money in the long term, but would also mean that IS could continue to provide innovative and effective solutions to library users
3. Produce a framework of data quality and control for bibliographic records using in-house expertise and external outsourcing
4. Ensure quality control can be undertaken at scale wherever possible, resulting in an achievable workload for the IS Metadata & Processing team

Approach taken

The business case for the project acknowledged that quality issues with existing data could be resolved retrospectively and a framework of data quality managed by the IS Metadata & Processing team. However it was clear that there wasn't sufficient resource within the team to manage either of these tasks. As with the creation of bibliographic records, it was recognised that outsourcing some of these functions to an external company would provide a more efficient and cost effective service.

The Request for Quote process

The University Procurement Office advised that a Request for Quote (RFQ) process should be used to select a supplier. The RFQ set out the criteria for the evaluation of quotes as:

Ability to meet requirements
Price
Ability to carry out the work in the proposed timescale
References from other institutions
Quality control procedures

The requirements were for a supplier to:

1. Check the authority data against the Library of Congress authorities and highlight all unauthorised headings
2. Match the unauthorised headings to Library of Congress authorities, upgrade the existing headings in the bibliographic records and return them for import

3. Upgrade the existing authority records to full authority records
4. Provide additional authority records
5. Use authorised RDA headings where these existed and AACR2 headings where no RDA heading was available
6. Create a list of unauthorised headings that cannot be matched, to enable manual checking and upgrading in-house as required
7. Identify authority headings incorrectly encoded in MARC and amend these incorrect tags

Suppliers were also encouraged to give additional quotes for ongoing authority file maintenance. The project manager had identified three potential suppliers, who were all notified of the RFQ and who all submitted quotes. Although the RFQ was open to other suppliers, no further quotes were received. The project team evaluated the quotes against the criteria after the RFQ closed and the contract was awarded to Backstage Library Works (BSLW).

RDAification

Although the RFQ specified that:

The upgrading of our bibliographic data (other than the amendment of the 100, 110, 111, 600, 610, 611, 650, 651, 655, 700, 710, 711 fields to reflect authority file changes) is not included in this project.

two of the suppliers who responded to the RFQ offered the option to RDAify the bibliographic data at no additional cost. Following the award of the contract to BSLW, the project team reviewed the implications of this additional work and agreed that it should take place.

Prior to the export of the existing bibliographic data the project team completed an online profile. The profile included a short explanation of each prospective improvement and links to a wiki which the project team used to further research potential benefits and other possible implications. Concerns were raised that historical cataloguing errors would lead to unsatisfactory results from the RDA enrichment. However the project team concluded that the benefits of the process outweighed the potential inaccuracies that might occur. The completed profile specified that BSLW should:

1. Clean up the 010, 020, 022 and 034 fields
2. Update the LDR, 006, 007 and 008 fields
3. Add a subfield d to the 040 field to identify those records edited by BSLW (a sketch of this step is given below)
4. Update or delete obsolete fields
5. Update or delete obsolete subfields
6. Update relator codes in subfield e of the 100 and 700 fields

The project team also asked BSLW to:

Correct errors in the 1st and 2nd indicators for all fields
Standardise the form of the General Material Designation (GMD) in subfield h of the 245 field
Improve the consistency of bibliographic records by adding a GMD in subfield h of the 245 field where one did not already exist (where the existing coding of the data supported this)
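To make one of the profile steps concrete, the following is a minimal Python sketch, using pymarc, of adding a 040 $d to flag records touched in the enrichment run. The file names and the code "XX-Edit" are placeholders for illustration, not Kent's or BSLW's actual values or implementation.

```python
# Tag every record that has a 040 field with a $d edit code, so enriched
# records can be identified later; records lacking 040 are counted for
# manual review rather than silently modified.
from pymarc import MARCReader, MARCWriter

EDIT_CODE = 'XX-Edit'   # placeholder MARC organization code
missing_040 = 0

with open('kent_bibs.mrc', 'rb') as src, \
        open('kent_bibs_tagged.mrc', 'wb') as dst:
    writer = MARCWriter(dst)
    for record in MARCReader(src):
        f040s = record.get_fields('040')
        if f040s:
            f040 = f040s[0]
            if EDIT_CODE not in f040.get_subfields('d'):
                f040.add_subfield('d', EDIT_CODE)
        else:
            missing_040 += 1
        writer.write(record)
    writer.close()

print(f'{missing_040} records had no 040 field')
```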

The specification for authority data included:

Library of Congress genre headings, in addition to name and subject headings
RDA headings in preference to AACR2 headings, in preparation for the move to RDA
Files of near-matched, partially-matched and non-matched headings for future analysis

Checking the data

Once the project team had completed the profile, BSLW enriched a sample of 5,000 of the University's bibliographic records. The project team carried out a detailed analysis of the test file supplied by BSLW, which highlighted some inaccuracies that required further investigation. It transpired that these problems were the consequence of historical cataloguing errors which would be picked up by post-enrichment reports and addressed by the project team. As the project team were satisfied that there were no unexplained errors in the sample, BSLW were given the go-ahead to enrich the entire catalogue. BSLW then presented the project team with text files containing 671,101 RDAified bibliographic records as well as the new and improved authority records. The project team divided into two teams of two to ensure that both sets of records were appropriately checked, each team taking responsibility for analysing one text file.

Bibliographic records

Given that it would be impossible to check each record, the project team agreed to look at a random sample of the RDAified bibliographic records, while being careful to ensure that the sample included examples of all item types in the catalogue. Examination of the bibliographic records began with assessment of the supplied GMD, for example ensuring that [electronic resource] had been added where it had not previously been recorded in subfield h of the 245 field. Equally significant were the checks that the publication data previously recorded in the 260 field had been correctly transposed to the 264 field; that the expansion of abbreviations, such as "ill." to "illustrations" and "p." to "pages", had been performed appropriately; and that the content, media and carrier categories had been properly recorded in the 336, 337 and 338 fields. Checking by the project team revealed that some 020 fields had not been updated and the 040 field had not been added as requested, while some abbreviations had not been expanded and Latin words had not been replaced by English. The project team highlighted these to BSLW, who corrected the errors and forwarded an updated file for checking.

Headings and Authority Records

The project team felt that, although the checking of the headings and authority records would necessarily be limited, acquiring full authority records would be of benefit as the existing authority file did not contain any full authority records. The approach taken was to check:

A sample of the BSLW authority records against the Library of Congress authority records
A sample of the headings BSLW had added to the University's bibliographic records, to confirm the correct form of the heading had been used
That the headings in the bibliographic records corresponded to the relevant authority records
That the 001 fields in the BSLW text files matched those in the catalogue and were unique, ensuring that the match point for importing the enhanced records had been maintained (see the sketch after this list)
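The final check in the list above is easy to automate. Here is a minimal Python sketch, using pymarc, that verifies every 001 in the returned file is unique and already present in the catalogue, so that an overlay on import will match exactly one record. File names, and the idea of reading catalogue 001s from a flat extract, are assumptions for the example.

```python
# Verify 001 control numbers in the returned file: each must be unique
# and must exist in the catalogue so the import overlay matches cleanly.
from collections import Counter
from pymarc import MARCReader

with open('catalogue_001s.txt') as fh:
    catalogue_ids = {line.strip() for line in fh}

seen = Counter()
with open('bslw_return.mrc', 'rb') as fh:
    for record in MARCReader(fh):
        for ctrl in record.get_fields('001'):
            seen[ctrl.data.strip()] += 1

duplicates = [i for i, n in seen.items() if n > 1]
missing = [i for i in seen if i not in catalogue_ids]
print(f'{len(duplicates)} duplicate 001s, {len(missing)} unmatched 001s')
```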

Having investigated a selection of the bibliographic and authority records, the project team were satisfied that:

The BSLW authority records corresponded to those used by the Library of Congress and were full and accurate
Headings added to bibliographic records were generally accurate, the author of the work had been identified correctly and the appropriate heading had been used
Headings that could not be matched due to insufficient data in the existing bibliographic records appeared in the appropriate near-matched, partially-matched or non-matched files

Next steps

Loading the data

Having established that there are no major errors in the BSLW files, the project team is planning the upload of the enriched bibliographic records and the enhanced and new authority records. This will see the existing records overwritten by the enriched ones, using the 001 as a match point. Although loading bibliographic and authority records will have a smaller impact on the performance of the LMS than loading bibliographic and holdings records, this still requires careful planning. A full import plan has been prepared which highlights the potential risks, how they have been mitigated and the recovery plan should the load fail. This includes the identification of a major incident co-ordinator who will act as a single point of contact should circulation be affected.

Maintaining the quality

Once the enriched and new records have been uploaded, the focus of the project team will switch to maintaining the quality of the catalogue. The BSLW response to the RFQ included a quote for ongoing services. These will see BSLW:

Performing checks on, and cleaning up, new bibliographic records
Ensuring that new headings match those used by the Library of Congress
Providing authority records for new headings
Updating existing authority records to reflect heading changes implemented by the Library of Congress

However these only form one part of the quality control procedures the project team aim to establish. A framework of reports has been created which identifies records affected by other issues resulting from the shelf ready process and historic cataloguing practice. The impact of these issues will be assessed against the effort required to correct them (and to resolve those highlighted in the near-matched, partially-matched or non-matched heading files) and presented to the IS Library Management Group as a prioritised work plan for the IS Metadata & Processing team.

Conclusions

Although the project has yet to close and there are a number of work packages still to be delivered, the early indications are that it will realise its aims and deliver more than had originally been anticipated. The project team were fortunate in being able to learn from others, particularly Helen Williams at the London School of Economics, who were generous in sharing their experiences of outsourcing authority control. Although no other libraries were in a position to share their experience of RDAifying bibliographic records when the project began, the project team believe that this work has broadly been successful. Areas where the aims of the project have not been realised can be attributed to historic cataloguing practice which has made machine matching impossible. Brief records, merged print and e-book records

and abbreviated headings have all had a significant impact. Funding constraints also required the project team to check large numbers of records in a relatively short period, which had to be viewed in MarcEdit as there was no test server available at the time. However the overwhelming view of the project team is that the positives outweigh the negatives. A task which could never have been resourced in-house is nearing completion, while careful planning and monitoring meant that the checking process was uncomplicated and fluid. Most importantly, once the data has been uploaded, the accuracy and quality of the catalogue will have been improved. These changes will deliver a hybrid catalogue and add value for library users through the provision of additional data and a better user experience.

References

Universities UK Efficiency & Modernisation Task Group (2011). Efficiency and Effectiveness in Higher Education: a Report by the Universities UK Efficiency and Modernisation Task Group. [Online] London: Universities UK. Available from www.universitiesuk.ac.uk/highereducation/pages/EfficiencyinHigherEducation.aspx [Accessed 19th August 2013].

Outsourcing Authority Control at UCL

Thomas Meehan, Head of Current Cataloguing, University College London

Introduction

Cataloguers at UCL (University College London) Library Services[1] perform the work of identifying authors and subjects relating to bibliographic records, but do not create authority records and generally do not edit them. This work has been outsourced since 2006 to an external supplier, Library Technologies, Inc. (LTI)[2], based in the U.S. This article provides an overview of how this came about. UCL Library is a large academic library on a number of sites, mostly based in central London. We have been using the Aleph library management software since 1999.

Background

Before 2006, UCL used no authority control at all. That is to say, we did not have any authority records; we simply had the author and subject browse indexes in Aleph. We had begun to make more serious efforts to align our headings on LC forms, by checking new headings in particular against the Library of Congress Authorities website[3]. Although some cataloguers had experience of editing and using authorities in other workplaces, there was no established body of expertise at UCL, especially in how authorities worked in Aleph. As part of a wider report into the practices of major university libraries in the U.S. in 2002[4], a colleague investigated manual versus outsourced authority control and recommended either loading the complete LC Authority Files into Aleph or outsourcing the work. A further internal report in 2004[5] compared in-house and outsourced solutions for authority control. It found that the costs of performing the work in-house were prohibitive in terms of staff time and money; it recommended outsourcing the work to a vendor and signing up to a vendor's update service to maintain the authority file.

Choosing a Vendor

We investigated authority vendors and shortlisted two suppliers for detailed evaluation. We submitted the same large test file of bibliographic records to both vendors and analysed how well they performed on a number of criteria. Both vendors performed very well and either could easily have been chosen in the absence of the other. Both matched headings very well, often where UCL's entry of the name required some interpretation of incorrect coding, punctuation, and - in the case of the Königlich[e] Akademie der Wissenschaften zu Berlin and the Polyglott [sic] Bible - incorrect spelling. Some headings were easily identified by one and not the other, and vice versa. One supplier was especially good at matching the names of film actors whereas the other excelled at names of kings, popes, and corporate bodies: we have a sizeable number of both. LTI - the successful vendor - were particularly effective at tidying up many of the small authority errors that had crept into our catalogue over time. For instance, they managed the complex rearrangement of subject strings where UCL had retained the wrong order of subdivisions or used headings with places which may not be subdivided geographically, e.g. converting Art, Modern$y21st century$zSpain$zValencia$vExhibitions to Art, Spanish$zSpain$zValencia$y21st century$vExhibitions. They were able to intelligently add or change qualifying information in conference headings, for example changing $cLondon to $cLondon, England and $cManchester, Eng to $cManchester, England. LTI were also able to deal with changing jurisdictions, something that was particularly significant for UCL's important collections at the School of Slavonic and East European Studies (SSEES). For example, a heading

which UCL had as World War, 1939-1945$zYugoslavia$zSerbia was converted to the soon to be out of date but correct World War, 1939-1945$zSerbia and Montenegro$zSerbia. LTI made similar correct changes with respect to Czechoslovakia and the Russian Federation. Similar changes were made with obsolete language names, such as changing Icelandic, Modern to Icelandic in a complex Biblical uniform title. In addition, the introduction of authority control presented an opportunity to tidy up other aspects of our data, such as punctuation and MARC21 indicators. Both vendors under consideration offered this service to some degree, either using specifications provided by us or using their own routines. LTI offered the latter and performed a number of very useful corrections with little need on our part to specify what was required. Examples included the removal of incorrect square brackets from uniform titles, even if no match with an authority record was made; the rearrangement of conference subfields and accurate re-punctuation; and the correction of first indicators in name headings. LTI also correct the second indicator of 245 fields according to their language and the first word of subfield $a.

Initial Implementation

Before proceeding with LTI, we had to set up Aleph properly too. Aleph is fully capable of dealing with authority control but needed some configuration, as we had not used it for authorities before. Generally this was a case of setting up separate databases for Library of Congress and Medical Subject Headings (MeSH) authority records - some libraries keep them together - and making sure that these were indexed and linked properly to the bibliographic indexes so that users and cataloguers benefit from the see references and additional information available. An Ex Libris consultant aided the systems team in performing much of this work, and helped to set up the export and import routines that would be needed to submit and retrieve files.

We submitted the entire file of over a million bibliographic records to LTI via FTP in late 2006. They processed the file, amending headings to match the LC Authority File, their own additional records, or MeSH as appropriate. We retrieved this file of amended bibliographic records as well as the authority records that went with them, and a number of reports giving statistics and details of linked and unlinked headings. The bibliographic and authority records were imported and indexed in Aleph. The whole process took about six weeks. While this was happening, we could not amend any of the records we had sent, as all changes would be overwritten when the new versions were returned. This was obviously very disruptive, although we could download and add new records.

The initial run was generally very successful. There were some ongoing configuration problems in some of the obscurer corners of authority control, which took a lot of work to iron out: for example, duplicate see references, such as the non-preferred ABC, which can refer to both the Australian Broadcasting Corporation and the American Bibliographical Center; the differentiation in display of LC and MeSH terms where the terms match, such as Cancer being a see reference in MeSH but a preferred term in LC; or the clashing of subfields v and t in Library of Congress Subject Headings (LCSH) for terms like Correspondence.

Ongoing Authority Work

It was obviously important that we maintain our authority file. We still do not create any authority records ourselves.
We check our enhanced indexes, where authority records are now visible. If we do not have an authority record for a heading, we check the LC Authorities or LCSH through Classification Web[6] or the LC Authorities site, and MeSH through the online MeSH browser[7]. We do not need to download or

create any authority records at this stage as this is taken care of by the ongoing updates from LTI. We signed up for LTI's Authority Express (AEX)[8] service from the beginning. For this, we send off all new and amended bibliographic records once a month. The process and results are much the same as for the initial authority work - we send off the bibliographic records, which are returned to us amended and with matching authority records and reports - but the turnaround time is a matter of a few hours. In practice, this means we stop amending existing records at four o'clock one afternoon and resume normal cataloguing the following morning.

The AEX service does not deal with headings or authority records that are amended, so changes in headings over time, such as the addition of death dates to authors or changing terminology in LCSH, are not catered for. LTI's Authority Update Processing (AUP)[9] service takes care of this. We have not yet subscribed to this service but hope to do so soon. We have manually taken care of some pressing problems where, for instance, the addition of a death date means that an old name heading without it becomes out of step with a newly acquired name-authority heading with it. However, this is not scalable in the long run. There are in addition two bulk updates in particular that we would like to take advantage of, and that LTI should be able to help us with. The first of these is non-roman script see references, which were recently added to many authority records. We have significant Hebrew and Cyrillic script collections, and we have made efforts to enhance the original script elements of some of these records, especially for Hebrew script. These see references would greatly help discovery and streamline the cataloguing of non-roman scripts. The other bulk update is of course RDA, which changes many authority headings in various ways, such as William, of Ockham, ca. 1285-ca. 1349 to William, of Ockham, approximately 1285-approximately 1349, as well as adding much information to the record itself, such as gender (MARC21 375 field) and field of endeavour (MARC21 372 field). Clearly it will not be possible to edit these manually. LTI should be able to perform the latter as part of a special AUP run[10].
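The monthly AEX round trip is, mechanically, just a file exchange. The following is a minimal Python sketch of what such an exchange might look like over FTP (the article mentions FTP only for the initial submission, so this shape is an assumption); host, credentials and paths are placeholders, not LTI's actual service details.

```python
# Illustrative monthly round trip: upload the month's new and amended
# bibliographic records, then (in a later session) fetch the results.
from ftplib import FTP

HOST, USER, PASS = 'ftp.example-vendor.com', 'library', 'secret'  # placeholders

# Step 1: submit the month's records.
with FTP(HOST) as ftp:
    ftp.login(user=USER, passwd=PASS)
    with open('monthly_bibs.mrc', 'rb') as fh:
        ftp.storbinary('STOR incoming/monthly_bibs.mrc', fh)

# Step 2 (run the next morning): retrieve the amended bibliographic
# records; matching authority records and reports would follow suit.
with FTP(HOST) as ftp:
    ftp.login(user=USER, passwd=PASS)
    with open('monthly_bibs_processed.mrc', 'wb') as out:
        ftp.retrbinary('RETR outgoing/monthly_bibs_processed.mrc', out.write)
```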
Conclusion

Outsourcing authority work was for us the only realistic way to implement authority control at UCL. There was a lot of work in setting up, testing, and choosing a vendor, but our systems team were excellent and the resulting day-to-day workflow is much the same as before but more rigorous. Cataloguer identification of current headings and corrections to our indexes is still vital, if not more so, especially on new records created from scratch, but we have only had to venture briefly into editing authority records themselves. The greatest obstacle in some respects is the large amount of down-time needed to process and index a large file of bibliographic records like ours. Six weeks is a long time for a cataloguing backlog to build up and to be unable to push material through for users to consult, although there are of course creative ways round this. However, once done, it should not have to be repeated. LTI themselves have been hard to fault: capable, reliable, and helpful. Outsourcing authority control has suited our situation and enabled us to implement it effectively from scratch very quickly. Although there is some loss of freedom, especially as we decided to use only authority records provided by LTI, relying on outsourced records means we know that we are closely in step with orthodox LC and MeSH headings. This keeps the file easier to maintain and will be a boon when the correct standard forms of names and subjects effectively form the basis of links in a world of shared and linked data.

References

[1] UCL Library Services. http://www.ucl.ac.uk/library/
[2] Library Technologies, Inc. http://www.authoritycontrol.com/
[3] Library of Congress. Library of Congress Authorities. http://authorities.loc.gov/
[4] UCL Library Services. A report into comparative cataloguing practices in Ivy League universities in the United States of America and UCL. [Internal report], 2002.
[5] UCL Library Services. Improving authority control at UCL: looking at in-house and outsourced solutions. [Internal report], 2004.
[6] Library of Congress. Classification Web. http://classificationweb.net/
[7] National Library of Medicine. MeSH browser. http://www.nlm.nih.gov/mesh/mbrowser.html
[8] Library Technologies, Inc. Authority Express. http://www.authoritycontrol.com/ax-doc
[9] Library Technologies, Inc. Authority Update Processing. http://www.authoritycontrol.com/aupdetails
[10] Library Technologies, Inc. Toward implementation. http://www.authoritycontrol.com/rdatransition

Calling all cataloguers: the UK NACO funnel seeks your opinions and views

Deborah Lee, Senior Cataloguer, Courtauld Institute of Art

Working collaboratively is a key part of cataloguing practice in the 21st century. The UK NACO funnel will enable collaboration amongst the UK cataloguing community, by helping cataloguers to share their knowledge and by building a community of name authority record experts in the UK. NACO, the Name Authority Cooperative Program, is a way in which cataloguers from around the world can contribute to the LC/NACO authority file. A NACO funnel is a group of libraries who work together to train in authority work and submit new authorities to this global database. NACO funnels are based around a common feature, for example geographic location. Numerous states in the USA, Canada, South Africa, Mexico, Peru, and many more places are already reaping the benefits of having a NACO funnel; the aim of the project is to add the UK to this illustrious list.

The UK NACO funnel project is in the planning stages, and a detailed workflow, models and other useful information have been created which outline how the funnel would work. Funnel coordinators from a wide variety of existing funnels have been consulted, and their experiences and data have been used to help develop the model for the UK NACO funnel. In addition, various useful discussions about the direction of the funnel have taken place over the last year or so, both formally as part of the CIG committee's activities and more informally as part of activities to disseminate information about the project, for instance at the CIG 2012 conference. Formal applications for support and funding will be submitted over the winter months. For a more detailed account of the proposed funnel, see Lee (2012).

So, what do we need from you? Well, arguably the most important stakeholder in the funnel is you, the UK cataloguing community, and we really want to know what you think. While the feedback we have received at activities such as the CIG 2012 conference has been overwhelmingly positive, we are keen to engage with as much of the UK cataloguing community as possible to get a clear sense of the type and scale of interest in this project. This feedback will be used to steer the future direction of the funnel and, if appropriate, to aid us in our quest for funding. A survey has been set up on SurveyMonkey to gather the views and opinions of the UK cataloguing community. The survey will be open until Friday 11th October and we hope it will take no more than ten minutes to fill in. The survey can be found here: http://www.surveymonkey.com/s/ct9v9hs. There is space at the end of the survey to provide your email address if you wish to be informed about the progress of the funnel, or alternatively you are welcome to email Deborah.lee@courtauld.ac.uk. Participating in this survey by sharing your ideas and thoughts will help this collaborative cataloguing opportunity achieve lift-off.

Reference

Lee, D., 2012. Collaborative authorities: introducing the UK NACO funnel project. Catalogue and Index, 169, pp. 30-36.

Catalogue & Index is electronically published by the Cataloguing and Indexing Group of the Chartered Institute of Library and Information Professionals (CILIP) (Charity No. 313014)

Subscription rates: free to members of CIG; GBP 15.00 for non-members & institutions
Advertising rates: GBP 70.00 full-page; GBP 40.00 half-page. Prices quoted without VAT.

Submissions: In the first instance, please contact the Editor:
For book reviews, please contact the Book Reviews Editor: Neil T. Nicholson, Cataloguing & Metadata Services Team Leader, National Library of Scotland, e: n.nicholson@nls.uk

ISSN 0008-7629

CIG website: http://www.cilip.org.uk/specialinterestgroups/bysubject/cataloguingindexing
CIG blog: New blog coming soon

Tags from the CIG blog on specific areas of interest: authority control, book reviews, Catalogue and Index, cataloguing, CIG activities, CIGS, classification, committees and working groups, conferences, Dewey, digitised material, Dublin Core, events, folksonomies, linkblog, MARC, metadata, news, RDA, Semantic Web, social software, standards, taxonomies, UDC