Kapi`olani Community College's Kapi`o Student Newspaper Digitizing Project Report

Similar documents
MARCH 23, 2016 NATIONAL MUSEUM OF AMERICAN HISTORY, ARCHIVES CENTER FUNDED BY THE COUNCIL ON LIBRARY AND INFORMATION RESOURCES

FDC020 FHSU Rare Book Collection Metadata Application Profile v1.1

Publishing Your Family History

Transitioning Your Institutional Repository into a Digital Archive

Instruction for Diverse Populations Multilingual Glossary Definitions

Cooperative Cataloging in Academic Libraries: From Mesopotamia to Metadata

Library Terminology. Acquisitions--Department of the Library which orders new material. This term is used in the Online Catalog.

Digitization Project of the Historical Archives of Macao

Migratory Patterns in IRs: CONTENTdm, Digital Commons and Flying the Coop

Georgia Tech Library Catalog

FORMAT & SUBMISSION GUIDELINES FOR DISSERTATIONS UNIVERSITY OF HOUSTON CLEAR LAKE

What Do I Do Next? Resources for Small Archives

The Joint Transportation Research Program & Purdue Library Publishing Services

Texas Woman's University

Nicola Visits the Library. For my library visit, I traveled to beautiful Point Breeze in Pittsburgh to speak with

Leveraging your investment in EAST: A series of perspectives

from physical to digital worlds Tefko Saracevic, Ph.D.

SNHU Academic Archive

POST-SUBMISSION INFORMATION PACKET

Digitization : Basic Concepts

Tuscaloosa Public Library Collection Development Policy

BARC Tips for Tiny Libraries

Preserving Observatory Publications: Microfilming, Scanning...What's Next?

Notes: PACSCL/CLIR Hidden Collections Processing Project, Survey and Processing Plan Worksheet

English 1010 Presentation Guide. Tennessee State University Home Page

Library and Information Science (079) Marking Scheme ( )

DIGITISATION GUIDELINES

Help! I'm cataloging a monographic e-resource! What do I need to know from I-Share?

AN ELECTRONIC JOURNAL IMPACT STUDY: THE FACTORS THAT CHANGE WHEN AN ACADEMIC LIBRARY MIGRATES FROM PRINT 1

Archives Boot Camp: Minimal Processing PACSCL/CLIR HIDDEN COLLECTIONS PROCESSING PROJECT

Faculty Governance Minutes A Compilation for online version

COLLECTION SUMMARY. Dates: [dates of collection material; DACS 2.4; MARC 245]

administration access control A security feature that determines who can edit the configuration settings for a given Transmitter.

NJ Record Retention & Disposal

Document Archive Procedures

[Review and Care of archives]

For a number of years, archivists have bemoaned seemingly impossible

EXHIBITS 101. The Basics of How to Curate & Install an Exhibit National Archives Conference for Fraternities and Sororities.

Overview. Project Shutdown Schedule

ACUSCREEN NDT Joaquín González -

PROTECTING THE PUBLIC RECORD IN AN ONLINE ERA. IMPLEMENTING REFERENCE ARCHIVES FOR GOVERNMENT AGENCIES.

ARCHIVAL DESCRIPTION GOOD, BETTER, BEST

Today's WorldCat: New Uses, New Data

Swinburne University of Technology

AR Page 1 of 10. Instruction USE OF COPYRIGHTED MATERIALS

Dead Links? No Problem. We're In This Together

Table of content. Table of content Introduction Concepts Hardware setup...4

Organization and Preservation of Historic Materials in the Archives of the. Michelle Dillon, Project Director: ,

EndNote Menus Reference Guide. EndNote Training

Mainstreaming University Publications: Designing Collaboration Across Library Units for Discovery and Access

DS-575W User's Guide

Preserving Our History: Principles of Archival Conservation

PubMed Central. SPEC Kit 338: Library Management of Disciplinary Repositories 113

Department of Rare Books, Special Collections, and Preservation. Emálee Krulish, Collection Services Library Assistant

Preserving Music Recitals before they fade away

SAA Museum Archives Section Working Group Example. SAA Museum Archives Section Working Group Brian Wilson 05June2012

Catalogs, MARC and Other Metadata

Preservation Programmes at the National Library Board, Singapore (Paper to be presented at the CDNL-AO Meeting in Bali, 8 May 07)

Library Working Hours:

Cataloguing guidelines for community archives

Introduction to EndNote Desktop

Library Handbook

This policy takes as its starting point the Library's mission statement:

Automatically Creating Biomedical Bibliographic Records from Printed Volumes of Old Indexes

New ILS Data Delivery Guidelines

W-FL BOCES SLS. Tips for inventory and weeding. Katherine Hammill, W-FL SLS Coordinator, May 2014

AN ARCHIVAL PRIMER A PRACTICAL GUIDE FOR BUILDING AND MAINTAINING AN ARCHIVAL PROGRAM. by Martha Lund Smalley

Digitising and Documenting Endangered Material: A Tale of Three Projects

INTERLIBRARY LOAN FOR THE REST OF THE STAFF

Missouri Evergreen Cataloging Policy. Adopted July 3, Cataloging Policy Purpose. Updating the Missouri Evergreen Cataloging Policy

The library is closed for all school holidays. Special hours apply during the summer break.

Born Digital Project. of the California Digital Newspaper Collection

NLI Update Elhanan Adler, Marina Goldsmith

SUBJECT DISCOVERY IN LIBRARY CATALOGUES

ICDL FAQS FOR REVISED 3/18/05. What is the International Children's Digital Library (ICDL)? Who is the intended audience for the ICDL?

All Proposers Request For Proposals #108023: Data Scanning Services. Dane County Information Management

Digital Initiatives & Scholar Commons

OPARCH (opinion) Journal of Architectural Education Manuscript Guidelines and Submission Protocols

From Analog to Digital: Changes in Preservation. Gregor Trinkaus-Randall Digital Commonwealth Conference Worcester, MA March 25, 2010

A Finding Aid to the Jerome Wallace Papers, , bulk , in the Archives of American Art

COLLECTION DEVELOPMENT GUIDELINES

CIRCULATION. A security portal adjacent to the Circulation Desk protects library materials and deters accidental removal without checkout.

EndNote Basics. As with all libraries created on EndNote, you can add to, modify, search, sort, and customize at any time.

APM CALIBRATION PROCEDURE Rev. A June 3, 2015

Journal of Phenomenological Psychology. Scope. Ethical and Legal Conditions. Online Submission. Instructions for Authors

Catholic Archives Society Publications

1. Getting started. UH Manoa Libraries. Hamilton and Sinclair Libraries

NEW YORK CHIROPRACTIC COLLEGE LIBRARY HANDBOOK AND POLICIES

Introduction to

Guide to the Dennis Lee Askew Papers

SAMPLE DOCUMENT. Date: 2003

Guide to the Narragansett Times Index, Research Notes and Index

Inventory of the Buchenwald Concentration Camp Photographs, 1945

10.14 The use-case diagram for the library appears in Figure The descriptions of the use cases are shown in Figures 10.4 though 10.8.

Welsh print online THE INSPIRATION THE THEATRE OF MEMORY:

Archon Cheat Sheet. Determine the accession number. Create the Archon Collection Manager record

Digital Collection Management through the Library Catalog

Instructions for the Preparation of the Master's Thesis

Low-Cost Ways to Preserve Family Archives

Transcription:

Kapi`olani Community College's Kapi`o Student Newspaper Digitizing Project Report

Introduction
The Kapi`o Student Newspaper Collection comprises scanned images of paper newspapers loaned to the Digital Initiatives Librarian (DIL) by the Kapi`o Office, the Library and Learning Resources (LLR) at Kapi`olani Community College, and Mr. Guy Inaba of LLR. More than 750 newspaper issues were scanned, and PDFs of these scans are stored in the online repository dspace.lib.hawaii.edu. Issues are added to this collection when possible.

Scope of work
The purpose of this project is to create an online, searchable repository of Kapi`o newspapers published from 1965 through Spring 2011, Kapi`o's last printing. If possible, the collection may be expanded to include issues produced and distributed via the web. Paper issues will be scanned, the images will be converted to text-searchable PDF files (PDF/A if possible), and the PDFs will be loaded into the online repository built with DSpace. A permanent URL will then be provided for each digitized issue.

Since the library does not have a mission to preserve and archive documents, the DIL minimally processed the paper collections to ensure appropriate housing for a regularly accessed newspaper collection, following guidelines published on the Library of Congress website (http://www.loc.gov/preservation/care/newspap.html). The Head Librarian authorized the purchase of supplies for this work.

The files uploaded to the institutional repository (IR) were formatted for access, not for preservation. File sizes were kept below 25 MB for easier downloading and fair viewing quality. Backup copies of the PDFs and of the original uncompressed TIFF scans will be kept on at least one PC and one external drive. The plan is to keep the content stable indefinitely by continually migrating it to current hardware and software.
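The 25 MB access limit is easy to check mechanically before uploading. The sketch below is a minimal Python example, not part of the project's documented workflow: it assumes the access PDFs sit together in one local folder and reads "25MB" as 25 * 1024 * 1024 bytes, since the report does not say whether decimal or binary megabytes were meant (pass `limit=25_000_000` for the decimal reading). The function name `oversized_pdfs` is hypothetical.

```python
from pathlib import Path

# Assumption: the report's "25MB" limit is read as 25 MiB (25 * 1024 * 1024 bytes).
MAX_PDF_BYTES = 25 * 1024 * 1024


def oversized_pdfs(folder: Path, limit: int = MAX_PDF_BYTES) -> list[tuple[str, int]]:
    """Return (filename, size in bytes) for each PDF in `folder` that is not below `limit`.

    Only files matching *.pdf directly inside `folder` are checked (no subfolders).
    """
    flagged: list[tuple[str, int]] = []
    for pdf in sorted(folder.glob("*.pdf")):
        size = pdf.stat().st_size
        if size >= limit:  # the goal is to stay strictly below the limit
            flagged.append((pdf.name, size))
    return flagged
```

For example, `oversized_pdfs(Path("kapio_pdfs"))` would list any file in a (hypothetical) `kapio_pdfs` folder that needs to be made smaller before it goes into DSpace.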

Copyright ownership
The Board of Student Publications at Kapi`olani Community College published this newspaper.

The Original Collections

The Kapi`o Office newspaper collection
With permission from faculty advisor Catherine Toth, the paper collection was moved from the Kapi`o Office to the library on June 13, 2011. Each page is 11.5" wide by 16" high (each sheet is 23" wide by 16" high), and the collection measured between 3 and 4 linear feet with the newspapers folded in half. The oldest issue is dated June 2, 1964, and the most recent December 7, 2009; the collection holds 734 issues in total, with some volumes and issues missing. It is a working collection, in that the newspapers are regularly consulted for research by journalism students and faculty. The paper collection is on indefinite loan to the Digital Initiatives Librarian.

At the Kapi`o Office, the newspaper collection was stored in vertical file cabinets. Many of the folders were pH-neutral. The newspapers were folded once to fit into the folders. Some newspapers are brown and brittle, but they could still be handled. The DIL ensured that these newspapers were rehoused in archival boxes that allow them to lie flat, unfolded. The boxes were lined with acid-free tissue paper and labelled with acid-free labels.

Included with the Kapi`o Office collection were CDs and DVDs holding digital versions of issues that were not available in the paper collections we had access to. Two complete issues were found this way and have been added to the digital collection. The DIL hopes to get permission to scan more of the paper issues currently in the care of the Kapi`o Office.

The KCC LLR collection
This paper collection ranged in date from 10/7/1977 to 11/16/2004 and comprised 308 issues, of which we needed to scan only 15. The LLR collection was stored in Xerox paper boxes, with the newspapers folded once to fit.
As a library policy, small white labels giving the issue's date were placed in the upper-left corner of each front page. The newspapers were in very good condition. The DIL ensured that these newspapers were rehoused in archival boxes that allow them to lie flat, unfolded. The boxes were lined with acid-free tissue paper and labelled with acid-free labels. A spreadsheet tracking data elements such as date of publication and number of pages was developed to capture the metadata needed for the online collection. This spreadsheet was also developed into an inventory and finding aid for the LLR's physical collection.
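Deciding which issues in a collection still need scanning (only 15 of the LLR's 308 issues, for instance) is a set difference between the inventory and the issues already scanned. A minimal Python sketch follows; it assumes each inventory row is keyed by publication date and carries a page count, and the `InventoryRow` class and function names are hypothetical rather than the project's actual spreadsheet layout.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InventoryRow:
    """One paper copy of an issue, as a row in a hypothetical inventory sheet."""
    issue_date: date
    pages: int
    collection: str  # e.g. "LLR"


def unique_issues(rows: list[InventoryRow]) -> dict[date, InventoryRow]:
    """Collapse duplicate paper copies, keeping the first row seen for each date."""
    unique: dict[date, InventoryRow] = {}
    for row in rows:
        unique.setdefault(row.issue_date, row)
    return unique


def issues_to_scan(rows: list[InventoryRow], scanned: set[date]) -> list[InventoryRow]:
    """Return one row per issue whose date is not yet scanned, oldest first."""
    unique = unique_issues(rows)
    return [unique[d] for d in sorted(unique) if d not in scanned]
```

Summing `pages` over the result gives a page count for a scanning request, and the same comparison could be run against Mr. Inaba's or Hamilton Library's holdings. Keying on the date alone is a simplification: it would not separate two issues mistakenly given the same date.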

Mr. Guy Inaba's collection
Mr. Inaba's paper collection of 111 issues, ranging from 1998 to 2011, was stored in green hanging files with the issues folded once. The newspapers were in very good condition. Ten issues were not in our collection of scanned newspapers, so they were added to the digitized set. Any issues not incorporated into the LLR collection will be returned to Mr. Inaba.

Hamilton Library's collection
The Hawaiian Collection's holdings were researched via catalog records in the UHM Library OPAC on June 3, 2013. The dates range from 1986 through 2011, but the collection is incomplete, with missing volumes and issues. A comparison of its holdings through December 2009 suggests there are about 22 issues that KCC could scan to augment our scanned collection. The DIL hopes to discuss the possibility of scanning these issues in AY 2014.

Archival supplies used
Clamshell boxes: http://www.shopbrodart.com/supplies/archival-products/boxes/document-boxes/_/drop-spine-Clamshell-Boxes/, $10.65/box
pH testing pen: http://www.shopbrodart.com/supplies/archival-products/preservation-supplies/cleaning-and-maintenance/_/ph-testing-pen/?q=ph%2bpen, $5.05/pen
Acid-free 3.3" x 4" labels: http://www.shopbrodart.com/supplies/labels-protectors-and-bar-coding/labels/address-and-shipping-labels/_/brodart-multipurpose-labels/, SKU 55 392 002, $19.95/box
Packing tissue: http://www.shopbrodart.com/supplies/archival-products/preservation-supplies/boards-paper-and-tissue/_/acid-free-buffered-tissue/, SKU 38 018 001, $29.90/box

Scanning

The Kapi`o Office newspaper collection
The Kapi`o collection was inventoried and sorted into chronological order, and a set of unique issues was identified for scanning. A spreadsheet tracking data elements such as date of publication and number of pages was developed to collect the metadata needed to issue a scanning request for proposal (RFP) and to create the metadata for the online collection. Of the 734 issues, 733 were prepared for scanning by an outside vendor. The 1964 issue, unlike the others, was printed on letter-size paper and was therefore scanned in-house by the DIL.

The scanning project was competed through CommercePoint per University procurement policy. The RFP had the following technical specifications: 733 newspapers, 6,214 pages; some newspapers are brittle with age. Each page is to be scanned to uncompressed 8-bit grayscale TIFF 6.0 at 400 dpi, with each TIFF file named to identify the volume, issue, and page number of the newspaper issue. Most issues are composed of two nested sheets, each about 16" by 23". These sheets are folded in half, so that the reader gets 8 pages. Each page should be scanned individually. Images must be sized and saved at 1:1 scale to the dimensions of the original page. Individual pages must be captured using a non-roller-feed scanner with a V-shaped cradle; pages are not to be cut apart. A V-shaped cradle allows pages to lie flat for capture without damaging fragile creases in the paper. Because the newspapers will be picked up from and delivered to the KCC Library, they must be scanned on O`ahu. The KCC Library will provide an external drive onto which the TIFFs may be loaded.

The vendor with the lowest bid was selected. The DIL requested a sample scan to check image quality and visited the vendor to view the cradle scanner. Later, the DIL found irregularities in the scans: the images had not been captured at 400 dpi, but at 72 dpi and then resized to 400 dpi. The vendor quickly reimaged the collection to the correct specifications using different equipment.
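The RFP's specifications fix the size of each master file. The arithmetic below is a sketch that assumes every page measures exactly 11.5 by 16 inches (the report says "about"): a 1:1 capture at 400 dpi is 4,600 by 6,400 pixels, which is 29,440,000 bytes of pixel data per page in 8-bit grayscale (about 29.4 MB) and three times that in 24-bit RGB. Across 6,214 pages the grayscale masters come to roughly 183 GB before TIFF header overhead, and even a single page exceeds the 25 MB ceiling set for a whole access PDF. The same code shows that a 72 dpi capture holds only 828 by 1,152 pixels, so resizing it to 400 dpi enlarges the pixel grid without adding detail. The function names are illustrative.

```python
# Figures from the RFP; exact 11.5 x 16 inch pages are an assumption ("about" in the text).
PAGE_WIDTH_IN = 11.5
PAGE_HEIGHT_IN = 16.0
SPEC_DPI = 400
TOTAL_PAGES = 6214


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a 1:1 capture at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data only; TIFF headers and tags add a little more."""
    return width_px * height_px * bits_per_pixel // 8


def pages_per_issue(nested_sheets: int) -> int:
    """Each sheet folded in half gives 2 leaves, i.e. 4 printed pages."""
    return nested_sheets * 4


if __name__ == "__main__":
    w, h = pixel_dimensions(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, SPEC_DPI)
    gray = raw_image_bytes(w, h, 8)   # 8-bit grayscale, as specified in the RFP
    rgb = raw_image_bytes(w, h, 24)   # 24-bit RGB, as the corrected scans were delivered
    print(f"{w} x {h} px per page")
    print(f"grayscale: {gray / 1e6:.2f} MB/page, {gray * TOTAL_PAGES / 1e9:.1f} GB total")
    print(f"RGB:       {rgb / 1e6:.2f} MB/page, {rgb * TOTAL_PAGES / 1e9:.1f} GB total")
    lw, lh = pixel_dimensions(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, 72)
    print(f"72 dpi capture: {lw} x {lh} px, about {(w * h) / (lw * lh):.0f}x fewer pixels")
    print(f"pages from 2 nested sheets: {pages_per_issue(2)}")
```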
The scans were made as follows: scanned at 400 dpi, uncompressed 24-bit RGB, on an i2s DigiBook SupraScan Quartz book scanner manufactured by Iimage Retrieval (SupraScanQuartzA1 [SN: 320503] - CamQuartz [SN: 320503]) using YooScan v1.2.0.

The KCC LLR collection and Mr. Inaba's collection
The LLR and Mr. Inaba's collections were scanned in-house by a student assistant to the following specifications: scanned at 400 dpi, uncompressed TIFF, 8-bit grayscale*, on an Epson Expression 10000XL using EpsonScan 3.49A software.

Scanned by LLR.
*In keeping with the technical specifications of our request for proposal for outsourced scanning, we scanned the LLR collection in grayscale. The vendor delivered the corrected scans in 24-bit RGB, so the scanned collection follows two different standards.

Total Scanned Collection: Volume and Date Ranges
The scanned collection ranges from volume 3 to volume 50, dated 1964 to 2011, with some issues missing.

Converting the TIFFs to multipage PDFs and OCRing
1. PDFs were created with Acrobat X from all the TIFFs of each issue.
2. The PDFs were then machine-OCR'd, and care was taken to ensure that each PDF remained under 25 MB. The DIL decided to OCR with Acrobat X, with no manual corrections. The rationale for minimal OCRing included:
a. minimal resources;
b. an interest in making the issues available online with full browsing and limited full-text searching (manual correction would have taken many more months of work);
c. balancing the costs and benefits of maximum-effort OCRing against the needs of the probable user.
The issues may, at a later date, undergo manual OCR correction to support better full-text searching.
3. The PDFs were configured so that the initial view shows the full first page.

Filename Rules
1. All the TIFFs for one newspaper issue were scanned as single-page images and are stored in one folder. Each TIFF is named with the date of the issue and the page number. The TIFFs for the issue dated January 14, 2003, follow this example: 2003-01-14_001.tif
2. Each folder of TIFFs is named with the date of the issue: 2003-01-14
3. The OCR'd PDFs were named with the date and the volume and issue numbers. I thought it would be useful to include both the publication date and the publication numbers in the filename: occasionally an issue was erroneously assigned duplicate volume and issue numbers, and the date distinguishes the two issues. For example:

kapio-2003.01.14-v36-i13.pdf

Why use periods in the date for the PDF name but hyphens in the date for the TIFF name? It seemed the filename would be easier to read if (a) the date used a different separator from the rest of the filename, and (b) the two separators alternated: hyphens between the data elements (project name, date, volume, issue) and periods within the date.

File Backups
The original uncompressed TIFFs and the OCR'd PDFs were kept as backups for the collection. The uncompressed TIFFs are the closest to digital archival quality, and the OCR'd PDFs are the closest to the final access product.

Metadata for Kapi`o

General rules
Use of Hawaiian diacritics: During metadata preparation, configuration improvements were made to the DSpace software to support more effective searching with diacritics. A search for Kapio now retrieves the term in all its forms, including those written with any of four different okina characters, among them the correctly coded U+02BB. A search for Kapi`o typed with U+02BB retrieves all the marked versions of Kapi`olani. Based on this, the UH Manoa Library metadata librarian suggested that Hawaiian diacritics be applied in the metadata to reflect the original's use of diacritics. Therefore, when the original does not use diacritics in the title or publisher name, the dc.title and dc.publisher values do not use diacritics. Every instance of Kapi`o in dc.relation.ispartof carries diacritics, as it refers to the name of the office. No instance of Kapiolani in dc.subject.lcsh carries an okina. When the source document uses diacritics, the terms use diacritics.

Metadata elements used:
1. dc.title includes three pieces of information: the title Kapio; the date of publication, with leading zeros to force chronological sorting of the titles; and the volume and issue numbers, also with leading zeros. Dates, volume numbers, and issue numbers that were incorrectly assigned were corrected and enclosed in square brackets.
2. dc.title.alternative is used for issues that show an alternative title.
3. dc.date.issued is the date of print publication.
4. dc.publisher is the publisher of the newspaper.
5. dc.type is the type of document.
6. dc.type.dcmi is an additional type qualifier.
7. dc.language.iso gives the language of the document, coded according to the ISO 639 standard.
8. dc.subject is used for free-form descriptive terms about the aboutness of the item, e.g. students. Text in dc.subject is lower-case unless it is a name or acronym.
9. dc.subject.lcsh holds Library of Congress controlled-vocabulary terms describing the aboutness of the item.
10. dc.description describes the isness of the item; for this collection it is used to point out special issues and errors found in dates and in volume and issue numbers. Text in dc.description is in sentence case, with no closing periods required.
11. dc.format.extent gives the number of pages and always uses the term pages.
12. dc.format.digitalorigin, borrowed from the MODS metadata scheme, indicates whether the item was produced as a digital document or was scanned from paper, microform, or other media.
13. dc.relation.ispartof indicates the physical collection in which the original document may be found. (Note: had I identified the exact paper issue in this field, along with the collection information, I would have used dc.source. See Steven Miller's 2011 Metadata for Digital Collections.)
14. dc.rights describes the Creative Commons license assigned to the item.
15. dc.rights.uri is the link to the Creative Commons license that covers the item.

A master metadata file was developed to include both the metadata that goes into DSpace and the metadata that records important processing information (e.g. the number of paper copies of an original issue, the scanning technology and specifications used). This information can be used to generate finding aids for the physical collections.

Unique metadata challenges
There were many problems with dates and with volume and issue numbers, including:
1. The date on some pages of an issue did not match the date on other pages.
2. The volume number, the issue number, or both were incorrect.
3. Issue numbers were duplicated or out of sequence (sequence being determined by the date of publication).

The UHM Library DSpace metadata librarian was consulted; her advice was to change the numbers if it was fairly clear what the correct numbers should be. The following rules and procedures were adopted to handle these problems:
1. If it is a clear duplicate and the correct number is easy to ascertain, correct the number and note the problem and the correction in dc.description. For example, given a series of issue numbers 2, 2, 4, 5, I would change the second 2 to 3.
2. If a duplicate issue number was followed by a very long run of numbers, I was concerned that simply renumbering the series might overlook a missing issue, making the renumbering incorrect. I corrected the numbering only if the newspaper was being published weekly and I could be confident that no issues were missing; otherwise I did not renumber the series.
3. Volume numbers, for most academic years, start anew in the fall semester and end at the end of the spring semester. In some instances a volume runs for only one semester. I did not change these numbers, as I did not want to second-guess what the editor intended.
4. In the master metadata file, records with problems in publication date, volume number, or issue number were highlighted in yellow for easy identification.
5. When a publication date was incorrectly assigned, I noted the error in dc.description, corrected the PDF filename (which includes the date), corrected the date field (dc.date.issued), and then corrected the original TIFF names and the name of the folder containing the TIFFs.
6. As I changed PDF and TIFF filenames, I made regular backups so that all changes to the working master sets on my PC were replicated.
7. If a publication date, volume number, or issue number was incorrectly assigned, I corrected it in the title and put square brackets around it. For example: Kapi`o, [2004]-01-13 (vol. 37, issue 16)

DSpace

Decision on placing the collection under the Board of Student Publications (BOSP) space in DSpace
The student newspaper has been published under BOSP's domain for a large portion of Kapi`o's existence, and in 2013 Kapi`o remains under BOSP. Currently, BOSP is at the first level of DSpace communities as a major unit of the college, and Kapi`o is a community within BOSP. In consultation with the UHM Library metadata librarian, the newspapers were separated into collections defined by volume.
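The filename rules and the dc.title conventions are regular enough to generate from an issue's date, volume, and issue number. The following Python sketch reproduces the report's own examples (2003-01-14_001.tif, kapio-2003.01.14-v36-i13.pdf, and "Kapi`o, [2004]-01-13 (vol. 37, issue 16)"); the function names are hypothetical, and both the two-digit padding of volume and issue numbers in titles and the unpadded numbers in PDF names are assumptions, since every example in the report uses two-digit numbers.

```python
from datetime import date


def tif_folder_name(issue_date: date) -> str:
    """Folder holding one issue's TIFFs, e.g. 2003-01-14."""
    return issue_date.isoformat()


def tif_name(issue_date: date, page: int) -> str:
    """Per-page TIFF name, e.g. 2003-01-14_001.tif (page padded to 3 digits)."""
    return f"{issue_date.isoformat()}_{page:03d}.tif"


def pdf_name(issue_date: date, volume: int, issue: int) -> str:
    """Access PDF name: hyphens between elements, periods inside the date."""
    return f"kapio-{issue_date:%Y.%m.%d}-v{volume}-i{issue}.pdf"


def dc_title(masthead: str, issue_date: date, volume: int, issue: int,
             corrected: frozenset[str] = frozenset()) -> str:
    """Build a dc.title such as "Kapi`o, [2004]-01-13 (vol. 37, issue 16)".

    Zero-padding keeps unbracketed titles sorting chronologically as plain text.
    Any of "year", "month", "day", "volume", "issue" listed in `corrected`
    is wrapped in square brackets to mark an editorial correction.
    """
    def mark(field: str, text: str) -> str:
        return f"[{text}]" if field in corrected else text

    date_part = "-".join([
        mark("year", f"{issue_date.year:04d}"),
        mark("month", f"{issue_date.month:02d}"),
        mark("day", f"{issue_date.day:02d}"),
    ])
    vol_text = mark("volume", f"{volume:02d}")
    issue_text = mark("issue", f"{issue:02d}")
    return f"{masthead}, {date_part} (vol. {vol_text}, issue {issue_text})"
```

For example, `pdf_name(date(2003, 1, 14), 36, 13)` gives `kapio-2003.01.14-v36-i13.pdf`. The masthead is passed in rather than hard-coded because, under the diacritics rule, the title follows whatever spelling the original issue uses.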
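Rule 1 for duplicate issue numbers, together with the caution in rule 2, can be expressed as a conservative check: correct a duplicate only when the missing number is unambiguous, and flag everything else for a person to judge. The Python sketch below is one way to do this, assuming it is run on the issue numbers of a single volume listed in publication-date order; `propose_issue_fixes` is a hypothetical name, and the rule 2 judgment about weekly publication is deliberately left to the reviewer.

```python
def propose_issue_fixes(numbers: list[int]) -> tuple[dict[int, int], list[int]]:
    """Propose corrections for one volume's issue numbers, listed in date order.

    A duplicate is corrected only when the right number is unambiguous:
    it repeats the previous number and the next number is two higher
    (2, 2, 4 -> 2, 3, 4). Every other break in the +1 sequence (other
    duplicates, gaps, numbers out of order) is flagged for a person to
    review, since a gap may be a missing issue rather than a numbering error.

    Returns ({position: corrected number}, [flagged positions]).
    """
    fixes: dict[int, int] = {}
    flagged: list[int] = []
    for i in range(1, len(numbers)):
        prev = fixes.get(i - 1, numbers[i - 1])  # use the corrected value if any
        cur = numbers[i]
        if cur == prev + 1:
            continue  # normal sequence
        nxt = numbers[i + 1] if i + 1 < len(numbers) else None
        if cur == prev and nxt == prev + 2:
            fixes[i] = prev + 1  # clear duplicate: fill the single missing number
        else:
            flagged.append(i)
    return fixes, flagged
```

`propose_issue_fixes([2, 2, 4, 5])` returns `({1: 3}, [])`, matching the report's example of changing the second 2 to 3, while a sequence such as `[5, 5, 6, 7]` is only flagged at position 1, because either the first or the second 5 could be the wrong one.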

Licensing
The Digital Initiatives Librarian granted the repository's template distribution license for each uploaded file. This license authorizes the UH Manoa Library to make backup electronic copies of the collection and to make the copies available on the web. The Digital Initiatives Librarian also added a Creative Commons license to each issue, indicating that any content used must be attributed to Kapi`o, must not be used for commercial purposes, and must not be used to create derivative works. The license is the U.S. version (the international option was not available at the time) and is available at http://creativecommons.org/licenses/by-nc-nd/3.0/us/.

Documentation for Project
The inventory of newspaper issues, current as of the initial batch-loading of this collection and including the metadata uploaded to DSpace, is in the Excel spreadsheet Kapio-Metadata-Master_2013.08.21.xlsx, which is stored for future reference in the DSpace collection at http://dspace.lib.hawaii.edu/handle/10790/1942. This report, The Kapi`o Newspaper Digitizing Project Report, is also stored for future reference in the DSpace collection at http://dspace.lib.hawaii.edu/handle/10790/1942.

By Sunyeen Pai, Digital Initiatives Librarian (DIL)
sunyeen@hawaii.edu
Last updated August 27, 2013