INFORMATION RETRIEVAL FOR JUVENILE AND YOUNG ADULT FICTION Laura Christopherson May 5, 2003 INLS 172

Similar documents
Rikki-tikki-tavi and Plot packet

Unit 2 The Wonderful Wizard of Oz

Three Watson Irvine, CA

MAYWOOD PUBLIC SCHOOLS Maywood, New Jersey. LIBRARY MEDIA CENTER CURRICULUM Kindergarten - Grade 8. Curriculum Guide May, 2009

The Genrefication of an Elementary School Library

Summer Reading 2017 David E. Owens Middle School New Milford, New Jersey

Amazon books Kids Love Books

Title: Genre Study Grade: 2 nd grade Subject: Literature Created by: Synda Tindall, Elkhorn Public Schools (Dec. 2006)

Oh Boy! by Kristen Laaman

Santa Cruz Catholic School Summer Reading and Math Assignments

Dear Rising Eighth Grade Students,

Summer Reading 2018 David E. Owens Middle School New Milford, New Jersey

REVISION PAPER for FINAL TERM EXAM GRADE 5 ENGLISH LANGUAGE. Section A. Rikki-tikki from The Jungle Book by Rudyard Kipling (Excerpt)

Summer Reading 2016 David E. Owens Middle School New Milford, New Jersey

Narrative Paragraphs

Cambridge International Examinations Cambridge Primary Checkpoint

Library Supplies Genre Subject Classification Label

*Theme Draw: After you draw your theme in class, find and circle it below. *THIS THEME WILL BE THE FOCUS OF ALL THREE PARAGRAPHS OF YOUR ESSAY

Purpose: SAMPLE. #5 Knowing the laws of Truth is not enough. A person must live the Truth he/she knows.

PARCC Literary Analysis Task Grade 3 Reading Lesson 2: Modeling the EBSR and TECR

LIBR 53 Treasure Hunt #1 (50 pts) Due 9/19/18

THE IMPORTANCE OF READING ALOUD TO YOUR CHILD. McCrary Elementary Melissa Belote Jessica Hartong Rebecca Kidd Karen Young

BOOK REPORT ENGLISH DEPARTMENT R. LACOUMENTAS

Latin Roots. Center of the Earth. Spelling Words. ject. scrib or scrip. spec. rupt

Summary. Name. The Horned Toad Prince. Activity. Author s Purpose. Activity

Fiction Access Points across Computer-Mediated Book Information Sources:

MAKE WRITING VISUAL &VIVID. Teaching Students to. David Lee Finkle

Predicting Story Outcomes

A Teaching Guide for Daniel Kirk s Library Mouse Books

Performance Reports Theatre 1-2

L. Frank Baum s The Wonderful Wizard of Oz as illustrated by W.W. Denslow and

When I ve earned this badge, I ll know how to write different kinds of stories both true tales and ideas from my imagination.

My Historical Figure:

Why t? TEACHER NOTES MATH NSPIRED. Math Objectives. Vocabulary. About the Lesson

High Frequency Word Sheets Words 1-10 Words Words Words Words 41-50

In this activity, students read and put summary sentences in order to summarise the story. They can work on their own, in pairs or in groups.

1. WHICH BOOK(s) should I read? Please look closely at the pages that follow. You will see that certain books are for certain grades.

Unit 7: Social Literacy: Function: Responsibility & Stewardship

XSEED Summative Assessment Test 1. Duration: 90 Minutes Maximum Marks: 60. English, Test 1. XSEED Education English Grade 3 1

Correlation to Common Core State Standards Books A-F for Grade 5

Discovery has become a library buzzword, but it refers to a traditional concept: enabling users to find library information and materials.

Mrs. Staab English 134 Lesson Plans Week of 03/22/10-03/26/10

Feelings & Fears. Kids Activities

About This Book. Projects With Pizzazz includes ideas for 39 student projects. Each project is divided into the following

PARCC Narrative Task Grade 6 Reading Lesson 2: Narrative Reading Strategies

This Native American folk

THE RANDOM HOUSE BOOK OF POETRY FOR CHILDREN BY JACK PRELUTSKY DOWNLOAD EBOOK : THE RANDOM HOUSE BOOK OF POETRY FOR CHILDREN BY JACK PRELUTSKY PDF

PARCC Narrative Task Grade 8 Reading Lesson 4: Practice Completing the Narrative Task

Chapter Two - Finding and Evaluating Sources

BIBLIOGRAPHIC INFORMATION: (2011). State library of Kansas. Retrieved from

alphabet book of confidence

SUBJECT DISCOVERY IN LIBRARY CATALOGUES

REVIEW OF THE MANDATORY DAYTIME PROTECTION RULES IN THE OFCOM BROADCASTING CODE

Where would you like to go this summer?

KENDRIYA VIDYALAYA TPKM MADURAI WORK SHEET - ENGLISH CLASS: II TOPIC: ZOO MANNERS ROLL NO.:

Aloni Gabriel and Butterfly

Thank you for being a wonderful student! I hope you have a fun and safe vacation! Sincerely, Mrs. Garcia

Promoting a Juvenile Awards Approval Plan: Using Collaboration and Selected Projects for Improved Visibility and

not to be republished NCERT Why? Alice in Wonderland UNIT-4

Chapter 3 sourcing InFoRMAtIon FoR YoUR thesis

Get out a highlighter

Library 101. To find our online catalogue, Discover from the HSP home page, first see Collections then Catalogues and Research Tools.

Table of Contents. alphabet review: letter order, letter recognition, letter sounds... page 16, 22

For Incoming 3e International Section

Selection Review #1. A Dime a Dozen. The Dream

Alice's Adventures In Wonderland

Excerpts From: Gloria K. Reid. Thinking and Writing About Art History. Part II: Researching and Writing Essays in Art History THE TOPIC

RISING 6 TH, 7 TH, 8 TH GRADERS Group: ESL RECOMMENDED ENGLISH READING LIST SUMMER GUIDELINES: English as-a-second Language

A GOOD READ LEARNING OUTCOMES BADGE REQUIREMENTS. Guards & Rangers - a good read badge

Suffolk Young Authors

Explorers 6 Teacher s notes for the Comprehension Test: Treasure Island

Literature Links. Reading Skills

Curriculum Guide for 4th Grade Reading Unit 1: Exploits 6 weeks. Objectives Methods Resources Assessment the students will

NoveList and NoveList Plus Overview

Write-Around the Room! 2 National Sweepstakes 7 Magazine Research 11 Striking It Rich! 14 My Gradebook 18 Net Wise 22 Surf the Net 27 Explore with

Unit 10 I ve Got My Flocab

WOODSTREAM CHRISTIAN ACADEMY Upper School English Department SCHOOL OF LOGIC & SCHOOL OF RHETORIC 2018 SUMMER READING PROGRAM

Grade 4 Overview texts texts texts fiction nonfiction drama texts text graphic features text audiences revise edit voice Standard American English

Glendale College Library Information Competency Workshops Introduction to the Library for New Students

CHAPTER ONE. The Wounded Beast

Reading Motivation Techniques

Way Original idea Paraphrased idea. Successful people are perseverant to achieve their goals.

Interview with Patti Thorn, co-founder, BlueInk Review. For podcast release Monday, August 4, 2013

We read a story in class from Whootie Owl's Test Prep Storytime Series for Level 2

ebooks at the Library Kindles

ENTRY LEVEL CERTIFICATE STEP UP TO ENGLISH Gold Step 5973/2

The Id, Ego, Superego: Freud s influence on all ages in the media. Alessia Carlton. Claire Criss. Davis Emmert. Molly Jamison.

ARTS AND MEDIA. Teacher s notes 1 FREE YOUR BOOKS TALKING BOOKS

Novel Units Single-Classroom User Agreement for Non-Reproducible Material

Survey on Electronic Book Features

ENG 221 Children s Literature Winter 2018 Tentative syllabus

AP Literature & Composition Summer Reading Assignment & Instructions

Shady Grove Middle School. Summer Reading Packet Grade 8

Lesson Objectives. Core Content Objectives. Language Arts Objectives

Incoming 11 th grade students Summer Reading Assignment

Grade 9 Final Exam Review. June 2017

World Study Guide Literature Series The Wonderful Wizard of Oz Suggested ages Created by: Susan Williams & Katherine Reader.

BORN ON THE THIRD OF JULY BY MICHAEL L. EADS DOWNLOAD EBOOK : BORN ON THE THIRD OF JULY BY MICHAEL L. EADS PDF

Catcher In The Rye Prestwick House

Transcription:

INFORMATION RETRIEVAL FOR JUVENILE AND YOUNG ADULT FICTION
Laura Christopherson
May 5, 2003
INLS 172
llchrist@email.unc.edu

INTRODUCTION

Currently, if one wishes to search for something good to read, the available retrieval systems do not truly respond to the information need at hand. These include card catalogs, online public access catalogs, and online bookstores such as Amazon.com. The library's online public access catalog (or OPAC) is based on the old card catalog system, and it erroneously assumes that the user already knows which specific fiction book he/she wants to read. It is not designed to assist the reader with finding something good to read. The system allows users to search by author, title, subject, and keyword. All of these features require the user to have either a specific, known book or a known subject area in mind, possibly nonfiction in origin. The OPAC's purpose seems to be that of an online map of the library itself. This is evident in its ultimate function, which is to direct the user to the physical location of the book(s) within the library building rather than to help the reader find a good book.

Although Amazon.com does not proclaim itself an information retrieval system, it is, inadvertently, a source for satisfying an information need, one that will hopefully result in a sale for Amazon. By providing book-searching capabilities, Amazon hopes to further its true goal of making sales. I draw Amazon into this discussion of IR systems for fiction because it offers features that do more than the standard library OPAC to help readers find something good to read. One feature is the use of solicited customer reviews as well as editorial reviews of books. The other helpful feature is the ability to "Look inside this book" to view the table of contents and excerpts from the book.
Designing an information retrieval system that addresses the information needs of fiction seekers requires two things: a look at the research on how users express such an information need, and a look at the problems with current IR systems. Once the appropriate aids for addressing fiction needs have been determined, a proposal can be made for a new fiction retrieval system that offers the kinds of assistance a user needs to find something good to read. This new system would be able to "[construct] a discourse, a way of interacting and perceiving the aboutness of documents" (Iivonen & Sonnenwald, 1998) as it relates to the specific information need in question.

THE PROBLEM

The main problem with current fiction retrieval systems is that their technologies do not lend themselves to the idiosyncrasies of fiction or to the imprecise information needs of a fiction seeker. In fact, these systems do not consider at all the anomalous state of knowledge, or ASK (Belkin, Oddy, & Brooks, 1982), that necessarily accompanies the fiction information need. Sometimes, of course, adults or children looking for something good to read do have a specific book in mind. They might be drawn to a particular book by a recommendation from a friend who shares the same reading interests, by having enjoyed the author's other works, or by a professional review. However, it can be assumed that many people who want something good to read have no clear idea, at the outset, of what specific book would satisfy that desire. The desire itself is not often expressed in the same concrete terms as an information need for fact. As Sharon Baker (1988) noted, various studies have confirmed that many library patrons select their materials by browsing to locate a nonspecific item.

It is logical to assume that before the person asks a friend for a recommendation, he/she was in an anomalous state of knowledge about which particular book he/she wanted to read. The same can be said for the reader who selects books based on a professional review or on having read and enjoyed a certain author's work. The reader has some gut-level idea of what he/she wants, but it is not fully formed and perhaps not fully articulated. Because every book tells a different story, pinpointing in advance the exact plot or theme of a perceived good book would be impossible. There is also an inherent lack of information about the work at the time of selection (Baker, 1996). But on some level the user recognizes a need for something good to read and may have some clues about how to find it. He/she may assume that a book by a favored author will prove enjoyable, or his/her interest may have been piqued by a friend's recommendation or a professional review.

Conventional retrieval systems, when used for fiction, do not pose the questions "What would you like to read today?" or "What qualities and characteristics of a story are you looking for?" Online catalogs require users to know what they want before they begin a search (Jacobsen, 1998). When the system asks for an author and title, it assumes the user knows which author and title he/she wants to read and has no ASK. When it asks for keywords and subject headings, it assumes the reader has a particular topic of interest in mind, possibly nonfiction in origin. Fiction retrieval systems must accommodate searching behavior instead of replicating the card catalog structure (Jacobsen, 1998).
As much as fiction retrieval presents challenges for adults, fiction retrieval for children presents a new set of problems. As Solomon (1993), Chen (1993), and Jacobsen (1998) noted, children express their information needs in natural language, but the OPAC does not support such expression. To get at the ASK of the fiction seeker, the system must invite a natural language statement from the user about what he/she is looking for in a book. Marie Rankin (1944) studied the procedures children used to select books of fiction. She quoted expressions of need such as "I like stories about frontier days," "a modern theme," or a preference for "no babyish books." The system should not require users "to conform" (Solomon, 1993) by adjusting the expression of their information need to match the OPAC. Instead, it should encourage and support the natural tendencies of the fiction seeker.

Current fiction systems rely on recall rather than recognition. "Children, as well as adults, find it easier to recognize information presented to them than to recall it from memory" (Borgman, Hirsch, Walter, & Gallagher, 1995). When a fiction seeker has no predetermined author or title in mind, current systems ask him/her to recall knowledge he/she simply does not have. Readers do not come to the library with a list of authors they like (Baker, 1996). Additionally, as Moore and St. George (1991) and Borgman et al. (1995) noted, current OPACs require users to match the keywords and topics they enter to the predefined subjects set by the library's classification system. Most users, and particularly children, will not have a personal vocabulary that exactly matches the controlled vocabulary of the subject classifications. As a result, searching by subject and keyword can be a guessing game. Furthermore, keywords provide no context (Borgman et al., 1995). The user gets no clue about how his/her chosen keyword fits into the stories returned on the results screen. It is left to the user to infer the association between the search terms and the thrust of the book.

In some instances a controlled vocabulary improves the performance of an IR system, but it is probably inappropriate for a fiction retrieval system. The very nature of fiction is fluid; it operates on multiple levels. It is hard to nail down the aboutness of a work of fiction, because what a book means to one person may not be what it means to the next. What I perceive the theme of a story to be may be completely different from what another reader perceives it to be. Some components of a book can be nailed down in concrete terms, such as the setting, the main character's name, the time period, or suitable keywords that appear in the text itself. The overall thrust of the book, however, is personally experienced, and so a controlled vocabulary is of no use for describing it. As Iivonen and Sonnenwald (1998) noted, "It [controlled vocabulary] also restricts alternative ways of talking about topics." This standardization restricts the fluidity of fiction and the non-specificity of the fiction information need.

Current fiction retrieval systems have other built-in problems, including the requirement that the user type, spell, and punctuate correctly. Chen (1993) noticed this when studying high school students, and Jacobsen (1998), Solomon (1993), and Borgman et al. (1995) noticed it when studying younger children: young users have difficulty with spelling, typing, punctuation, and syntax (as do some adults). Children also have weak alphabetizing skills (Jacobsen, 1998; Solomon, 1993), which makes subject headings difficult to navigate.
Boolean logic is difficult for the casual user to master, and even more so for children. To search with Boolean logic, the user must learn the syntax each particular system uses for it. Solomon (1993) noted in his study of OPACs that "What was missing was assistance by the system in making the most of this exploratory behavior by, for instance, offering suggestions for recovery from errors or leading children to success." Baker (1996) found that a significant majority of library patrons are reluctant to ask for advice in choosing fiction: 84 percent of browsers did not ask library staff for help. During his study, Solomon (1993) noticed that many children declined help when it was offered. System help, which is not currently available in most OPACs, would therefore be a wise addition. By consulting the system, young readers would not risk the adolescent embarrassment of asking a librarian for help. The adult readers Baker examined would also not feel that they were bothering busy staff (Baker, 1996).

Lastly, adults' and children's reading materials are lumped together in the same system. As a result, a child (juvenile or young adult) searching for a work of fiction may be presented with inappropriate results. Likewise, an adult may be presented with results that he/she feels are below his/her reading level or interest. Each type of user, adult or child, may therefore be presented with more results (more information) than can be digested. Information overload is a greater concern when these materials are mixed.

THE INTERVIEWS

I wanted to determine what type of system juvenile and young adult readers would most want, and which features would support their retrieval needs. To do so, I interviewed three older children: two juveniles, each aged 11, and one young adult aged 15. I first created three possible retrieval access points and presented them to my interviewees.

(1) Keyword/Genre Cards

The first access point was based on the idea, used in the Science Library Catalog and the Kid's Catalog, of presenting subject area icons for children to select (focusing on recognition rather than recall). I examined reviews on Amazon.com for seven juvenile and young adult books:
- A Wrinkle in Time, Madeleine L'Engle
- The Blue Sword, Robin McKinley
- Harry Potter boxed set of four books, J.K. Rowling
- Silverwing, Kenneth Oppel
- The Golden Compass, Philip Pullman
- A Long Way from Chicago, Richard Peck
- Holes, Louis Sachar

In examining these reviews, I wanted to find out which features or characteristics readers of juvenile and young adult fiction prized most highly. I also wanted to know what emotional responses these readers had to the books, so that I could represent those features and responses in the icons I would present to my interviewees. The most frequent characterizations were "adventure," "funny," "mystery," "mysterious," "exciting," and "action." Specific comments included:
- "This book has mystery and a lot of action." (an 11-year-old reader of Holes)
- "I would definitely recommend this book to anyone who feels like reading a very thrilling adventure story. Someone who enjoys reading books set in the late 1800s with tons of humor would love to read this." (an 11-year-old reader of A Long Way from Chicago)
- "fascinating, exciting, and suspenseful" (a 14-year-old reader of The Golden Compass)
- "the suspense and the adventure and the close calls" (a 10-year-old reader of Silverwing)
- "Wizards, magic, imaginary worlds" (a reader of Harry Potter)
- "and that begins an adventure like no other" (a 12-year-old reader of The Blue Sword)
- "Anybody who enjoys a constant adventure and action should reel in your pole and see if you catch this thrilling book" (a 12-year-old reader of A Wrinkle in Time)

I used the library I had planned for the interviews, a list of 15 books that would serve as the search results. From these reviews and from my personal experience with those books, I created nine keyword/genre cards:
- Scary Stories
- Mysterious Stories
- Adventure, Action, Journeys
- Animals
- Real Life Stories
- In the Old Days
- Fairy Tales
- Magic
- Funny Stories

(Please see icons at the back of this paper.)

The library included every book whose reviews I had examined except A Wrinkle in Time. The remaining nine books were:
- Ella Enchanted, Gail Carson Levine
- Mrs. Frisby and the Rats of NIMH, Robert C. O'Brien
- King of Shadows, Susan Cooper
- The Hobbit, J.R.R. Tolkien
- James and the Giant Peach, Roald Dahl
- Beauty, Robin McKinley
- The Fellowship of the Ring, J.R.R. Tolkien
- Little Women, Louisa May Alcott
- Being Dead, Vivian Vande Velde

I grouped the books under the keyword/genre cards. Some books were appropriate under more than one card. For example, Harry Potter was listed under Magic; Mysterious Stories; Scary Stories; and Adventure, Action, Journeys.

(2) Feature Selection

The next access point was a list of features for the interviewees to choose from. The main options were:
- Keywords, the same list represented by the keyword/genre cards.
- Main Characters, chosen from people, animals, and mystical creatures.
- Setting, such as time and location.
- Series, covering whether the book was part of a series or had a prequel or sequel.

(Please see the feature selection sheet at the end of this paper.)

(3) Natural Language Input

For the last access point, each interviewee wrote down on a slip of paper what he/she would like in a book: the features, ideas, and characteristics that appealed to him/her.

I began by asking each child a question. Suppose the library computer offered to show you the books that were searched for most often that day; would you be interested? Then I asked: if the library computer offered to display the books that other children had rated most highly, would you be interested? I then asked each child how old he/she was. The answers were:
- (1) Top searches? The two 11-year-olds said yes. The 15-year-old said no.
- (2) Top reviews? All said yes. The 15-year-old was particularly interested in critical reviews rather than peer reviews.

Then I laid out the keyword/genre cards and asked the children (individually) to select the cards that interested them. I then laid out the book cards that corresponded to each selected keyword/genre card. For example, if a child selected the Real Life Stories card, I laid out book cards for A Long Way from Chicago, Holes, King of Shadows, and Little Women. Each book card had a color picture of the book cover on the front, which was the only part the children saw. The back of each card listed the keywords/genres, character information, setting, and series information. It also had a pocket holding a summary of the book (usually taken from the book jacket) and an excerpt. As Pejtersen (1986), Rankin (1944), and Moore and St. George (1991) all noted, the cover of a book has meaning for children when they select works of fiction. Children perceived book covers as a means of communicating a book's contents to the reader, and therefore evaluated those contents on the basis of information from the cover (Pejtersen, 1986). The only interviewee who was not interested in selecting from the cards was the 15-year-old. He said he was not interested in genre; he cared more about what critics say about a book and about the book's overall message/summary. For the two 11-year-olds, I laid out the resulting book cards and asked them to point to the ones that piqued their interest. For each book card a child selected, I asked whether he/she would like to hear a summary and/or an excerpt from the book. Rankin (1944) noticed that 47 per cent of the children "said that they read parts of the story when choosing a book to read throughout." Solomon (1996) suggests making book summaries available as audio clips. To simulate this, I read summaries and excerpts aloud when asked.
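The card-to-book grouping used in the interviews works like a small inverted index. Each card lists its books, one book may sit under several cards, and selecting cards produces the union of their books, with each book shown once. The Python sketch below illustrates that behavior and is not a description of any catalog's implementation. The names CARD_TO_BOOKS, books_for_cards, and cards_for_book are hypothetical. Only the two groupings reported above are filled in, namely the Real Life Stories books and Harry Potter's four cards, because the full assignment of all 15 books is not given.

```python
# Card -> books groupings. Only the two groupings reported in the interviews are
# filled in; the full assignment of all 15 books to the nine cards is not given.
CARD_TO_BOOKS: dict[str, list[str]] = {
    "Real Life Stories": ["A Long Way from Chicago", "Holes", "King of Shadows", "Little Women"],
    "Magic": ["Harry Potter"],
    "Mysterious Stories": ["Harry Potter"],
    "Scary Stories": ["Harry Potter"],
    "Adventure, Action, Journeys": ["Harry Potter"],
}


def books_for_cards(selected: list[str]) -> list[str]:
    """Union of the books under the selected cards, in first-seen order,
    so a book filed under several chosen cards is laid out only once."""
    seen: dict[str, None] = {}
    for card in selected:
        for book in CARD_TO_BOOKS.get(card, []):
            seen.setdefault(book, None)
    return list(seen)


def cards_for_book(book: str) -> list[str]:
    """The reverse lookup: every card a given book is filed under."""
    return [card for card, books in CARD_TO_BOOKS.items() if book in books]


print(books_for_cards(["Real Life Stories"]))
print(cards_for_book("Harry Potter"))
```

Selecting both Magic and Scary Stories returns Harry Potter only once. This mirrors laying out a single book card even when a child picks several cards that the book falls under.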
I suspect that summaries and excerpts help users identify the main thrust or theme of a book more easily. Providing that information therefore empowers the user to satisfy his/her information need more sensitively. As Rankin (1944) observed, more of the purpose in sampling the text is to determine factors related to the theme of the story. After each summary and/or excerpt, I asked the child whether he/she would like to learn more about the book (setting, characters, series information). In all cases, the answer was no; the three children were satisfied with what they learned about the book from the summary and excerpt. At various points during the interviews, whenever they selected a book card, all three children wanted to hear a summary first and foremost, and then possibly an excerpt. Under each retrieval method, the children identified a number of the resulting books as good books (because they had read them before) or as interesting and worth reading in the future.

After introducing the keyword/genre cards, I asked each child to fill out the feature selection sheet. The 15-year-old selected only two items on his sheet. One of the 11-year-olds selected 13 features; the other selected 18. Each sheet offered 52 features in total: 15 keywords/genres, 10 people main characters, 10 animal main characters, 7 mystical main characters, 3 generic time periods, 3 generic locations, and 2 possible answers to each of the 2 series questions.
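The feature selection sheet can be modeled as a set of sections with fixed option counts. Modeling it this way also confirms the total of 52 (15 + 10 + 10 + 7 + 3 + 3 + 2 x 2). The ranking rule below orders books by the number of selected features they share, and it is an assumption made here for illustration; in the interviews I simply laid out the matching book cards. SHEET_SECTIONS, total_features, rank_by_overlap, and the feature tags on the example books are all hypothetical.

```python
# Number of options in each section of the feature selection sheet, as reported above.
SHEET_SECTIONS: dict[str, int] = {
    "keywords/genres": 15,
    "people (main characters)": 10,
    "animals (main characters)": 10,
    "mystical creatures (main characters)": 7,
    "time periods": 3,
    "locations": 3,
    "series (2 questions x 2 answers)": 2 * 2,
}


def total_features(sections: dict[str, int]) -> int:
    """Total number of checkable features on the sheet."""
    return sum(sections.values())


def rank_by_overlap(selected: set[str], books: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank books by how many of the child's selected features they carry.
    Books sharing no feature are dropped; ties are broken by title."""
    scored = [(title, len(selected & features)) for title, features in books.items()]
    return sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical feature tags, for illustration only.
example_books = {
    "Silverwing": {"animals", "adventure", "series"},
    "Little Women": {"people", "in the old days"},
    "The Hobbit": {"mystical creatures", "adventure", "magic"},
}
print(total_features(SHEET_SECTIONS))
print(rank_by_overlap({"adventure", "magic"}, example_books))
```

Counting overlaps tolerates a sheet with 2 checks as well as one with 18. A child who checks many features still sees the books that match most of them first.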

After the children submitted their feature selection sheets, I again presented each child with the resulting book cards and offered summaries, excerpts, and further information. Next I asked each child whether he/she could type on a computer and knew how to use a computer for basic tasks. Each child said yes. I then gave each child a small blank piece of paper and a pencil and asked him/her to write down the characteristics or features of a book that would most interest him/her. The 15-year-old wrote, "The book carries both a strong plot and something of an external message, something that is drawn from reading the book, and is aplicable to ones life. Also contains a variety of humor, sadness, etc. and is well written." Note the misspelling of "applicable" and the missing apostrophe in "one's." One of the 11-year-olds wrote, "Diary entrys of the past." Again, note the misspelling of "entries." The other 11-year-old wrote, "fiction <skip line> mysterious <skip line> some funny parts <skip line> magic <skip line> adventure." She underlined "mysterious," "funny," and "magic."

I then asked each child to rank the three methods from most to least preferred: keyword/genre cards, feature selection sheet, and natural language input. All three ranked them similarly. Natural language input (writing down what they wanted) came first, the feature selection sheet came second, and the keyword/genre cards came last. The 15-year-old said, however, that the feature selection sheet and the keyword/genre cards were tied for last place, since neither allowed him to truly express his information need. For these interviewees, then, the preferred method was expressing a fiction information need in natural language. The last 11-year-old essentially wrote down keywords. Even so, she said she was more comfortable writing out her own description of what she wanted, and preferred it to selecting from a group of icons.

THE PROPOSAL

The ideal fiction retrieval system for juvenile, young adult, and other readers of such fiction should accommodate age and developmental stage, learning style, and skill level (Busey & Doerr, 1993). It needs multiple access points based on these factors (Solomon, 1996; Busey & Doerr, 1993; Jacobsen, 1998). Younger children who cannot type, spell, or punctuate with ease may need a mode of access that lets them click to make selections. The keyword/genre cards and the feature selection sheet simulate such a point-and-click interface. Older children, and others who are comfortable with typing and spelling, need a keyboard mode of access similar to the natural language input method I tested in the interviews. Even so, it is probably best to offer every option to every age group and to highlight the access points most appropriate for each age. The keyword/genre icons create a browsing mode, which Baker (1996) noted is a common way for users to interact with fiction. A browsing mode allows a child to explore the database without a specific objective, as well as to search for a particular desired item (Walter et al., 1996). Browsing lets users shed light on their anomalous state of knowledge and, ideally, transform the anomalous into the inspired. I recommend three access points:

(1) A browsing mode where children can select from icons similar to the keyword/genre cards.
(2) A feature-specific browsing mode where children can select from core features of a story, such as the main character's approximate age, gender, and species; location; time period; and keywords.
(3) A flexible natural language input field where the user can type his/her thoughts and feelings about what he/she is looking for in a book (which ideas, emotions, characteristics, features, etc. are most appealing).

The system could begin by asking the child his/her age. (To honor the child's privacy, no data about the child would be stored.) The screen layout would then depend on the answer:
- Under 10 years old: access point 1 takes up the main area of the screen, with smaller options for access points 2 and 3 at the bottom.
- Ages 10 to 13: access points 1 and 2 serve as the main choices, and access point 3 takes a smaller place at the bottom of the screen.
- Over 13, and adults: access point 3 takes up the main area of the screen, with smaller options for access points 1 and 2.

Other components of the system would include:
- a Help system that offers hints and tips along the way, as well as a guided recovery process when errors are made,
- dictionaries and spellcheckers to assist budding spellers (similar to Google's "Did you mean this?"),
- flexible vocabulary processing, including thesauri that purposely include kid language, such as cross-references from "doggie" and "puppy" to "dog",
- flexible punctuation and syntax processing,
- an invitation to review the most searched-for items of the day,
- an invitation to review the most highly rated books,
- the ability for children and librarians to submit reviews of books.
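The age-based layout and several of the listed components can be sketched in a few lines of Python. The sketch covers the kid-language thesaurus, forgiving punctuation, and a "Did you mean ...?" style suggestion. It is hypothetical and not a specification. The names layout_for_age, normalize_query, KID_THESAURUS, and INDEX_TERMS are invented, and the index vocabulary is a placeholder. The standard library's difflib.get_close_matches stands in for whatever spelling-suggestion method a real system would use.

```python
import difflib
import re

# Kid-language cross-references. "doggie" and "puppy" -> "dog" come from the text;
# a real thesaurus would be much larger.
KID_THESAURUS: dict[str, str] = {"doggie": "dog", "puppy": "dog"}

# Hypothetical index vocabulary (terms the catalog actually assigns to books).
INDEX_TERMS: list[str] = ["adventure", "animals", "dog", "funny", "magic", "mysterious", "scary"]


def layout_for_age(age: int) -> dict[str, list[int]]:
    """Which access points (1 = icons, 2 = features, 3 = free text) get the main
    screen area and which go to the smaller options at the bottom."""
    if age < 10:
        return {"main": [1], "bottom": [2, 3]}
    if age <= 13:
        return {"main": [1, 2], "bottom": [3]}
    return {"main": [3], "bottom": [1, 2]}


def normalize_query(text: str) -> tuple[list[str], dict[str, str]]:
    """Forgiving query processing: ignore case and punctuation, map kid language
    onto index terms, and offer "Did you mean ...?" suggestions for near-misses."""
    terms: list[str] = []
    suggestions: dict[str, str] = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        word = KID_THESAURUS.get(word, word)
        if word in INDEX_TERMS:
            terms.append(word)
            continue
        close = difflib.get_close_matches(word, INDEX_TERMS, n=1, cutoff=0.8)
        if close:
            suggestions[word] = close[0]  # offered to the child, not applied silently
    return terms, suggestions


print(layout_for_age(11))
print(normalize_query("Mysterius stories, some funny parts, a puppy!"))
```

The function returns spelling suggestions instead of silently replacing the child's word. The child can confirm or reject each one, much as a librarian would ask "Did you mean...?" rather than correct the child outright.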
Each book would be treated as a frame where certain zones representing concepts that illuminate the aboutness of the book would be populated to form a complete picture of the book. Using a metadata structure helps to shape the features of a book into an operable pattern. The metadata scheme would include information such as: - title - author - illustrator - editor - publish date - publisher - number of pages - series is the book in a series or does it have a prequel or sequel - call number - keywords - similar to the keyword/genre cards - main characters - time period - location 8

- emotional experience (Pejtersen, 1986) such as funny, exciting, sad - action - link to sample cover art image - link to sample audio summary file - link to sample audio excerpt file - links to reviews by children and librarians Initially it might be assumed that using this metadata structure would require a librarian to read each book and enter data into each zone. The librarian would need to be sensitive to the subjective nature of fiction-interpretation and would need to generalize and try to account for all possible perspectives. This is time-consuming and labor-intensive. Perhaps with the help of natural language processing the time and labor for such a task can be greatly reduced. Certain zones can be identified as those that will be compatible receptacles for automatically harvested data. By automating the harvesting of data, the librarian s task will be less daunting. This pre-coordinate approach will also improve the performance of the IR system itself. Conversely, natural language processing will re-express the child s natural language query to correlate it with the known universe of characteristics of the book. So data can be harvested from the text for inclusion in the metadata scheme the metadata scheme to be used to respond to a query and the query itself to correlate the information need with the available data. Additionally books that have been evaluated through natural language processing can be used as a training database to further characterize newly introduced books with discovered similar concepts. The system can learn from known characteristics and improve the performance time of the pre-coordinate text harvesting. To test the possibility of this (and since I do not have access to a natural language processing system), I hope to use term frequency data to infer how natural language processing of works of fiction can aid in the harvesting process. I ran three texts through a 30-day free trial of Concordance. 
It is a system that calculates term frequencies. (Please note that I did not use an extensive stop list to exclude words such as like, out, up, very, will, etc.) The three texts were:
- Rikki-Tikki-Tavi (from The Jungle Book), Rudyard Kipling: a short story
- The Wizard of Oz, L. Frank Baum: a short novel
- Little Women, Louisa May Alcott: a full-length novel
(Copies of the first ten pages of each report are included at the back of this paper.)

The zones that may prove to be the best receptacles for NLP-harvested data are title, author, illustrator, editor, publish date, publisher, main characters, time period, location, emotional experience, action, and keywords. Possibly the most difficult zones to fill will be time period, emotional experience, action, and keywords. To see what can be achieved with simple term frequencies, I will start with main characters.
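The idea of filling one zone of the metadata frame from term frequencies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Concordance program used here: the `BookRecord` class (which models only a few of the zones listed above), the `term_frequencies` and `harvest_main_characters` functions, the small stop list, and the rule that a frequent term capitalized in most of its occurrences is a character name are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical metadata frame: only a handful of the zones listed above are modeled.
@dataclass
class BookRecord:
    title: str
    author: str
    main_characters: list[str] = field(default_factory=list)
    emotional_experience: list[str] = field(default_factory=list)
    action: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


# Assumed, deliberately small stop list (no extensive stop list was used in the study).
STOP_WORDS = {"a", "and", "the", "to", "of", "in", "he", "she", "it", "was", "said", "up", "will"}

# A word is a run of letters, optionally joined by hyphens or apostrophes
# (so "Rikki-tikki" and "Teddy's" each count as one term).
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def term_frequencies(text: str) -> Counter:
    """Case-insensitive term counts, excluding stop words."""
    return Counter(w.lower() for w in WORD_RE.findall(text) if w.lower() not in STOP_WORDS)


def harvest_main_characters(record: BookRecord, text: str, top_n: int = 3) -> None:
    """Fill the main_characters zone with the top_n most frequent terms that look
    like names (assumed heuristic: capitalized in more than half their occurrences)."""
    tokens = WORD_RE.findall(text)
    freqs = term_frequencies(text)
    capitalized = Counter(t.lower() for t in tokens if t[0].isupper())
    surface: dict[str, str] = {}
    for t in tokens:
        if t[0].isupper():
            surface.setdefault(t.lower(), t)  # remember the first capitalized spelling
    # most_common() ranks by frequency; ties keep first-encountered order.
    names = [surface[term] for term, n in freqs.most_common() if capitalized[term] * 2 > n]
    record.main_characters = names[:top_n]


if __name__ == "__main__":
    sample = ("Rikki-tikki fought Nag. Nag hissed and Rikki-tikki jumped. "
              "Teddy watched Rikki-tikki. Darzee sang while Nag and Nagaina hid. "
              "Rikki-tikki was brave and the snake was afraid.")
    record = BookRecord(title="Rikki-Tikki-Tavi", author="Rudyard Kipling")
    harvest_main_characters(record, sample)
    print(record.main_characters)  # ['Rikki-tikki', 'Nag', 'Teddy']
```

The capitalization test stands in, crudely, for the named-entity recognition a real NLP system would provide; it will misfire on words that often begin sentences, which is one reason a stop list matters.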

In all three books, the term with the greatest frequency was the name of the main character:
- Rikki-Tikki (Rikki-Tikki-Tavi): 79 times
- Dorothy (The Wizard of Oz): 347 times
- Jo (Little Women): 1218 times

In Rikki-Tikki-Tavi, the names that follow at the top of the list belong to the other main characters. These characters are not as important as Rikki-Tikki, but their roles in the story are rather large (Up and Will appear among them only because no extensive stop list was used):

Term               Frequency
Rikki-Tikki        79
Nag                42
Up                 41
Nagaina            35
Teddy's (Teddy)    26
Will               26
Darzee             23

The same is true for The Wizard of Oz and Little Women.

Rikki-Tikki-Tavi is an action-packed story. The conflict between Rikki-Tikki, the mongoose, and Nag and Nagaina, the cobras, results in two battles in which Rikki-Tikki fights each snake individually. There is danger and excitement in this book. If a user is looking for an action-packed book with danger and excitement, we want that user to find this story. The data harvested for the emotional experience (danger, excitement) and action (action-packed) zones will be most important in retrieving this book to satisfy that information need. Among the 460 most frequent words, several indicate action: bite, fight, jumped, move, jump, killed, run, cried, singing, killing, sang, saved, scratch, strike, stroke, climbed, coiled, flew, fluttered, caught, missed, scuttled, sing, stopped, banged, bitten, break, broke, coil, curled, threw, danced, dragged, dropped, fell, fighting, fly, followed, frighten, grew, grow, hatch, helped, hissed, hunt, lashing, lying, motion, moved, rolled, runs, shaken, spun, stole, strikes, struck, swayed, swaying, and tricked.
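The judgment made above by reading the frequency list can be approximated mechanically: rank the terms, keep the top N (460 in this analysis), and check how many fall into a small cue lexicon for each candidate zone value. The sketch below is hypothetical throughout: `CUE_LEXICONS` is seeded with only a few words quoted in this paper, the `min_hits` threshold of 3 is an arbitrary assumption, and `top_terms`/`infer_tags` are illustrative names. A real system would need far larger, vetted lexicons and, as argued here, the context evaluation of NLP rather than bare word matching.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical cue lexicons, each seeded with a few words quoted in this paper.
# "action-packed" would feed the action zone, "danger" the emotional experience
# zone, and the last two the keywords zone.
CUE_LEXICONS = {
    "action-packed": {"bite", "fight", "jumped", "killed", "strike", "struck", "hunt", "scratch"},
    "danger": {"dead", "death", "afraid", "dangerous", "savage", "terrible", "war"},
    "magic/fantasy": {"witch", "wizard", "castle", "palace", "queen", "throne"},
    "journey/adventure": {"walked", "journey", "walking", "lost", "travelers", "walk"},
}

WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def top_terms(text: str, n: int = 460) -> list[str]:
    """Return the n most frequent lowercased terms, like a frequency-ordered concordance."""
    counts = Counter(w.lower() for w in WORD_RE.findall(text))
    return [term for term, _ in counts.most_common(n)]


def infer_tags(text: str, n: int = 460, min_hits: int = 3) -> dict[str, list[str]]:
    """Propose a tag when at least min_hits of its cue words rank among the top n terms."""
    top = set(top_terms(text, n))
    hits = {tag: sorted(top & cues) for tag, cues in CUE_LEXICONS.items()}
    return {tag: found for tag, found in hits.items() if len(found) >= min_hits}


if __name__ == "__main__":
    sample = ("Rikki-tikki jumped to fight. The cobra tried to strike and bite, "
              "but Rikki-tikki struck first. Nag was afraid of death.")
    print(infer_tags(sample))  # {'action-packed': ['bite', 'fight', 'jumped', 'strike', 'struck']}
```

Returning the matched cue words alongside each tag lets a librarian see why a tag was proposed before accepting it into the record.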
Even without the evaluation of syntax, context, and style that natural language processing can use to extract more meaning from a document than term frequency provides (in other words, by simple term frequency alone), we can infer from the list of action words above that Rikki-Tikki-Tavi is an action-packed story. But does it contain excitement and danger? Again, among the 460 most frequent words, those of interest include: dead, death, afraid, teeth, frightened, noise, wicked, angry, quickness, dangerous, is dead, quickly, rage, savage, sorrowfully, terrible, valiant, war, and war-cry. For information about the characters involved, the same 460 words include: mongoose (Rikki-Tikki), cobra (Nag and Nagaina), karait (another snake in the story), snake, snakes, snake's, cobras, mongooses, mongoose's, cobra's, man, father, mother, boy (Teddy), bird (Darzee), and birds. The words mother and father also tell us something about the relationships between the characters. If natural language processing were used, the context of mother and father could possibly be evaluated to determine whose parents these words refer to.

In The Wizard of Oz, frequent words include witch, wicked, wizard, castle, palace, beasts, queen, throne, witches, and creature. These words could indicate the keywords magic and fantasy. Frequent words such as walked, journey, walking, lost, good-bye, travelers, and walk could indicate additional keywords such as journey and adventure. Words that convey emotional
experience include wicked, heart, brains, beautiful, afraid, terrible, cried, pretty, courage, cowardly, strange, surprise, kill, power, frightened, powerful, tears, coward, cry, angry, fear, and fierce. These words could indicate that the story has danger, awe and mystery, emotional struggles, and the surmounting of obstacles. As with Rikki-Tikki-Tavi, we can learn more about the characters: scarecrow, woodman, lion, witch, girl, Toto, man, wizard, friends, woman, king, soldier, travelers, mice, dog, and beast. Location can be inferred from words such as country, room, Emerald City, house, and Kansas. Thus, from simple term frequency alone, the zones expected to be most difficult to populate by automatic harvesting (those that would seem to require a human, subjective interpretation of the book) were successfully inferred, at least for these texts. With the tools of natural language processing available (sentence structure evaluation, word sense disambiguation, context evaluation, style discrimination), it may be possible to gain more extensive and in-depth knowledge about a particular work of fiction. The system can match ideas, not words (Feldman, 1999).

CONCLUSION

Because the experience of fiction is subjective, the information needs of fiction seekers are best expressed in natural language. When a reader is looking for something good to read, he/she may have very personal ideas about what that means. The reader may desire a story that is action-packed and exciting. He/she may want a story about a hero facing difficult challenges. The reader may want a story that has a unique personal message for him/her alone. Current fiction information retrieval systems do not invite and cannot support such personal, non-specific information needs. Instead, they require the user to have satisfied his/her information need prior to consulting the system. Current systems seem to operate more as library roadmaps. A new system needs to be developed.
Sensitivity to age, developmental stage, learning style, and skill level would need to be inherent in system operation (Busey & Doerr, 1993) and evidenced in multiple access points. Help components, kid-centric dictionaries and thesauri, and allowance for flexible spelling, punctuation, and syntax would be necessary. Additional features, such as the ability to evaluate reviews, hear summaries and excerpts, and view cover art and illustrations, would give the user more information for determining whether the work satisfies his/her information need. This new system would need to welcome the natural language query and be able to digest it in natural language terms. The aboutness of a work of fiction can be better addressed through a new metadata structure that accounts for concepts such as emotional experience, action, main characters, setting, reviews, excerpts, summaries, and graphic representations of cover art and illustrations. Deducing these characteristics through natural language harvesting of the text would reduce the librarian's time and labor as well as pave the way for future machine learning. Natural language processing would more closely harmonize the fiction information need with the representation of the book itself.

REFERENCES

Baker, S. L. (1988, Spring). Will fiction classification schemes increase use? RQ, 27(3), 366-376.
Baker, S. L. (1996). A Decade's Worth of Research on Browsing Fiction Collections. In Kenneth D. Shearer (Ed.), Guiding the Reader to the Next Book (pp. 127-144). New York: Neal-Schuman Publishers, Inc.
Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38, 61-71.
Borgman, C. L., Hirsh, S. G., Walter, V. A., & Gallagher, A. L. (1995). Children's Searching Behavior on Browsing and Keyword Online Catalogs: The Science Library Catalog Project. Journal of the American Society for Information Science, 46(9), 663-684.
Busey, P., & Doerr, T. (1993). Kid's Catalog: An Information Retrieval System for Children. Journal of Youth Services in Libraries, 7, 77-84.
Chen, S. (1993, Fall). Current Research: A Study of High School Students' Online Catalog Searching Behavior. School Library Media Quarterly, 22(1), 33-40.
Feldman, S. (1999, May). NLP meets the Jabberwocky: Natural Language Processing in Information Retrieval. Online (http://www.onlinemag.net), 23(3).
Iivonen, M., & Sonnenwald, D. (1998). From Translation to Navigation of Different Discourses: A Model of Search Term Selection during the Pre-Online Stage of the Search Process. Journal of the American Society for Information Science, 49(4), 312-326.
Jacobsen, L. (1998). How Children Search. In Sharon Zuiderveld (Ed.), Cataloging Correctly for Kids: An Introduction to the Tools (3rd ed., pp. 48-52). Chicago and London: American Library Association.
Moore, P. A., & St. George, A. (1991, Spring). Children as Information Seekers: The Cognitive Demands of Books and Library Systems. School Library Media Quarterly, 19, 161-168.
Pejtersen, A. M. (1986). Design and test of a database for fiction based on an analysis of children's search behavior. In Ingwersen, P., Kajberg, L., & Pejtersen, A. M. (Eds.), Information Technology and Information Use (pp. 125-146). London, UK: Taylor Graham.
Rankin, M. (1944). Teachers College Contributions to Education, No. 906: Children's Interests in Library Books of Fiction. New York: Bureau of Publications, Teachers College, Columbia University.
Solomon, P. (1996). Access to fiction for children: A user-based assessment of options and opportunities. In Albrechtsen, H., & Beghtol, C., Fiction, OPACs, Networks: Proceedings of the 1st Research Seminar on Electronic Access to Fiction, Multicultural Knowledge Interaction, and Communication of Culture via Networks. Copenhagen, Denmark: Royal School of Librarianship.
Solomon, P. (1993). Children's Information Retrieval Behavior: A Case Analysis of an OPAC. Journal of the American Society for Information Science, 44(5), 245-264.
Walter, V. A., Borgman, C. L., & Hirsh, S. G. (1996, Winter). The Science Library Catalog: A Springboard for Information Literacy. School Library Media Quarterly, 24(2), 105-110.