Digging Deeper, Reaching Further. Module 1: Getting Started

In this module we'll:
Introduce text analysis and broad text analysis workflows → Make sense of digital scholarly research practices
Introduce HathiTrust and the HathiTrust Research Center → Understand the context for one text analysis tool provider
Introduce our hands-on example and case study → Recognize research questions text analysis can answer

What is text analysis?
Using computers to reveal information in and about text (Hearst, 2003)
Algorithms discern patterns
Text may be unstructured
More than just search
What is it used for?
Seeking out patterns in scientific literature
Identifying spam e-mail

How does it work?
Break textual data into smaller pieces
Abstract (reduce) text so that a computer can crunch it
Counting! Words, phrases, parts of speech, etc.
Computational statistics
Develop hypotheses based on counts of textual features
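The steps on this slide (break text into pieces, reduce it, count) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the sample sentence and the helper names `tokenize`, `count_words`, and `count_bigrams` are made up for this example and are not part of any HTRC tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_words(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)

def count_bigrams(tokens):
    """Count two-word phrases (adjacent token pairs)."""
    return Counter(zip(tokens, tokens[1:]))

# Illustrative sample text (an assumption, not workshop data)
sample = "The cat sat on the mat. The mat was flat."
tokens = tokenize(sample)
word_counts = count_words(tokens)
bigram_counts = count_bigrams(tokens)

print(word_counts.most_common(2))     # the two most frequent words
print(bigram_counts[("the", "mat")])  # how often the phrase "the mat" occurs
```

Counts like these are the raw material for the "computational statistics" step: a hypothesis is then developed from how the counts differ between texts.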

How does it impact research?
Shift in perspective leads to shift in research questions
Scale up to distant reading (Moretti, 2013)
One step in the research process
Can be combined with close reading
Opens up:
Questions not provable by human reading alone
Larger corpora for analysis
Studies that cover longer time spans

Discussion
What examples have you seen of text analysis?
In what contexts do you see yourself using text analysis?
What about the researchers you support?

Text analysis research questions
May involve:
Change over time
Pattern recognition
Comparative analysis

Hands-on activity
See Handout p. 1
In pairs or small groups, review the summarized research projects available at http://go.illinois.edu/ddrf-researchexamples. Then discuss the following questions:
How do the projects involve change over time, pattern recognition, or comparative analysis?
What kind of text data do they use (time period, source, etc.)?
What are their findings?

Example: Rowling and Galbraith: an authorial analysis
Question: Did JK Rowling write The Cuckoo's Calling under the pen name Robert Galbraith?
Would be impossible to prove through human reading alone!
Question type: comparative, patterns
[Image: book cover for The Cuckoo's Calling]
Read more: Rowling and Galbraith: an authorial analysis (Juola, 2013)

Example: Rowling and Galbraith: an authorial analysis
Approach:
Reading led to hunch about authorship
Computational comparison of diction between this book and others written by Rowling
Statistical proof of authorial fingerprint
Read more: Rowling and Galbraith: an authorial analysis (Juola, 2013)
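To give a feel for what a "computational comparison of diction" can look like, here is a hedged sketch that compares how often two texts use a handful of common function words. It is not the method used in Juola's analysis; the word list, the toy texts, and the helper names `profile` and `cosine_similarity` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative list of common function words (an assumption, not Juola's feature set)
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "with"]

def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    counts = Counter(tokens)
    return [counts[word] / total for word in FUNCTION_WORDS]

def cosine_similarity(p, q):
    """Cosine of the angle between two frequency profiles (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Toy texts standing in for a disputed book and a known author's book
disputed = "It was the end of the day and the rain fell in the street."
known = "It was the start of the night and the wind blew in the lane."

print(cosine_similarity(profile(disputed), profile(known)))
```

A high similarity on its own proves nothing. A real study would compare the disputed text against several candidate authors and many features before making a statistical claim.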

Example: Significant Themes in 19th Century Literature
Question: What themes are common in 19th century literature?
Answering this question requires a very large corpus and an impossible amount of human reading!
Question type: patterns, comparative
Read more: Significant Themes in 19th Century Literature (Jockers and Mimno, 2012)

Example: Significant Themes in 19th Century Literature
Approach:
Run large quantities of text through a statistical algorithm
Words that co-occur are likely to be about the same thing
Co-occurring words are represented as topics
Read more: Significant Themes in 19th Century Literature (Jockers and Mimno, 2012)
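The intuition that "words that co-occur are likely to be about the same thing" can be illustrated by simply counting which word pairs appear in the same document. This sketch is not a topic model and not the algorithm used in the paper; the four tiny "documents" and the function name `cooccurrence_counts` are invented for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, for every pair of words, how many documents contain both."""
    pairs = Counter()
    for doc in documents:
        vocab = sorted(set(doc.lower().split()))  # unique words, in a stable order
        pairs.update(combinations(vocab, 2))      # every unordered pair, once per document
    return pairs

# Toy corpus (an assumption): two documents about clothing, two about the sea
docs = [
    "silk gown ribbon",
    "gown ribbon lace",
    "ship sea captain",
    "sea captain storm",
]
pairs = cooccurrence_counts(docs)

print(pairs[("gown", "ribbon")])  # these two words share documents
print(pairs[("gown", "sea")])     # these two never appear together
```

Topic modeling algorithms work from the same kind of signal, but they infer groups of words statistically across a whole corpus instead of counting pairs directly.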

Example: Significant Themes in 19th Century Literature
From paper, Figure 3: word cloud of topic labeled "Female Fashion."

Example: The Emergence of Literary Diction
Question: What textual characteristics constitute literary language?
This question covers a very large time span!
Question type: change over time, patterns
Read more: The Emergence of Literary Diction (Underwood and Sellers, 2012)

Example: The Emergence of Literary Diction
Approach:
Train a computational model to identify literary genres
Compare which words are most frequently used over time in nonfiction prose versus literary genres
Demonstrated tendency for poetry, drama, and fiction to use older English words
Read more: The Emergence of Literary Diction (Underwood and Sellers, 2012)

Example: The Emergence of Literary Diction
[Figure from paper: graph of diction patterns between genres, using frequency counts. Y axis: yearly ratio of words that entered English before 1150 to words that entered from 1150-1699 (scale 1.0 to 2.5). X axis: year (1700 to 1850). Series by genre: Poetry, Drama, Fiction; Nonfiction Prose.]
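The ratio on the Y axis of this figure can be sketched as follows, assuming a lookup table that gives the approximate date each word entered English. The `FIRST_USE` table below is a tiny hand-made stand-in with rough dates, not the data used by Underwood and Sellers, and `yearly_ratio` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical lookup: approximate year each word entered English (illustrative values only)
FIRST_USE = {
    "house": 900, "love": 900, "water": 900,
    "reason": 1300, "nature": 1300, "science": 1400,
}

def yearly_ratio(docs):
    """docs: iterable of (year, text). Returns {year: older words / newer words}."""
    old = defaultdict(int)  # words that entered before 1150
    new = defaultdict(int)  # words that entered from 1150 to 1699
    for year, text in docs:
        for word in text.lower().split():
            entered = FIRST_USE.get(word)
            if entered is None:
                continue  # word not in the lookup table, skip it
            if entered < 1150:
                old[year] += 1
            elif entered <= 1699:
                new[year] += 1
    years = sorted(old.keys() | new.keys())
    # Skip years with no newer words to avoid dividing by zero
    return {y: old[y] / new[y] for y in years if new[y] > 0}

# Toy documents (an assumption), one per year
docs = [(1700, "love water reason"), (1800, "house love water nature")]
print(yearly_ratio(docs))
```

Computing this ratio separately for each genre and plotting it by year would give a chart shaped like the one described on this slide.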

HTRC for text analysis
Digitized text: scanned & OCR-ed
Computational methods: e.g., word counts, classification, topic modeling
Analysis at scale from the digital library
HathiTrust Research Center-provided tools and services

HathiTrust
Founded in 2008
Grew out of large-scale digitization initiative at academic research libraries
With roots in Google Books project
Over 120 partner institutions continue to contribute

HathiTrust Digital Library
Contains over 16 million volumes
~50% English
From the 15th to 21st century, 20th century concentration
~63% in copyright or of undetermined status
Search and read books in the public domain

HathiTrust Research Center
Facilitates text analysis of HTDL content
Research & Development
Located at Indiana University and the University of Illinois

Non-consumptive research
Research in which computational analysis is performed on text, but not research in which a researcher reads or displays substantial portions of the text to understand the expressive content presented within it.
Complies with copyright law
Foundation of HTRC work
Other terms: non-expressive use

Discussion
Are you (or your colleagues) currently offering research support for text analysis? How so? Why or why not?
What kinds of questions and/or projects does your library handle?

Workshop outline
Follow the research process:
Gathering textual data: 2 modules
Working with textual data: 1 module
Analyzing textual data: 2 modules
Visualizing textual data: 1 module
Hands-on activities around a central research question & case study example at each step
Using both HTRC and non-HTRC tools

Workshop outline
Build skills to engage with text analysis research
Covers programming concepts
But won't teach you to code!
Introduces computational methods
But won't delve into all nuances

Sample Reference Question
Question: I'm a student in history who would like to incorporate digital methods into my research. I study American politics, and in particular I'd like to examine how concepts such as "liberty" change over time.
Approach: We'll practice approaches for answering this question throughout the workshop
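One simple starting point for a question like this is to track how often a term appears per decade, relative to the total number of words in that decade. The sketch below assumes the texts are already available as (year, text) pairs; the toy documents and the function name `relative_frequency_by_decade` are illustrative assumptions.

```python
import re
from collections import defaultdict

def relative_frequency_by_decade(docs, term):
    """docs: iterable of (year, text). Returns {decade: share of tokens equal to term}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        decade = year // 10 * 10
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(term)
    # Only report decades that actually contain text
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}

# Toy documents (an assumption), not real historical sources
docs = [
    (1776, "liberty and justice"),
    (1779, "liberty liberty for all"),
    (1865, "union and freedom"),
]
print(relative_frequency_by_decade(docs, "liberty"))
```

Frequency only shows how much a word is used. Studying how its meaning changes would also require looking at the words and contexts that surround it.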

Case Study: Inside the Creativity Boom
Researcher: Samuel Franklin
Question: How do the use and meaning of "creative" and "creativity" change over the 20th century?
Approach: We'll discuss how this researcher approached his question throughout the workshop
Learn more: https://wiki.htrc.illinois.edu/x/cadiaq

A word of caution
Workshop outline suggests a research workflow like:
Find text
Prepare text
Apply algorithm
Visualize results

A word of caution
Actual research workflow is more like:
Search for text
Get access to text
Clean text
Exploratory visualization
Prepare text
Apply algorithm
Visualize results
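The "clean text" step is easy to underestimate. As a rough illustration, the sketch below repairs two common problems in OCR-ed text: words split by a hyphen at a line break, and irregular whitespace. The function name `clean_ocr_text` and the sample string are assumptions for this example, and real cleaning usually needs corpus-specific rules.

```python
import re

def clean_ocr_text(raw):
    """Rejoin words hyphenated across line breaks, then normalize whitespace."""
    # "emer-\ngence" becomes "emergence"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse newlines, tabs, and repeated spaces into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Made-up OCR fragment (an assumption)
raw_page = "The emer-\ngence of  literary\ndiction "
print(clean_ocr_text(raw_page))
```

Note the trade-off: this rule also joins genuine hyphenated compounds that happen to fall at a line end, which is one reason cleaning and exploratory checks tend to alternate in practice.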

Discussion
What are some of the characteristics of a good candidate research question/project for using text analysis methods?

Questions?

References
Hearst, M. (2003). What is text mining. SIMS, UC Berkeley. http://people.ischool.berkeley.edu/~hearst/text-mining.html
Jockers, M. L., & Mimno, D. (2012). Significant themes in 19th-century literature [pre-print]. http://digitalcommons.unl.edu/englishfacpubs/105/
Juola, P. (2013, July 16). Rowling and Galbraith: an authorial analysis. Language Log. Retrieved January 25, 2017, from http://languagelog.ldc.upenn.edu/nll/?p=5315
Moretti, F. (2013). Distant reading. Verso Books.
Underwood, T., & Sellers, J. (2012). The emergence of literary diction. Journal of Digital Humanities, 1(2), 1-2. http://journalofdigitalhumanities.org/1-2/the-emergence-of-literary-diction-by-ted-underwood-and-jordan-sellers/