Variation in morphological productivity in the BNC: Sociolinguistic and methodological considerations


Tanja Säily, University of Helsinki
9 October 2009
In collaboration with Dr. Jukka Suomela, Helsinki Institute for Information Technology HIIT

Introduction
- -ness and -ity
  - Roughly synonymous suffixes
  - Typically form abstract nouns from adjectives: productive → productiveness, productivity
- Sociolinguistics
  - Do men and women use these suffixes differently in present-day English?
- Methodology
  - Are hapax-based productivity measures valid?

Material
- British National Corpus (BNC)
  - 100 million words: ~90% written, ~10% spoken
- Demographically sampled spoken component (BNC-DS)
  - 4.2 million words from the early 1990s
  - Gender known for 88% of the data, social class for 62% (2.6 million words)
- Written component (BNC-W)
  - 88 million words, 1960s–1990s
  - Gender known for 51% of the data (45 million words)

Methods
- How to measure productivity?
  - Count the number of different words (types)
  - Count the number of words occurring only once (hapax legomena, or hapaxes), which approximate new words
- Comparing type counts from subcorpora
  - Normalisation is problematic, as is establishing statistical significance
  - Permutation testing: take samples in random order and see how types accumulate, repeated 1 million times
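The permutation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: each sample is assumed to be a pair of (running words, set of suffix types), the function names are hypothetical, the one-sided comparison is one plausible way to read off significance, and the toy data are invented.

```python
import random

def types_at_size(samples, target_words):
    """Accumulate samples in the given order until at least target_words
    running words have been seen; return the number of distinct types."""
    seen = set()
    words = 0
    for n_words, types in samples:
        if words >= target_words:
            break
        words += n_words
        seen.update(types)
    return len(seen)

def permutation_p_value(samples, subcorpus, rounds=10_000, seed=0):
    """Estimate how often a random ordering of all samples has accumulated
    at most as many types as the subcorpus once it reaches the subcorpus size."""
    rng = random.Random(seed)
    target_words = sum(n for n, _ in subcorpus)
    observed = len(set().union(*(t for _, t in subcorpus)))
    pool = list(samples)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pool)
        if types_at_size(pool, target_words) <= observed:
            hits += 1
    return hits / rounds

# Invented toy data: three samples per group, 100 running words each.
men = [(100, {"clarity", "purity"}), (100, {"vanity", "sanity"}), (100, {"density", "unity"})]
women = [(100, {"quality"}), (100, {"quality"}), (100, {"quality"})]

p_women = permutation_p_value(men + women, women)  # small: few types for this size
p_men = permutation_p_value(men + women, men)      # large: no shortage of types
```

Because whole samples are shuffled rather than individual words, the comparison needs no normalisation: the subcorpus is simply set against what random orderings of the full corpus produce at the same size.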

Figure: CEEC, -ity types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Sociolinguistics: Related work
- Productivity of -ity significantly low in 17th-century letters written by women
  - Corpus of Early English Correspondence (CEEC), Säily & Suomela (2009)
  - -ity is learned and etymologically foreign; women were less well educated than men → less able to use -ity?
- Women favour pronouns over common nouns
  - Rayson et al. 1997 (BNC-DS), Argamon et al. 2003 (BNC-W), Säily et al. forthcoming (CEEC)

Sociolinguistics: BNC-DS
- Productivity of both -ity and -ness significantly low in women's speech
  - Expected result
    - Women's style more interactive
- -ity: difference just about significant
- -ness: gender difference tied to social class

Figure: BNC-DS, -ity types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Figure: BNC-DS, -ness types vs. running words (m = men, f = women, social classes C2+DE; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Sociolinguistics: BNC-W
- Productivity of -ity (but not -ness) significantly low in women's writing
  - Holds for both imaginative (BNC-W imag) and informative (BNC-W inf) texts
- Result for -ity expected; negative result for -ness requires more research
  - Semantics of -ness? Embodied attribute/trait goes well with interactive writing style
    - Could also apply to 17th-century results

Figure: BNC-W imag, -ity types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Figure: BNC-W imag, -ness types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Figure: BNC-W inf, -ity types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Figure: BNC-W inf, -ness types vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Methodology: Related work
- Baayen (e.g., 1993)
  - Category-conditioned degree of productivity: P = n_1 / N, where n_1 is the number of hapaxes with the suffix and N is the number of tokens with the suffix
  - Hapax-conditioned degree of productivity: P* = n_1 / h, where h is the number of all hapaxes in the corpus (or, within the same corpus, just n_1)
- CEEC: hapax accumulation curves (Säily & Suomela 2009)
  - Confidence intervals too wide
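As a rough illustration of the two measures, the sketch below computes n_1, N, h, P and P* from a list of word tokens. It is a simplified example rather than Baayen's or the author's implementation: the function name is hypothetical, the toy tokens are invented, and matching the suffix with a plain string test is an assumption made for brevity (a real study would have to filter out non-derived words such as "city" or "witness").

```python
from collections import Counter

def productivity_measures(tokens, suffix):
    """Return N, n1, h, P and P* for one suffix in a list of word tokens."""
    freq = Counter(tokens)
    # Assumption: every word ending in the suffix counts as a derivation.
    suffixed = {word: count for word, count in freq.items() if word.endswith(suffix)}
    N = sum(suffixed.values())                             # tokens with the suffix
    n1 = sum(1 for count in suffixed.values() if count == 1)  # hapaxes with the suffix
    h = sum(1 for count in freq.values() if count == 1)       # all hapaxes in the corpus
    return {
        "N": N,
        "n1": n1,
        "h": h,
        "P": n1 / N if N else 0.0,       # category-conditioned degree of productivity
        "P_star": n1 / h if h else 0.0,  # hapax-conditioned degree of productivity
    }

# Invented toy corpus.
toy_tokens = ["kindness", "kindness", "darkness", "sadness", "the", "the", "cat", "dog"]
toy_result = productivity_measures(toy_tokens, "ness")
```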

Figure: CEEC, -ity hapaxes vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Methodology: BNC study
- BNC-W: hapax accumulation curves
  - More data → narrower confidence intervals
    - Results look similar to type accumulation curves but less significant
- However, the number of hapaxes does not grow linearly with either corpus size or the number of suffix tokens
  - Comparing P figures can be unreliable unless the sizes of the subcorpora / numbers of suffix tokens are of a similar magnitude
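The size dependence of P can be seen even on a tiny invented example. The sketch below tracks the number of hapaxes as suffix tokens accumulate and reports P = n_1 / N at regular intervals; the function name and the single-letter stand-ins for suffixed words are hypothetical, and the numbers only illustrate the point on the slide, not any result from the BNC.

```python
from collections import Counter

def hapax_curve(suffix_tokens, step):
    """Return (N, n1, P) after every `step` suffix tokens, where n1 is the
    number of words seen exactly once so far and P = n1 / N."""
    counts = Counter()
    hapaxes = 0
    curve = []
    for i, token in enumerate(suffix_tokens, start=1):
        counts[token] += 1
        if counts[token] == 1:
            hapaxes += 1      # first occurrence: a new hapax
        elif counts[token] == 2:
            hapaxes -= 1      # second occurrence: no longer a hapax
        if i % step == 0:
            curve.append((i, hapaxes, hapaxes / i))
    return curve

# Invented stream of suffix tokens; letters stand in for suffixed words.
toy_stream = ["a", "b", "a", "c", "a", "b", "a", "d"]
toy_curve = hapax_curve(toy_stream, step=2)
```

In this toy stream the same source yields P = 1.0 after two tokens and P = 0.25 after eight, so two subcorpora of very different sizes could show different P values without differing in productivity.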

Figure: BNC-W inf, -ity hapaxes vs. running words (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Figure: BNC-W inf, -ity hapaxes vs. suffix tokens (m = men, f = women; significance levels marked at 0.1, 0.01, 0.001 and 0.0001)

Conclusion
- There can be sociolinguistic variation in morphological productivity
- There seem to be gendered speech styles and writing styles in English (possibly relatively stable over centuries)
- There is as yet no perfect solution for measuring productivity

References
Argamon, S., M. Koppel, J. Fine & A.R. Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text 23(3): 321–346.
Baayen, R.H. 1993. On frequency, transparency and productivity. Yearbook of Morphology 1992, ed. by G. Booij & J. van Marle. Dordrecht: Kluwer Academic Publishers, 181–208.
BNC = The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
CEEC = Corpus of Early English Correspondence. 1998. Compiled by T. Nevalainen, H. Raumolin-Brunberg, J. Keränen, M. Nevala, A. Nurmi & M. Palander-Collin at the Department of English, University of Helsinki.

References (cont.)
Rayson, P., G. Leech & M. Hodges. 1997. Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics 2(1): 133–152.
Säily, T., T. Nevalainen & H. Siirtola. Forthcoming. Variation in noun and pronoun frequencies in a historical corpus.
Säily, T. & J. Suomela. 2009. Comparing type counts: The case of women, men and -ity in early English letters. Corpus Linguistics: Refinements and Reassessments (Language and Computers: Studies in Practical Linguistics 69), ed. by A. Renouf & A. Kehoe. Amsterdam: Rodopi, 87–109.