Practice Midterm Exam for Natural Language Processing


Name:
Net ID:

Instructions

In the actual midterm there will be 7 questions, each worth 15 points. You also get 10 points for signing your name on all test materials. Seriously: when students forget to sign their names, I have to somehow figure out whose test a particular piece of paper belongs to. The maximum score on the test will be 115. You will have approximately 1 hour and 15 minutes (1:15) to complete the test.

This practice test has a different number of problems, but they are intended to be of the same basic types as the questions on the actual midterm. THE PRACTICE TEST IS DESIGNED TO TAKE LONGER TO COMPLETE THAN THE ACTUAL TEST WOULD (AROUND 2 HOURS, RATHER THAN 1:15).

The test materials will include this printout and one blank test booklet. I suggest that you fill in all answers directly on this printout and use the blank test booklet as scrap paper. However, if you run out of space, you have the option of using the test booklet. If you do this, please include a clear note on the test so I know where to look for your answer.

This test is an open book/open notes test: please feel free to bring your textbook, your notes, copies of class lectures and other reading material to the test. A calculator is also permitted, and it is OK to look at materials on the web in order to read helpful information, being mindful of the time limit. Just don't use a program that solves a problem for you. For example, do not find a part of speech tagger and run it if you are asked to manually mark parts of speech. That WOULD be cheating.

Answer all questions on the test. If you show your work and you make a simple arithmetic mistake, but it is clear you knew how to do it, you will get partial credit.

William R. Breakey M.D.
Pamela J. Fischer M.D.
Leighton E. Cluff M.D.
James S. Thompson, M.D.
C.M. Franklin, M.D.
Atul Gawande, M.D.
Dr. Talcott
Dr. J. Gordon Melton
Dr. Etienne-Emile Baulieu
Dr. Karl Thomae
Dr. Alan D. Lourie
Dr. Xiaotong Fei
Doctor Dre
Doctor Dolittle
Doctor William Archibald Spooner
Doctor No

Figure 1: Correct Instances of Doctors in Our Corpus

Question 1. Write a regular expression for identifying names of doctors in text. Your regular expression should match the examples in Figure 1, but should not recognize either non-names (words lacking capital letters) or names that do not include the identifying title information (Dr., Doctor, M.D.). Do your best to include information about spaces, hyphens, commas and periods, as per the examples.

((Doctor|Dr\.)( [A-Z][a-z\.]+)+)|(([A-Z][a-z\.]+ )+M\.D\.)
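One way to check a candidate answer to Question 1 is to run it against every line of Figure 1. The sketch below is not the answer key: it is an illustrative variant, with hypothetical names (NAME, TITLE_FIRST, MD_LAST, DOCTOR), that also accepts an optional comma before M.D., hyphenated names, and initials such as C.M. The negative examples are made up for the check.

```python
import re

# A capitalized name part: "William", "J.", "C.M.", "Etienne-Emile".
NAME = r"[A-Z][A-Za-z.]*(?:-[A-Z][a-z]+)*"

# Title before the name: "Dr. Karl Thomae", "Doctor No".
TITLE_FIRST = r"(?:Doctor|Dr\.)(?: " + NAME + r")+"
# Title after the name, with an optional comma: "Atul Gawande, M.D."
MD_LAST = r"(?:" + NAME + r" )*" + NAME + r",? M\.D\."

DOCTOR = re.compile(TITLE_FIRST + "|" + MD_LAST)

FIGURE_1 = [
    "William R. Breakey M.D.", "Pamela J. Fischer M.D.",
    "Leighton E. Cluff M.D.", "James S. Thompson, M.D.",
    "C.M. Franklin, M.D.", "Atul Gawande, M.D.",
    "Dr. Talcott", "Dr. J. Gordon Melton", "Dr. Etienne-Emile Baulieu",
    "Dr. Karl Thomae", "Dr. Alan D. Lourie", "Dr. Xiaotong Fei",
    "Doctor Dre", "Doctor Dolittle", "Doctor William Archibald Spooner",
    "Doctor No",
]

# Invented strings that should be rejected: no capitals, or no title.
NON_MATCHES = ["doctor dolittle", "William R. Breakey", "Dr. smith", "M.D."]

for text in FIGURE_1 + NON_MATCHES:
    print(text, "->", bool(DOCTOR.fullmatch(text)))
```

The check uses fullmatch so that each example must be covered from start to end; for finding doctors inside running text, finditer on the same pattern would be the natural choice.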

Tag    Description                                 Tag    Description
CC     Coordinating conjunction                    RB     Adverb
CD     Cardinal number                             RBR    Adverb, comparative
DT     Determiner                                  RBS    Adverb, superlative
EX     Existential there                           RP     Particle
FW     Foreign word                                SYM    Symbol
IN     Preposition or subordinating conjunction    TO     to
JJ     Adjective                                   UH     Interjection
JJR    Adjective, comparative                      VB     Verb, base form
JJS    Adjective, superlative                      VBD    Verb, past tense
LS     List item marker                            VBG    Verb, gerund or present participle
MD     Modal                                       VBN    Verb, past participle
NN     Noun, singular or mass                      VBP    Verb, non-3rd person singular present
NNS    Noun, plural                                VBZ    Verb, 3rd person singular present
NNP    Proper noun, singular                       WDT    Wh-determiner
NNPS   Proper noun, plural                         WP     Wh-pronoun
PDT    Predeterminer                               WP$    Possessive wh-pronoun
POS    Possessive ending                           WRB    Wh-adverb
PRP    Personal pronoun                            PU     Punctuation
PRP$   Possessive pronoun

Table 1: Penn Treebank POS tags

Question 2. Assign Penn parts of speech tags (as per Table 1) to all the words in the following two sentences using the notation word/POS:

a. John/NNP and/CC Mary/NNP bought/VBD a/DT refrigerator/NN with/IN three/CD doors/NNS ./PU

b. It/PRP was/VBD purchased/VBN from/IN a/DT very/RB small/JJ store/NN near/IN their/PRP$ house/NN ./PU

Question 3. Mark the noun groups in the following sentence using BIO (beginning, intermediate, other) tags.

Mary has a room with a view and a bottle of beer
B    O   B I    O    B I    O   B I      O  B
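To see what the BIO tags in Question 3 encode, it can help to decode them back into noun groups. The following is a minimal sketch; the function name bio_to_chunks is hypothetical, and treating a stray I with no preceding B as O is an assumption of this sketch, not a rule from the exam.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into chunks: B starts a chunk, I extends it, O closes it."""
    chunks, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                chunks.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O, or an I with no open chunk (treated as O in this sketch).
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


tokens = "Mary has a room with a view and a bottle of beer".split()
tags = "B O B I O B I O B I O B".split()
print(bio_to_chunks(tokens, tags))
```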

Question 4. Draw a Phrase Structure Tree representing one parse of the following sentence. Make a list of the phrase structure rules that you are assuming.

John and Mary bought a refrigerator with three doors.

1. S → NP VP
2. NP → NP CC NP
3. NP → DT NN PP
4. NP → CD NNS
5. NP → NNP
6. VP → VBD NP
7. PP → IN NP
8. NNP → John
9. NNP → Mary
10. NN → refrigerator
11. NNS → doors
12. CC → and
13. VBD → bought
14. DT → a
15. CD → three
16. IN → with

(S
  (NP
    (NP (NNP John))
    (CC and)
    (NP (NNP Mary)))
  (VP
    (VBD bought)
    (NP
      (DT a)
      (NN refrigerator)
      (PP
        (IN with)
        (NP (CD three) (NNS doors))))))

Figure 2: Possible Answer to Question 4

Question 5. Calculate precision, recall and f-measure in order to score the following system against the answer key. Assume any item reported by the system and found in the answer key is correct.

The system reports the following strings of words as describing attack events:

1. Jay Leno attacked Conan O'Brien.
2. attacks by the U.S.-backed rebels (Correct)
3. the latest in a series of attacks in the 10-year-old civil war. (Correct)
4. Mr. Baldwin is also attacking the greater problem: lack of ringers.
5. the criminals were convicted for bombings. (Correct)
6. The Broadway musical Bridges of Madison County bombed.
7. Groupon fires CEO Andrew Mason.

The answer key includes the following strings of words describing attack events:

1. the martians bombarded the Earth with death rays
2. attacks by the U.S.-backed rebels (Found by System)
3. the latest in a series of attacks in the 10-year-old civil war. (Found by System)
4. the criminals were convicted for bombings. (Found by System)
5. the allies launched a missile at the enemy stronghold.

Precision = 3/7 = .429
Recall = 3/5 = .6
F-measure = 2 / (1/Precision + 1/Recall) = 2 / (7/3 + 5/3) = 2/4 = .5
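The arithmetic in Question 5 can be cross-checked with a few lines of code. This is a minimal sketch assuming exact string matching between system output and answer key, as the question instructs; the function name precision_recall_f is hypothetical.

```python
def precision_recall_f(system, key):
    """Score system output against an answer key by exact string match."""
    correct = len(set(system) & set(key))
    precision = correct / len(system) if system else 0.0
    recall = correct / len(key) if key else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall (balanced f-measure).
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


system = [
    "Jay Leno attacked Conan O'Brien.",
    "attacks by the U.S.-backed rebels",
    "the latest in a series of attacks in the 10-year-old civil war.",
    "Mr. Baldwin is also attacking the greater problem: lack of ringers.",
    "the criminals were convicted for bombings.",
    "The Broadway musical Bridges of Madison County bombed.",
    "Groupon fires CEO Andrew Mason.",
]
key = [
    "the martians bombarded the Earth with death rays",
    "attacks by the U.S.-backed rebels",
    "the latest in a series of attacks in the 10-year-old civil war.",
    "the criminals were convicted for bombings.",
    "the allies launched a missile at the enemy stronghold.",
]

p, r, f = precision_recall_f(system, key)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```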

Question 6. Fill in the CKY chart below for the sentence "The rain rains down", assuming the following rules:

1. S → NP VP
2. NP → N
3. NP → DT N
4. VP → V ADVP
5. VP → V
6. ADVP → ADV
7. DT → the
8. N → rain
9. N → rains
10. V → rain
11. V → rains
12. ADV → down

     The    rain            rains           down
     1      2               3               4
0    DT     NP              S               S
1           N, V, NP, VP    S               S
2                           N, V, NP, VP    VP
3                                           ADV, ADVP
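The chart for Question 6 can be reproduced mechanically. The sketch below is one possible CKY recognizer for this grammar; because rules 2, 5 and 6 are unary, it applies a unary closure to each cell, which is an implementation choice of this sketch rather than something the exam prescribes. All names (BINARY, UNARY, LEXICON, close_unary, cky) are hypothetical.

```python
from itertools import product

# Grammar from Question 6, split by rule shape.
BINARY = {("NP", "VP"): {"S"}, ("DT", "N"): {"NP"}, ("V", "ADVP"): {"VP"}}
UNARY = {"N": {"NP"}, "V": {"VP"}, "ADV": {"ADVP"}}
LEXICON = {"the": {"DT"}, "rain": {"N", "V"}, "rains": {"N", "V"}, "down": {"ADV"}}


def close_unary(labels):
    """Add every label reachable from the given labels through unary rules."""
    result = set(labels)
    agenda = list(labels)
    while agenda:
        child = agenda.pop()
        for parent in UNARY.get(child, ()):
            if parent not in result:
                result.add(parent)
                agenda.append(parent)
    return result


def cky(words):
    """Return a dict mapping (start, end) spans to the set of labels found."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[i, i + 1] = close_unary(LEXICON.get(word.lower(), set()))
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for left, right in product(chart[start, mid], chart[mid, end]):
                    cell |= BINARY.get((left, right), set())
            chart[start, end] = close_unary(cell)
    return chart


chart = cky("The rain rains down".split())
for span in sorted(chart):
    print(span, sorted(chart[span]))
```

Each key (row, column) of the returned dictionary corresponds to one cell of the chart, using the same start and end positions as the row and column numbers in the question.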

Question 7. Some defining characteristics of organization and facility as per the ACE guidelines are as follows:

An Organization entity must have some formally established association. Typical examples are businesses, government units, sports teams, and formally organized music groups. Industrial sectors and industries are also treated as Organization entities. (ACE Entity Guidelines v6.6, page 7)

A facility is a functional, primarily man-made structure. These include buildings and similar facilities designed for human habitation, such as houses, factories, stadiums, office buildings, gymnasiums, prisons, museums, and space stations; objects of similar size designed for storage, such as barns, parking garages and airplane hangars; elements of transportation infrastructure, including streets, highways, airports, ports, train stations, bridges, and tunnels. Roughly speaking, facilities are artifacts falling under the domains of architecture and civil engineering. (ACE Entity Guidelines v6.6, page 22)

In the following text from the May 3, 2012 New York Times ("A House Tour: Yes, That House"), mark the organizations by underlining them and writing ORG immediately above them; mark the facilities by underlining them and writing FAC immediately above them. If a particular piece of text is difficult to mark as only ORG or only FAC, mark it ORG/FAC. Mark noun groups, ignoring determiners, and include both names and common nouns representing FAC and ORG constituents. Do not mark pronouns.

After the 9/11 attacks, the system changed radically. Now, anyone who wants to tour the White House/FAC must apply through the office/ORG of his or her representative in Congress/ORG, which forwards the names to the White House/ORG for clearance... Once they get the green light, visitors show up at the appointed time on 15th Street/FAC between E/FAC and F Streets/FAC and join the line to enter through the southeast gate/FAC. Anyone who has flown on an airline/ORG in recent years will recognize the familiar territory of identity checks and electronic scans, although here you do get to keep your shoes on. At the head of the line, rangers from the National Park Service/ORG check photo IDs against a list of names.

Question 8. Assuming that the following sentence is at the beginning of a file, fill in the table below listing each token (word and punctuation), along with its start character offset and its end character offset. Note that there are more blank lines in the table than there are tokens, so it is expected that you will leave one or more lines blank.

This sentence contains words, characters, spaces and punctuation.

Token          Start Offset    End Offset
This           0               4
sentence       5               13
contains       14              22
words          23              28
,              28              29
characters     30              40
,              40              41
spaces         42              48
and            49              52
punctuation    53              64
.              64              65
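The offsets in Question 8 follow the common convention that the end offset is exclusive (one past the last character of the token). A minimal sketch that reproduces the table, assuming a simple tokenizer that splits runs of word characters from single punctuation marks (TOKEN and token_offsets are hypothetical names):

```python
import re

# A token is a run of word characters or a single non-space punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def token_offsets(text):
    """Return (token, start, end) triples; end is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]


sentence = "This sentence contains words, characters, spaces and punctuation."
for token, start, end in token_offsets(sentence):
    print(f"{token:<12} {start:>3} {end:>3}")
```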

Question 9. Given the training data below, execute the following 3 steps: (a) calculate the likelihood probabilities for each word given each POS; (b) draw a finite state machine where states are POS and edges are labeled with transition probabilities; (c) draw a chart where the columns are positions in the sentence and the rows are names of states (start, end, POS tags) and fill in the probability scores assigned by the Viterbi algorithm assigning POS tags to the string "flying planes".

Training Data:

buffalo/NNS flying/VBG is/VBZ dangerous/JJ
flying/JJ planes/NNS are/VBZ numerous/JJ
I/PRP saw/VBZ Mary/NNP flying/VBG planes/NNS
He/PRP planes/VBZ shelves/NNS

Likelihood for Question 9

JJ     dangerous: .33   flying: .33   numerous: .33
NNP    Mary: 1
NNS    buffalo: .25   planes: .5   shelves: .25
PRP    I: .5   he: .5
VBG    flying: 1
VBZ    is: .25   are: .25   saw: .25   planes: .25

Transition probabilities (edges of the state diagram):

Start → PRP: .50   Start → NNS: .25   Start → JJ: .25
PRP → VBZ: 1
VBZ → JJ: .50   VBZ → NNP: .25   VBZ → NNS: .25
NNP → VBG: 1
VBG → VBZ: .5   VBG → NNS: .5
NNS → End: .50   NNS → VBG: .25   NNS → VBZ: .25
JJ → End: .66   JJ → NNS: .33

Figure 3: Prior Probability for Question 9
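Steps (a) through (c) of Question 9 can be cross-checked with a short script. The sketch below estimates transition and likelihood probabilities by relative frequency from the four training sentences and then runs Viterbi on "flying planes". It assumes no smoothing and lowercases words, and all function names are hypothetical. Because it uses exact fractions (1/3 rather than .33), its final score is 1/144, about .0069, slightly above the hand-computed value obtained from rounded probabilities.

```python
from collections import Counter, defaultdict

TRAINING = [
    "buffalo/NNS flying/VBG is/VBZ dangerous/JJ",
    "flying/JJ planes/NNS are/VBZ numerous/JJ",
    "I/PRP saw/VBZ Mary/NNP flying/VBG planes/NNS",
    "He/PRP planes/VBZ shelves/NNS",
]


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sentence in sentences:
        prev = "START"
        for pair in sentence.split():
            word, tag = pair.rsplit("/", 1)
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
        trans[prev]["END"] += 1
    return trans, emit


def prob(table, given, outcome):
    """Relative-frequency estimate of P(outcome | given); 0 if unseen."""
    total = sum(table[given].values())
    return table[given][outcome] / total if total else 0.0


def viterbi(words, trans, emit):
    """Return (probability, tag path) of the best tag sequence."""
    tags = list(emit)
    best = {"START": (1.0, [])}  # state -> (score, path so far)
    for word in words:
        column = {}
        for tag in tags:
            e = prob(emit, tag, word.lower())
            column[tag] = max(
                ((score * prob(trans, prev, tag) * e, path + [tag])
                 for prev, (score, path) in best.items()),
                key=lambda cand: cand[0],
            )
        best = column
    return max(
        ((score * prob(trans, prev, "END"), path)
         for prev, (score, path) in best.items()),
        key=lambda cand: cand[0],
    )


trans, emit = train(TRAINING)
score, path = viterbi("flying planes".split(), trans, emit)
print(path, round(score, 4))
```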

State   BEGIN   flying       planes                                      END
BEGIN   1.0
JJ              .33 * .25
NNP
NNS                          (from JJ) .33 * .25 * .5 * .33
                             (from VBG) 0
PRP
VBG             1 * 0
VBZ                          (from JJ) .33 * .25 * .25 * 0
                             (from VBG) 0
END                                                                      (from NNS) .33 * .25 * .5 * .33 * .5 = .0068

Figure 4: Viterbi for Question 9

Question 10. Calculate the TFIDF for the terms listed below for documents 1 to 4. There are 10,000 documents in the collection. The number of times each of these terms occurs in documents 1 to 4, as well as the number of documents in the collection that contain each term, are listed below. Use this information to fill in the TFIDF scores in the table below.

Number of Documents Containing Terms:

reverse cascade: 3     IDF = log(10000/3) = 8.11
full shower: 50        IDF = log(10000/50) = 5.30
half bath: 10          IDF = log(10000/10) = 6.91
multiplex: 3           IDF = log(10000/3) = 8.11

Term Frequencies

Term              Doc 1    Doc 2    Doc 3    Doc 4
reverse cascade   8        10       0        0
full shower       3        1        2        2
half bath         0        0        8        7
multiplex         2        2        2        9

TFIDF for terms in documents

Term              Doc 1                Doc 2                 Doc 3                Doc 4
reverse cascade   8.11 * 8 = 64.88     8.11 * 10 = 81.10     0                    0
full shower       5.30 * 3 = 15.90     5.30 * 1 = 5.30       5.30 * 2 = 10.60     5.30 * 2 = 10.60
half bath         0                    0                     6.91 * 8 = 55.28     6.91 * 7 = 48.37
multiplex         8.11 * 2 = 16.22     8.11 * 2 = 16.22      8.11 * 2 = 16.22     8.11 * 9 = 72.99
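The TFIDF table in Question 10 can be recomputed with a short script. The sketch below assumes TFIDF = raw term frequency times the natural logarithm of (collection size / document frequency), which is what the IDF values above correspond to. It multiplies by the unrounded IDF, so some products differ from the hand-computed table in the second decimal place (for example 64.89 rather than 64.88). The variable names are hypothetical.

```python
import math

N_DOCS = 10_000

# Number of documents in the collection containing each term.
doc_freq = {"reverse cascade": 3, "full shower": 50, "half bath": 10, "multiplex": 3}

# Term frequencies in documents 1 to 4.
term_freq = {
    "reverse cascade": [8, 10, 0, 0],
    "full shower": [3, 1, 2, 2],
    "half bath": [0, 0, 8, 7],
    "multiplex": [2, 2, 2, 9],
}

# Inverse document frequency, using the natural logarithm.
idf = {term: math.log(N_DOCS / df) for term, df in doc_freq.items()}

# TFIDF = term frequency * IDF, one value per document.
tfidf = {term: [tf * idf[term] for tf in freqs] for term, freqs in term_freq.items()}

for term, scores in tfidf.items():
    print(f"{term:<16}", "  ".join(f"{score:6.2f}" for score in scores))
```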