Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

2016-2017

Road map
- Association rule mining
- Market-Basket Data
- Frequent Itemsets
- Association rule Applications
- Association Rules Definition
- Measure 1: Support
- Measure 2: Confidence
- Transaction data: supermarket data
- Rule strength measures
Department of CS - DM - UHD

Association rule mining
- Proposed by Agrawal et al. in 1993.
- It is an important data mining model studied extensively by the database and data mining community.
- Initially used for Market Basket Analysis to find how items purchased by customers are related.

Market-Basket Data
- A large set of items, e.g., things sold in a supermarket.
- A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.

Market Basket Analysis

Frequent Itemsets
Given a set of transactions, find combinations of items (itemsets) that occur frequently.

Market-Basket transactions
Items: {Bread, Milk, Diaper, Beer, Eggs, Coke}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Itemset counts:
{Bread}: 4
{Milk}: 4
{Diaper}: 4
{Beer}: 3
{Diaper, Beer}: 3
{Milk, Bread}: 3
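The counts on this slide can be checked with a few lines of Python. This is only a minimal sketch: the dictionary layout and the helper name `support_count` are assumptions made for illustration, not part of the lecture.

```python
# Market-basket transactions from the slide, keyed by TID.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in transactions.values() if items <= basket)


# The six itemsets listed on the slide.
for itemset in [{"Bread"}, {"Milk"}, {"Diaper"}, {"Beer"},
                {"Diaper", "Beer"}, {"Milk", "Bread"}]:
    print(sorted(itemset), support_count(itemset, transactions))
```

The subset test `items <= basket` is what makes an itemset "occur" in a transaction: every item must be present, and extra items in the basket do not matter.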

Association rule Applications
- Items = products; baskets = sets of products someone bought in one trip to the store.
- Example application: given that many people buy tea and sugar together, run a sale on sugar and raise the price of tea.
- Only useful if many customers buy sugar and tea.

Association Rules Definition
Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or other information repository.
An example of an association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."
There are two common ways to measure association.

Measure 1: Support
This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. In Table 1 below, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.
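Written as a formula, the definition above reads as follows. The notation $\mathrm{supp}$ is an assumption introduced here for illustration; $T$ is the set of transactions and $X$ an itemset.

```latex
\mathrm{supp}(X) = \frac{\left|\{\, t \in T : X \subseteq t \,\}\right|}{|T|},
\qquad
\mathrm{supp}(\{\text{apple}\}) = \frac{4}{8} = 0.5,
\qquad
\mathrm{supp}(\{\text{apple}, \text{beer}, \text{rice}\}) = \frac{2}{8} = 0.25
```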

Measure 1: Support
Table 1. Example Transactions
If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold. You may then identify itemsets with support values above this threshold as significant itemsets.
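A support threshold can be applied by brute force, as sketched below. Table 1 is not reproduced in this transcript, so the sketch reuses the five market-basket transactions from the Frequent Itemsets slide. The threshold of 0.6 and the function name `frequent_itemsets` are assumptions chosen purely for illustration.

```python
from itertools import combinations

# The five market-basket transactions from the Frequent Itemsets slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

min_support = 0.6  # hypothetical threshold, not taken from the lecture


def frequent_itemsets(transactions, min_support):
    """Brute force: test every candidate itemset against the threshold."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count / n
    return result


significant = frequent_itemsets(transactions, min_support)
for itemset, supp in significant.items():
    print(itemset, supp)
```

Testing every candidate is only workable for a handful of items, since the number of candidates doubles with each additional item.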

Measure 2: Confidence
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X in which item Y also appears. In Table 1, the confidence of {apple -> beer} is 3 out of 4, or 75%.
Support {apple, beer} = 3 / 8 = 0.375
Support {apple} = 4 / 8 = 0.5
Confidence {apple -> beer} = 0.375 / 0.5 = 0.75
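The same calculation as a formula, where $\mathrm{supp}(X)$ denotes the proportion of transactions containing itemset $X$ (the symbols $\mathrm{supp}$ and $\mathrm{conf}$ are notation assumed here, not taken from the slides):

```latex
\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},
\qquad
\mathrm{conf}(\{\text{apple}\} \rightarrow \{\text{beer}\})
  = \frac{3/8}{4/8} = \frac{0.375}{0.5} = 0.75
```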

Support and Confidence Example

Transaction ID  Items Bought
1               Shoes, Shirt, Jacket
2               Shoes, Jacket
3               Shoes, Jeans
4               Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies it.

Frequent Itemset  Support
{Shoes}           75%
{Shirt}           50%
{Jacket}          50%
{Shoes, Jacket}   50%

If the minimum confidence is 50%, then the two rules generated from this 2-itemset that meet it are:
Shoes -> Jacket: Support = 50%, Confidence = 66.7%
Jacket -> Shoes: Support = 50%, Confidence = 100%
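The two rules can be recomputed from the four transactions above. The following is a minimal sketch; the helper names `support` and `confidence` are assumptions for illustration.

```python
# The four transactions from the example table.
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


for lhs, rhs in [({"Shoes"}, {"Jacket"}), ({"Jacket"}, {"Shoes"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs, transactions),
          "confidence:", round(confidence(lhs, rhs, transactions), 3))
```

Both rules share the same support because they are built from the same itemset. Only the confidence depends on the direction of the rule.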

Support and Confidence Example
Given a database of transactions:
Find all the association rules:

The model: data
- $I = \{i_1, i_2, \ldots, i_m\}$: a set of items.
- Transaction $t$: a set of items, with $t \subseteq I$.
- Transaction database $T$: a set of transactions, $T = \{t_1, t_2, \ldots, t_n\}$.

Transaction data: supermarket data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
...
tn: {biscuit, eggs, milk}
Concepts:
- An item: an item/article in a basket
- I: the set of all items sold in the store
- A transaction: items purchased in a basket; it may have a TID (transaction ID)
- A transactional dataset: a set of transactions

Transaction data: a set of documents
A text document data set. Each document is treated as a bag of keywords.
doc1: Student, Teach, School
doc2: Student, School
doc3: Teach, School, City, Game
doc4: Baseball, Basketball
doc5: Basketball, Player, Spectator
doc6: Baseball, Coach, Game, Team
doc7: Basketball, Team, City, Game

Transaction data representation
- A simplistic view of shopping baskets.
- Some important information is not considered, e.g., the quantity of each item purchased and the price paid.

Mining Frequent Itemsets task
Input: A set of transactions $T$, over a set of items $I$
Output: All itemsets over $I$ whose support meets the minimum support threshold
Problem parameters:
- $N = |T|$: number of transactions
- $d = |I|$: number of (distinct) items
- $w$: max width of a transaction
- $M$: number of possible itemsets, $M = 2^d$

Frequent Itemset Generation Network
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Given d items, there are $2^d$ possible itemsets.
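The lattice can be enumerated level by level to confirm the count. This is a small sketch using the standard library's `itertools.combinations`; the function name `all_itemsets` is an assumption for illustration.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]


def all_itemsets(items):
    """Yield every subset of `items`, level by level (size 0, 1, ..., d)."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield combo


lattice = list(all_itemsets(items))

# Print one lattice level per line; the empty tuple stands for "null".
for size in range(len(items) + 1):
    level = ["".join(s) or "null" for s in lattice if len(s) == size]
    print(" ".join(level))

print(len(lattice), "itemsets for d =", len(items))
```

Because the count doubles with every added item, exhaustive enumeration is practical only for very small d.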


A Binary Data Matrix of a Transaction Database
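The matrix shown on this slide is not reproduced in the transcript. As an illustration only, the sketch below builds a binary matrix for the five market-basket transactions from the Frequent Itemsets slide: one row per transaction, one column per item, and a 1 where the item appears in the basket. The alphabetical column order and the variable names are assumptions.

```python
# Market-basket transactions from the Frequent Itemsets slide, keyed by TID.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

# Column order: all distinct items, sorted alphabetically (an arbitrary choice).
items = sorted(set().union(*transactions.values()))

# One row per transaction: 1 if the item is in the basket, otherwise 0.
matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print("TID", *items)
for tid, row in matrix.items():
    print(tid, *row)

# Summing a column gives the support count of that single item.
column_sums = [sum(row[j] for row in matrix.values()) for j in range(len(items))]
print("sum", *column_sums)
```

As the transaction data representation slide notes, this view records only presence or absence, so quantities and prices are lost.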
