Introduction to Knowledge Systems
Knowledge Systems
Knowledge systems aim to achieve intelligent behavior through computational means.
Knowledge Systems
Knowledge is usually represented as structural descriptions. Structural descriptions represent patterns: a computational model that captures problem-solving knowledge and can be used to make decisions or predict outcomes in new situations.
Structural Descriptions
Consider the contact lens problem: the aim is to recommend a suitable type of contact lens for a customer.
Example: if-then rules
If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no then recommendation = soft
One issue is how to acquire this knowledge.
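The two example rules can be read as an ordered decision list: the second rule only applies when the first one does not fire ("Otherwise"). A minimal Python sketch, assuming each customer is a dictionary with the hypothetical keys age, astigmatic and tear_production_rate:

```python
from typing import Optional


def recommend(instance: dict) -> Optional[str]:
    """Apply the two example rules in order (a decision list).

    Returns None when neither rule fires, because these two rules alone
    do not cover every customer.
    """
    # Rule 1 is checked first.
    if instance["tear_production_rate"] == "reduced":
        return "none"
    # "Otherwise": rule 2 is only reached if rule 1 did not fire.
    if instance["age"] == "young" and instance["astigmatic"] == "no":
        return "soft"
    return None  # not covered by these two rules


customer = {"age": "young", "astigmatic": "no", "tear_production_rate": "normal"}
print(recommend(customer))  # soft
```

Returning None makes the gap explicit: the two rules say nothing about, for example, a young astigmatic customer with a normal tear production rate.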
Knowledge Acquisition
Knowledge can be acquired through past experience: a computational process improves its performance based on experience, and data can capture that past experience.
Machine learning algorithms acquire knowledge (structural descriptions) from data, i.e. from examples.
Machine learning is related to artificial intelligence (AI) and statistics.
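Machine Learning Process
Traditional programs: data and a model/algorithm are given to the computer, which produces the output.
Machine learning: data and the desired output are given to the computer, which produces a model capturing the knowledge.
Machine learning is closely related to data mining and big data.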
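The contrast between the two pipelines can be sketched with a one-attribute threshold rule. In this hypothetical example the traditional program has its threshold written by hand, while the learner infers the threshold from example inputs together with the outputs they should produce. The exhaustive threshold search is an illustrative choice, not a method named in the slides.

```python
def traditional_program(x):
    # Traditional programming: a human writes the rule (threshold 50 chosen by hand).
    return x > 50


def learn_threshold(xs, ys):
    """Machine learning: infer the rule's threshold from (data, output) pairs.

    Each observed value t is tried as a candidate rule "predict True when x > t";
    the candidate with the fewest errors on the examples is kept.
    """
    best_t, best_errors = xs[0], len(xs) + 1
    for t in sorted(xs):
        errors = sum((x > t) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical training data: inputs and the outputs they should produce
xs = [10, 20, 30, 60, 70, 80]
ys = [False, False, False, True, True, True]
threshold = learn_threshold(xs, ys)
print(threshold)       # 30
print(45 > threshold)  # True: the learned rule applied to a new input
```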
Knowledge vs. Data
Society produces huge amounts of data from sources such as business, science, medicine, economics, geography, the environment and sports.
Data is a potentially valuable resource, but raw data is useless on its own: we need techniques to automatically extract knowledge from it.
Data: recorded facts
Knowledge: useful patterns underlying the data
Contact Lens Problem
A sample rule capturing the relationship between the recommended contact lens and the customer's attributes:
If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no then recommendation = soft

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            No           Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
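The contact lens data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None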
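As an illustration of acquiring a structural description from this data, the sketch below applies 1R ("one rule"), a very simple rule learner that these slides do not mention and that is swapped in here only as an example. For each attribute, 1R builds one rule per attribute value (predict the majority class for that value) and keeps the attribute whose rules make the fewest errors. The 24 rows are the contact lens instances; writing the values as lowercase strings is an encoding choice.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["age", "spectacle_prescription", "astigmatism", "tear_production_rate"]

# (age, spectacle prescription, astigmatism, tear production rate, recommended lenses)
DATA = [
    ("young", "myope", "no", "reduced", "none"),
    ("young", "myope", "no", "normal", "soft"),
    ("young", "myope", "yes", "reduced", "none"),
    ("young", "myope", "yes", "normal", "hard"),
    ("young", "hypermetrope", "no", "reduced", "none"),
    ("young", "hypermetrope", "no", "normal", "soft"),
    ("young", "hypermetrope", "yes", "reduced", "none"),
    ("young", "hypermetrope", "yes", "normal", "hard"),
    ("pre-presbyopic", "myope", "no", "reduced", "none"),
    ("pre-presbyopic", "myope", "no", "normal", "soft"),
    ("pre-presbyopic", "myope", "yes", "reduced", "none"),
    ("pre-presbyopic", "myope", "yes", "normal", "hard"),
    ("pre-presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("pre-presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "yes", "normal", "none"),
    ("presbyopic", "myope", "no", "reduced", "none"),
    ("presbyopic", "myope", "no", "normal", "none"),
    ("presbyopic", "myope", "yes", "reduced", "none"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
    ("presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("presbyopic", "hypermetrope", "yes", "normal", "none"),
]


def one_r(data):
    """1R: for each attribute, map every value to its majority class and count
    the training errors; return (attribute, rules, errors) for the best one."""
    best = None
    for i, name in enumerate(ATTRIBUTES):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row in data:
            counts[row[i]][row[-1]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


attribute, rules, errors = one_r(DATA)
print(attribute, rules, errors)
# tear_production_rate {'reduced': 'none', 'normal': 'soft'} 7
```

On this data 1R selects tear production rate (reduced gives none, normal gives soft) with 7 errors out of 24; every other single attribute makes 9 errors. The learned rule recovers the first of the example if-then rules, but one attribute is not enough to describe the data fully.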
A complete and correct rule set
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
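The completeness part of this claim can be checked mechanically: each of the 3 × 2 × 2 × 2 = 24 attribute combinations should be covered by at least one rule, and all rules covering the same combination should agree. (Checking correctness would additionally compare each prediction with the data table.) A minimal sketch; the short attribute keys tear, prescription, astigmatic and age are hypothetical names chosen for brevity.

```python
from itertools import product

# Each rule: (conditions that must all hold, recommendation)
RULES = [
    ({"tear": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]

VALUES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "prescription": ["myope", "hypermetrope"],
    "astigmatic": ["no", "yes"],
    "tear": ["reduced", "normal"],
}


def predictions(instance):
    """Set of recommendations made by all rules whose conditions hold."""
    return {rec for cond, rec in RULES
            if all(instance[a] == v for a, v in cond.items())}


def check_rule_set():
    """Return (True, None, None) if every combination gets exactly one
    recommendation; otherwise the first offending instance and its predictions."""
    names = list(VALUES)
    for combo in product(*VALUES.values()):
        instance = dict(zip(names, combo))
        preds = predictions(instance)
        if len(preds) != 1:  # 0 = not covered, >1 = conflicting rules
            return False, instance, preds
    return True, None, None


print(check_rule_set())  # (True, None, None)
```

Rules may overlap without conflict: a young hypermetrope with no astigmatism and a normal tear production rate is covered by two rules, and both recommend soft.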
A decision tree for this problem
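A decision tree tests one attribute at each internal node and stores a recommendation at each leaf. The sketch below represents a tree as nested (attribute, branches) pairs. The particular tree was derived by hand so that it classifies all 24 attribute combinations the same way as the complete rule set; it is not necessarily the tree drawn on this slide. The attribute keys are hypothetical.

```python
# A node is either a leaf (a class label string) or a pair (attribute, {value: subtree}).
TREE = ("tear", {
    "reduced": "none",
    "normal": ("astigmatic", {
        "no": ("prescription", {
            "hypermetrope": "soft",
            "myope": ("age", {"young": "soft", "pre-presbyopic": "soft", "presbyopic": "none"}),
        }),
        "yes": ("prescription", {
            "myope": "hard",
            "hypermetrope": ("age", {"young": "hard", "pre-presbyopic": "none", "presbyopic": "none"}),
        }),
    }),
})


def classify(tree, instance):
    """Walk from the root, following the branch that matches the instance's
    value for the attribute tested at each node, until a leaf is reached."""
    while not isinstance(tree, str):  # leaves are class labels
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


customer = {"age": "presbyopic", "prescription": "myope", "astigmatic": "no", "tear": "normal"}
print(classify(TREE, customer))  # none
```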
Different Applications
The result of learning, or the learning method itself, is deployed in practical applications:
Processing loan applications
Image object detection
Diagnosis of machine faults
Marketing and sales
Fraud detection
Scientific applications: biology, astronomy, chemistry
Automatic selection of TV programs
Monitoring intensive care patients
Face Detection
An image-patch classification model classifies patches as faces or non-faces.
The data consist of 1000 labeled sample images of faces and non-faces.
The model is applied to a dense set of overlapping patches in the test image.
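Scanning a dense set of overlapping patches is commonly called a sliding window. A minimal sketch, assuming a grayscale image stored as a list of rows and a patch classifier passed in as a function. toy_classifier is a hypothetical stand-in (it calls a patch a face when its mean brightness exceeds 0.5), not a trained face detector.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def extract_patch(image: Image, top: int, left: int, size: int) -> Image:
    """Cut out the size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def sliding_window_detect(image: Image, classify_patch: Callable[[Image], bool],
                          size: int, stride: int) -> List[Tuple[int, int]]:
    """Apply a patch classifier at every window position; return the (top, left)
    corners of the patches it labels as faces."""
    height, width = len(image), len(image[0])
    detections = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            if classify_patch(extract_patch(image, top, left, size)):
                detections.append((top, left))
    return detections


def toy_classifier(patch: Image) -> bool:
    # Hypothetical stand-in for a learned face/non-face model.
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5


# 6x6 toy image with a bright 2x2 block at rows 2-3, columns 2-3
image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0

print(sliding_window_detect(image, toy_classifier, size=2, stride=1))  # [(2, 2)]
```

A stride smaller than the patch size makes neighbouring patches overlap, which is what makes the scan dense.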
Marketing and sales
Companies precisely record massive amounts of marketing and sales data.
Applications:
Customer loyalty: identifying customers who are likely to defect by detecting changes in their behavior (e.g. banks, phone companies)
Special offers: identifying profitable customers (e.g. reliable credit card owners who need extra money during the holiday season)
Fraud Detection
The financial company PayPal handles massive payment volumes: more than $11,000 in payments every second, and more than 5 billion payments a year.
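A quick back-of-the-envelope conversion of the two figures, assuming a 365-day year. Both figures are stated as "more than", so the results are lower bounds, and the two figures may not refer to the same period.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year, an assumption)

dollars_per_second = 11_000        # slide: "more than $11,000 in payments every second"
payments_per_year = 5_000_000_000  # slide: "more than 5 billion payments in a year"

dollars_per_year = dollars_per_second * SECONDS_PER_YEAR
payments_per_second = payments_per_year / SECONDS_PER_YEAR

print(f"{dollars_per_year:,} dollars per year")          # 346,896,000,000 dollars per year
print(f"{payments_per_second:.0f} payments per second")  # 159 payments per second
```

That is roughly $347 billion per year and about 160 payments to screen every second.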
Fraud Detection
Goal: detect suspicious activity and separate false alarms from true fraud, using machine learning and deep learning.
The AI system has freed human detectives to identify new types of fraud patterns, which they can then feed back to the AI system.
Processing loan applications (American Express)
Given: a questionnaire with financial and personal information
Question: should money be lent?
A simple statistical method covers 90% of cases; borderline cases are referred to loan officers.
But: 50% of accepted borderline cases defaulted!
Solution: reject all borderline cases?! But borderline cases are the most active customers.
Applying Machine Learning
1000 training examples of borderline cases
20 attributes: age, years with current employer, years at current address, years with the bank, other credit cards possessed, …
Learned rules: correct on 70% of cases; human experts only 50%
Rules could be used to explain decisions to customers
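Rules can explain decisions because the condition of the rule that fired is itself a human-readable reason. The sketch below is hypothetical throughout: the attribute names loosely follow the slide, but the thresholds, the two rules and the referral fallback are invented for illustration and are not the rules learned in this case study.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]

# HYPOTHETICAL rules: (description, condition, decision); thresholds are invented.
RULES: List[Tuple[str, Callable[[Applicant], bool], str]] = [
    ("years with current employer < 1 and other credit cards >= 3",
     lambda a: a["years_with_employer"] < 1 and a["other_credit_cards"] >= 3,
     "reject"),
    ("years with the bank >= 5",
     lambda a: a["years_with_bank"] >= 5,
     "accept"),
]


def decide(applicant: Applicant) -> Tuple[str, str]:
    """Return (decision, explanation). Rules are tried in order; the description
    of the first rule that fires is reported as the reason for the decision."""
    for description, condition, decision in RULES:
        if condition(applicant):
            return decision, "because " + description
    # Hypothetical fallback when no rule fires
    return "refer to loan officer", "no rule applies"


print(decide({"years_with_employer": 0.5, "other_credit_cards": 4, "years_with_bank": 6}))
# ('reject', 'because years with current employer < 1 and other credit cards >= 3')
```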
Diagnosis of machine faults
Machine fault diagnosis
Given: Fourier analysis of vibrations measured at various points of a device's mounting
Question: which fault is present?
Application: preventative maintenance of electromechanical motors and generators
The information is very noisy
So far: diagnosis by experts using hand-crafted rules
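Fourier analysis turns a vibration signal into the strength of each frequency component, and such spectral values can then serve as attributes for a learner. A minimal sketch using a naive discrete Fourier transform (quadratic in the signal length, adequate for short signals); the two-component signal, the 1000 Hz sample rate and the helper names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive discrete Fourier transform: magnitude of each frequency bin k = 0..N/2."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
            for k in range(n // 2 + 1)]


def dominant_frequency(signal, sample_rate):
    """Frequency in Hz of the largest spectral peak, ignoring bin 0 (the DC offset)."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(signal)  # bin spacing is sample_rate / N


# Hypothetical vibration reading: a 50 Hz component plus a weaker 120 Hz component,
# sampled at 1000 Hz for 0.2 s (200 samples).
rate = 1000
signal = [math.sin(2 * math.pi * 50 * t / rate) + 0.3 * math.sin(2 * math.pi * 120 * t / rate)
          for t in range(200)]
print(dominant_frequency(signal, rate))  # 50.0
```

With 200 samples at 1000 Hz the bins are 5 Hz apart, so both components fall exactly on a bin: the 50 Hz peak has magnitude 100 (amplitude times N/2) and the 120 Hz peak magnitude 30.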
Applying Machine Learning
Available: 600 faults with the expert's diagnosis; ~300 unsatisfactory, the rest used for training
Attributes were augmented with intermediate concepts that embodied causal domain knowledge
The expert was not satisfied with the initial rules because they did not relate to his domain knowledge
Further background knowledge resulted in more complex rules that were satisfactory
The learned rules outperformed the hand-crafted ones
Document Classification
20 Newsgroups data
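Document classification assigns each text to a category such as a newsgroup. The sketch below uses multinomial naive Bayes over bag-of-words counts with add-one smoothing, a common baseline for this task that the slides do not mention. The four training documents are made up; only the category names sci.space and rec.autos come from the 20 Newsgroups collection.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train on (text, label) pairs: count documents per class, words per class,
    and collect the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def classify_nb(model, text):
    """Pick the class maximising log P(c) + sum of log P(w | c), add-one smoothed."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set
train = [
    ("the rocket reached orbit", "sci.space"),
    ("nasa launched a new satellite into orbit", "sci.space"),
    ("the engine of my car needs new brakes", "rec.autos"),
    ("this car has a powerful engine", "rec.autos"),
]
model = train_nb(train)
print(classify_nb(model, "satellite orbit"))  # sci.space
print(classify_nb(model, "car brakes"))       # rec.autos
```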
Machine learning and Statistics
Historical difference (grossly oversimplified):
statistics: testing hypotheses
machine learning: finding the right hypothesis
But there is a huge overlap, e.g. decision trees (C4.5 and CART) and nearest-neighbor methods.
Today the perspectives have converged: most machine learning algorithms involve statistical techniques.
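Nearest-neighbor methods are one of the overlapping techniques listed above. A minimal k-nearest-neighbor sketch (k = 3, Euclidean distance, majority vote); the two-cluster training data is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k training points closest to it."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training points: (features, label)
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
         ((6.0, 6.0), "b"), ((7.0, 5.5), "b"), ((6.5, 7.0), "b")]
print(knn_predict(train, (1.2, 1.4)))  # a
print(knn_predict(train, (6.2, 6.1)))  # b
```

Counter.most_common lists equal counts in the order first encountered, and the neighbours are sorted by distance, so a tied vote goes to the label of the nearest tied neighbour.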