Introduction to Artificial Intelligence. Learning from Observations


Introduction to Artificial Intelligence: Learning from Observations. Bernhard Beckert, UNIVERSITÄT KOBLENZ-LANDAU, Winter semester 2003/2004. B. Beckert: Einführung in die KI / KI für IM, p.1

Outline
Learning agents
Inductive learning
Decision tree learning

Learning
Reasons for learning:
Learning is essential for unknown environments, when the designer lacks omniscience.
Learning is useful as a system construction method: expose the agent to reality rather than trying to write everything down.
Learning modifies the agent's decision mechanisms to improve performance.

Learning Agents
[Diagram of a learning agent: a Critic compares Sensor input against a Performance standard and gives feedback to the Learning element; the Learning element makes changes to the Performance element (which holds the agent's knowledge) and sets learning goals for a Problem generator, which proposes experiments; the Performance element acts on the Environment through Effectors.]

Learning Element
The design of the learning element is dictated by:
what type of performance element is used
which functional component is to be learned
how that functional component is represented
what kind of feedback is available

Types of Learning
Supervised learning: correct answers for each example instance are known; requires a teacher.
Reinforcement learning: occasional rewards; learning is harder; requires no teacher.

Inductive Learning (a.k.a. Science)
Simplest form: learn a function f from examples (tabula rasa), i.e., find a hypothesis h such that h ≈ f, given a training set of examples. f is the target function; an example is a pair (x, f(x)).
[Example of an example: a tic-tac-toe position x paired with its value, f(x) = +1.]

Inductive Learning Method
This is a highly simplified model of real learning:
it ignores prior knowledge
it assumes a deterministic, observable environment
it assumes examples are given
it assumes that the agent wants to learn f (why?)

Inductive Learning Method
Idea: construct/adjust h to agree with f on the training set. h is consistent if it agrees with f on all examples.
Example: curve fitting. [Plots: a sequence of hypotheses fit to the same data points, from a straight line to increasingly complex curves; axes f(x) vs. x.]
Ockham's razor: maximize a combination of consistency and simplicity.
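The consistency/simplicity trade-off can be sketched in code (an illustration, not from the slides; the data points are invented): with n examples, the unique degree-(n−1) interpolating polynomial is perfectly consistent, while a least-squares line is far simpler but only approximately consistent.

```python
# Sketch of the curve-fitting slide: a fully consistent hypothesis vs. a
# simple one. The example points below are made up for illustration.

def lagrange(points):
    """Return h(x): the unique polynomial through all points (fully consistent)."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def least_squares_line(points):
    """Return h(x) = a*x + b: a simple hypothesis, only approximately consistent."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

examples = [(0, 0.1), (1, 0.9), (2, 2.2), (3, 2.8), (4, 4.1)]

exact = lagrange(examples)           # consistent: agrees with f on every example
line = least_squares_line(examples)  # simple: small error, low complexity

for x, y in examples:
    assert abs(exact(x) - y) < 1e-9  # h is consistent on the training set
```

Ockham's razor is the tie-breaker: between these, prefer the line unless the extra wiggles of the interpolant are justified by the data.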

Attribute-based Representations
An example description consists of attribute values (Boolean, discrete, continuous, etc.) and a target value.
Example: situations where I will/won't wait for a table in a restaurant.

Exmpl | Alt Bar Fri Hun Pat  Price Rain Res Type    Est   | WillWait
X1    |  T   F   F   T  Some  $$$   F    T  French  0-10  |    T
X2    |  T   F   F   T  Full  $     F    F  Thai    30-60 |    F
X3    |  F   T   F   F  Some  $     F    F  Burger  0-10  |    T
X4    |  T   F   T   T  Full  $     F    F  Thai    10-30 |    T
X5    |  T   F   T   F  Full  $$$   F    T  French  >60   |    F
X6    |  F   T   F   T  Some  $$    T    T  Italian 0-10  |    T
X7    |  F   T   F   F  None  $     T    F  Burger  0-10  |    F
X8    |  F   F   F   T  Some  $$    T    T  Thai    0-10  |    T
X9    |  F   T   T   F  Full  $     T    F  Burger  >60   |    F
X10   |  T   T   T   T  Full  $$$   F    T  Italian 10-30 |    F
X11   |  F   F   F   F  None  $     F    F  Thai    0-10  |    F
X12   |  T   T   T   T  Full  $     F    F  Burger  30-60 |    T

Decision Trees
A possible representation for hypotheses.
Example: the correct tree for deciding whether to wait:

Patrons?
  None: F
  Some: T
  Full: WaitEstimate?
    >60:   F
    30-60: Alternate?
      No:  Reservation?
        No:  Bar?  (No: F, Yes: T)
        Yes: T
      Yes: Fri/Sat?  (No: F, Yes: T)
    10-30: Hungry?
      No:  T
      Yes: Alternate?
        No:  T
        Yes: Raining?  (No: F, Yes: T)
    0-10:  T

Decision Trees: Properties
Decision trees can approximate any function of the input attributes (the correct decision tree may be infinite).
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic).
A decision tree built this way for the training examples probably won't generalize to new examples.
Compact decision trees are preferable.
A more expressive hypothesis space increases the chance that the target function can be expressed, but also increases the number of hypotheses consistent with the training set, and so may give worse predictions.

Decision Trees: Example
For Boolean functions, each truth-table row corresponds to a path to a leaf in the decision tree.

A | B | A xor B
F | F |    F
F | T |    T
T | F |    T
T | T |    F

A?
  F: B?  (F: F, T: T)
  T: B?  (F: T, T: F)
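The row-to-path correspondence is easy to make concrete (a sketch, not course code): represent the tree as nested tuples, where an internal node pairs an attribute with its branches and a leaf is just the answer.

```python
# The XOR decision tree from the slide as a nested structure:
# internal node = (attribute, {value: subtree}), leaf = classification.

xor_tree = ("A", {
    False: ("B", {False: False, True: True}),
    True:  ("B", {False: True,  True: False}),
})

def classify(tree, example):
    """Walk from the root to a leaf, following the example's attribute values."""
    while isinstance(tree, tuple):      # internal node
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree                         # leaf: the classification

# Every truth-table row is exactly one root-to-leaf path:
for a in (False, True):
    for b in (False, True):
        assert classify(xor_tree, {"A": a, "B": b}) == (a != b)
```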

Hypothesis Spaces
How many distinct decision trees are there with n Boolean attributes?
= the number of Boolean functions of n arguments
= the number of distinct truth tables with 2^n rows
= 2^(2^n)
Example: with 6 Boolean attributes, there are 18,446,744,073,709,551,616 trees.
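The count is quick to verify (the function name here is mine, for illustration): a truth table with 2^n rows has two choices per row, giving 2^(2^n) distinct functions.

```python
# Number of distinct Boolean functions (= decision trees up to equivalence)
# over n Boolean attributes: two output choices for each of the 2**n rows.
def num_boolean_functions(n):
    return 2 ** (2 ** n)

assert num_boolean_functions(6) == 18_446_744_073_709_551_616  # the slide's figure
```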

Decision Tree Learning
Aim: find a small tree consistent with the training examples.
Idea: (recursively) choose the most significant attribute as the root of each (sub)tree.

Choosing an Attribute
Idea: a good attribute splits the examples into subsets that are (ideally) all positive or all negative, i.e., gives much information about the classification.
Example: splitting the restaurant examples on Patrons? (None/Some/Full) yields subsets that are nearly all positive or all negative, whereas splitting on Type? (French/Italian/Thai/Burger) leaves each subset with an even mixture. Patrons is the better choice.
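"Gives much information" is usually made precise as information gain: the entropy of the whole set minus the expected entropy after the split. A short sketch on the 12 restaurant examples from the slides (the encoding as value/label pairs is mine):

```python
from math import log2

# (attribute value, WillWait) for each of the 12 restaurant examples,
# for the two candidate root attributes from the slide.
patrons = [("None", False), ("None", False),
           ("Some", True), ("Some", True), ("Some", True), ("Some", True),
           ("Full", False), ("Full", True), ("Full", False),
           ("Full", False), ("Full", False), ("Full", True)]
types = [("French", True), ("French", False),
         ("Italian", True), ("Italian", False),
         ("Thai", False), ("Thai", True), ("Thai", True), ("Thai", False),
         ("Burger", True), ("Burger", False), ("Burger", False), ("Burger", True)]

def entropy(labels):
    """Entropy in bits of a list of classifications."""
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / len(labels)
        result -= p * log2(p)
    return result

def gain(split):
    """Entropy of the whole set minus the expected entropy after the split."""
    labels = [label for _, label in split]
    remainder = 0.0
    for v in set(v for v, _ in split):
        subset = [label for value, label in split if value == v]
        remainder += len(subset) / len(split) * entropy(subset)
    return entropy(labels) - remainder

print(round(gain(patrons), 3))  # ≈ 0.541 bits
print(round(gain(types), 3))    # 0.0 bits: Type tells us nothing here
```

Patrons has strictly positive gain while Type has zero, matching the slide's conclusion.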

Decision Tree Learning: Algorithm

function DTL(examples, attributes, default) returns a decision tree
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return MAJORITY-VALUE(examples)
  else
    best ← CHOOSE-ATTRIBUTE(attributes, examples)
    tree ← a new decision tree with root test best
    m ← MAJORITY-VALUE(examples)
    for each value v_i of best do
      examples_i ← {elements of examples with best = v_i}
      subtree ← DTL(examples_i, attributes − best, m)
      add a branch to tree with label v_i and subtree subtree
    return tree
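The pseudocode translates almost line for line into Python (my rendering, not the course's code); CHOOSE-ATTRIBUTE is simplified here to "first remaining attribute" rather than the information-gain heuristic:

```python
# A sketch of DTL. Examples are (attribute-dict, label) pairs; a learned tree
# is either a label (leaf) or (attribute, {value: subtree}).

def majority_value(examples):
    labels = [label for _, label in examples]
    return max(set(labels), key=labels.count)

def dtl(examples, attributes, default):
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                       # all examples agree
        return labels.pop()
    if not attributes:
        return majority_value(examples)
    best = attributes[0]                       # stand-in for CHOOSE-ATTRIBUTE
    m = majority_value(examples)
    branches = {}
    for v in {ex[best] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[best] == v]
        branches[v] = dtl(subset, [a for a in attributes if a != best], m)
    return (best, branches)

def classify(tree, example, default):
    while isinstance(tree, tuple):
        attribute, branches = tree
        if example[attribute] not in branches:  # unseen value: fall back
            return default
        tree = branches[example[attribute]]
    return tree

# Learn XOR from its four truth-table rows:
data = [({"A": a, "B": b}, a != b) for a in (0, 1) for b in (0, 1)]
tree = dtl(data, ["A", "B"], default=False)
assert all(classify(tree, ex, False) == label for ex, label in data)
```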

Example
Decision tree learned from the 12 examples:

Patrons?
  None: F
  Some: T
  Full: Hungry?
    No:  F
    Yes: Type?
      French:  T
      Italian: F
      Thai:    Fri/Sat?  (No: F, Yes: T)
      Burger:  T

Substantially simpler than the true tree: a more complex hypothesis isn't justified by such a small amount of data.

Performance Measurement
Hume's Problem of Induction: how do we know that h ≈ f?
Use theorems of computational/statistical learning theory.
Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).

Performance Measurement
Learning curve: % correct on the test set as a function of training set size.
[Plot: % correct on the test set (0.4-1.0) rising with training set size (0-100).]

Performance Measurement (cont.)
The shape of the learning curve depends on whether the problem is:
realizable (the hypothesis space can express the target function);
non-realizable, due to missing attributes or a restricted hypothesis class (e.g., thresholded linear functions);
redundant in expressiveness (e.g., loads of irrelevant attributes).
[Plot: % correct vs. number of examples; the realizable curve approaches 1, the redundant curve rises more slowly, and the non-realizable curve plateaus below 1.]

Summary
Learning is needed for unknown environments (and for lazy designers).
Learning agent = performance element + learning element.
The learning method depends on the type of performance element, the available feedback, and the type of component to be improved.
For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples.
Decision tree learning uses information gain.
Learning performance = prediction accuracy measured on the test set.