VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM

 


        The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples.



Note the difference between the definitions of satisfies and consistent:

·       An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.

·       An example x is said to be consistent with hypothesis h iff h(x) = c(x), where c(x) is the target concept value for x.
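The two definitions can be written as a pair of predicates. A minimal Python sketch (illustrative names; a hypothesis h and the target concept c are both represented as functions returning 0 or 1):

```python
# x satisfies h when h(x) = 1, regardless of the label c(x).
def satisfies(h, x):
    return h(x) == 1

# h is consistent with example x iff h predicts the same value as c.
def consistent(h, x, c):
    return h(x) == c(x)
```

Note that a hypothesis can satisfy an example yet be inconsistent with it, namely when x is actually a negative example.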






The LIST-THEN-ELIMINATE algorithm

The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.

___________________________________________________________________________

1. VersionSpace ← a list containing every hypothesis in H

2. For each training example, ⟨x, c(x)⟩

        remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)

3. Output the list of hypotheses in VersionSpace

_________________________________________________________________________

The LIST-THEN-ELIMINATE Algorithm

·       LIST-THEN-ELIMINATE works in principle, so long as the hypothesis space H is finite.

·       However, since it requires exhaustively enumerating all hypotheses in H, it is not feasible in practice.
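The algorithm itself is a single filtering loop. A sketch in Python (illustrative only; each hypothesis is treated as a callable h(x) → 0/1):

```python
def list_then_eliminate(hypotheses, examples):
    """Return the version space: all hypotheses consistent with every example.

    hypotheses: iterable of callables h(x) -> 0 or 1
    examples:   iterable of (x, c_x) pairs, where c_x is the target value
    """
    version_space = list(hypotheses)      # 1. start with every hypothesis in H
    for x, c_x in examples:               # 2. for each training example
        version_space = [h for h in version_space if h(x) == c_x]
    return version_space                  # 3. output the surviving hypotheses
```

For example, with threshold hypotheses h_t(x) = [x ≥ t] over the integers, the examples (2, 1) and (1, 0) leave only the hypothesis with threshold t = 2.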


Consistent Hypothesis and Version Space

        F1 → A, B

        F2 → X, Y

        Instance Space: (A, X), (A, Y), (B, X), (B, Y) – 4 instances

        Hypothesis Space: (A, X), (A, Y), (A, Ø), (A, ?), (B, X), (B, Y), (B, Ø), (B, ?), (Ø, X), (Ø, Y), (Ø, Ø), (Ø, ?), (?, X), (?, Y), (?, Ø), (?, ?) – 16 hypotheses

        Semantically Distinct Hypotheses: (A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y), (?, ?), (Ø, Ø) – 10


Version Space (initially, all semantically distinct hypotheses): (A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y), (?, ?), (Ø, Ø)

Training Instances

F1      F2      Target
A       X       Yes
A       Y       Yes

Consistent hypotheses: (A, ?), (?, ?)

 

List-Then-Eliminate algorithm

        Problems

ü  The hypothesis space must be finite

ü  It requires enumerating every hypothesis, which is rather inefficient

A More Compact Representation for Version Spaces

        The version space is represented by its most general and least general members.

        These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space.









·       A version space with its general and specific boundary sets.

·       The version space includes all six hypotheses shown here, but can be represented more simply by S and G.

·       Arrows indicate instances of the more_general_than relation.

·       This is the version space for the EnjoySport concept learning problem and training example.


CANDIDATE-ELIMINATION Learning Algorithm

The CANDIDATE-ELIMINATION algorithm computes the version space containing all

hypotheses from H that are consistent with an observed sequence of training examples.

___________________________________________________________________________

Initialize G to the set of maximally general hypotheses in H

Initialize S to the set of maximally specific hypotheses in H

For each training example d, do

• If d is a positive example

        • Remove from G any hypothesis inconsistent with d

        • For each hypothesis s in S that is not consistent with d

                • Remove s from S

                • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h

                • Remove from S any hypothesis that is more general than another hypothesis in S

• If d is a negative example

        • Remove from S any hypothesis inconsistent with d

        • For each hypothesis g in G that is not consistent with d

                • Remove g from G

                • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h

                • Remove from G any hypothesis that is less general than another hypothesis in G

___________________________________________________________________________

The CANDIDATE-ELIMINATION algorithm using version spaces


An Illustrative Example



The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of all hypotheses in H:

        Initializing the G boundary set to contain the most general hypothesis in H

           


                                                      

        Initializing the S boundary set to contain the most specific (least general) hypothesis


ü  When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.

ü  The boundary is therefore revised by moving it to the least general hypothesis that covers this new example.

ü  No update of the G boundary is needed in response to this training example, because G0 correctly covers the example.





When the second training example is observed, it has a similar effect of generalizing S further to S2, leaving G again unchanged (i.e., G2 = G1 = G0).


·       Consider the third training example. This negative example reveals that the G boundary of the version space is overly general, that is, the hypothesis in G incorrectly predicts that this new example is a positive example.

·        The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example



Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?

·       For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3.

·     The reason this hypothesis is excluded is that it is inconsistent with the previously encountered positive examples

Consider the fourth training example:




        This positive example further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example

        After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
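The trace above can be reproduced end to end. The following is a sketch of CANDIDATE-ELIMINATION for the conjunctive EnjoySport language (the function and variable names are my own, not from the text; hypotheses are 6-tuples over attribute values, '?', and 'Ø'):

```python
# A sketch of CANDIDATE-ELIMINATION for the conjunctive EnjoySport language.
DOMAINS = [['Sunny', 'Cloudy', 'Rainy'],   # Sky
           ['Warm', 'Cold'],               # AirTemp
           ['Normal', 'High'],             # Humidity
           ['Strong', 'Weak'],             # Wind
           ['Warm', 'Cool'],               # Water
           ['Same', 'Change']]             # Forecast

def covers(h, x):
    """h(x) = 1 iff every constraint in h is '?' or matches x."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True iff h1 >=_g h2: each constraint of h1 is at least as general."""
    def ge(c1, c2):
        return c1 == '?' or c2 == 'Ø' or (c1 != 'Ø' and c1 == c2)
    return all(ge(c1, c2) for c1, c2 in zip(h1, h2))

def minimal_generalization(s, x):
    """Least generalization of s that covers the positive example x."""
    return tuple(v if c == 'Ø' else (c if c == v else '?')
                 for c, v in zip(s, x))

def minimal_specializations(g, x):
    """Minimal specializations of g that exclude the negative example x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    G = {('?',) * 6}                      # maximally general boundary
    S = {('Ø',) * 6}                      # maximally specific boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}          # drop inconsistent g
            S = {minimal_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}      # drop inconsistent s
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)          # g already excludes x
                else:
                    for h in minimal_specializations(g, x):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.add(h)
            # keep only maximally general members of G
            G = {g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)}
    return S, G
```

Running this on the four EnjoySport examples yields S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}, matching the trace above.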






 




FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS

 






        To illustrate this algorithm, assume the learner is given the sequence of training examples from the EnjoySport task.


Step 1: Initialize h to the most specific hypothesis in H










·       The first step of FIND-S is to initialize h to the most specific hypothesis in H

h ← <Ø, Ø, Ø, Ø, Ø, Ø>

·       Consider the first training example

x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +

        Observing the first training example, it is clear that hypothesis h is too specific. None of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example

h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

·       Consider the second training example

x2 = <Sunny, Warm, High, Strong, Warm, Same>, +

        The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example

h2 = <Sunny, Warm, ?, Strong, Warm, Same>

 ·       Consider the third training example

x3 = <Rainy, Cold, High, Strong, Warm, Change>, -

        Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.

h3 = <Sunny, Warm, ?, Strong, Warm, Same>

·       Consider the fourth training example

x4 = <Sunny, Warm, High, Strong, Cool, Change>, +

        The fourth example leads to a further generalization of h

h4 = <Sunny, Warm, ?, Strong, ?, ?>

The key property of the FIND-S algorithm

·       FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples

·       FIND-S algorithm’s final hypothesis will also be consistent with the negative examples provided the correct target concept is contained in H, and provided the training examples are correct.
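The four-step trace can be checked with a short sketch of FIND-S (illustrative names; hypotheses are tuples over attribute values, '?', and 'Ø'):

```python
def find_s(examples):
    """FIND-S: start with the most specific hypothesis and minimally
    generalize it to cover each positive example; ignore negatives."""
    n = len(examples[0][0])
    h = ['Ø'] * n                      # most specific hypothesis in H
    for x, positive in examples:
        if not positive:
            continue                   # FIND-S ignores every negative example
        for i, value in enumerate(x):
            if h[i] == 'Ø':
                h[i] = value           # first positive example: copy its values
            elif h[i] != value:
                h[i] = '?'             # conflicting value: generalize to '?'
    return tuple(h)
```

On the four EnjoySport examples above, the result is h4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩, the most specific hypothesis consistent with the positive examples.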







CONCEPT LEARNING AS SEARCH

 


·       Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.

·       The goal of this search is to find the hypothesis that best fits the training examples.


Example:

        Consider the instances X and hypotheses H in the EnjoySport learning task.

        The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly 3*2*2*2*2*2 = 96 distinct instances.

        There are 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses within H.

        Every hypothesis containing one or more "Ø" symbols represents the empty set of instances; that is, it classifies every instance as negative.

        Therefore there are 1 + (4*3*3*3*3*3) = 973 semantically distinct hypotheses.

A CONCEPT LEARNING TASK – Instance Space



Hypothesis Space

ü  Similarly there are 5 * 4 * 4 * 4 * 4 * 4 = 5120 syntactically distinct hypotheses within H.

ü  Notice, however, that every hypothesis containing one or more "ø" symbols represents the empty set of instances; that is, it classifies every instance as negative.

ü  Therefore, the number of semantically distinct hypotheses is only 1 + (4 *3 * 3 * 3 * 3 * 3) = 973.

ü  Our EnjoySport example is a very simple learning task, with a relatively small, finite hypothesis space.
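All three counts can be verified by brute force (a sketch; two hypotheses are treated as semantically equivalent exactly when they classify the same set of instances as positive):

```python
from itertools import product

domains = [['Sunny', 'Cloudy', 'Rainy'],   # Sky
           ['Warm', 'Cold'],               # AirTemp
           ['Normal', 'High'],             # Humidity
           ['Strong', 'Weak'],             # Wind
           ['Warm', 'Cool'],               # Water
           ['Same', 'Change']]             # Forecast

instances = list(product(*domains))                             # 3*2*2*2*2*2
hypotheses = list(product(*[d + ['?', 'Ø'] for d in domains]))  # 5*4*4*4*4*4

def covers(h, x):
    """h classifies x positive iff each constraint is '?' or matches x."""
    return all(c == '?' or c == v for c, v in zip(h, x))

# Identify each hypothesis by the set of instances it classifies positive;
# every hypothesis containing 'Ø' maps to the empty set.
coverages = {frozenset(x for x in instances if covers(h, x))
             for h in hypotheses}
```

The enumeration confirms 96 instances, 5120 syntactically distinct hypotheses, and 973 distinct coverage sets, i.e. semantically distinct hypotheses.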



General-to-Specific Ordering of Hypotheses


        To illustrate the general-to-specific ordering, consider the two hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?)

h2 = (Sunny, ?, ?, ?, ?, ?)

        Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.

        In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
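The claim can be checked directly by enumerating all 96 EnjoySport instances and comparing the sets of instances that h1 and h2 classify positive (a sketch, reusing the attribute domains of the task):

```python
from itertools import product

domains = [['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change']]
instances = list(product(*domains))

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')

pos1 = {x for x in instances if covers(h1, x)}   # instances h1 labels positive
pos2 = {x for x in instances if covers(h2, x)}   # instances h2 labels positive
```

Here pos1 is a strict subset of pos2 (16 instances versus 32): every instance classified positive by h1 is also classified positive by h2, so h2 is more general than h1.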

 



More General Than hypothesis



 



·       In the figure,

ü  the box on the left represents the set X of all instances,

ü  the box on the right represents the set H of all hypotheses.

·       Each hypothesis corresponds to some subset of X

ü  – the subset of instances that it classifies positive.

·       The arrows connecting hypotheses represent

ü  the more_general_than relation,

ü  with the arrow pointing toward the less general hypothesis.

·       Note that the subset of instances characterized by

ü  h2 subsumes the subset characterized by h1,

ü  hence h2 is more_general_than h1.

