3. PERSPECTIVES AND ISSUES IN MACHINE LEARNING

Issues in Machine Learning

·       What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?

·       How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?

·       When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?

·       What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?

·       What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?

·        How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

6. A CONCEPT LEARNING TASK

Consider the example task of learning the target concept "Days on which Aldo enjoys his favorite water sport".

 

 

Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
1         Sunny   Warm      Normal     Strong   Warm    Same       Yes
2         Sunny   Warm      High       Strong   Warm    Same       Yes
3         Rainy   Cold      High       Strong   Warm    Change     No
4         Sunny   Warm      High       Strong   Cool    Change     Yes

Table: Positive and negative training examples for the target concept EnjoySport.

The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.

What hypothesis representation is provided to the learner?

·       Let’s consider a simple representation in which each hypothesis consists of a conjunction of constraints on the instance attributes.

·       Let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

 

For each attribute, the hypothesis will either

·       Indicate by a "?" that any value is acceptable for this attribute,

·       Specify a single required value (e.g., Warm) for the attribute, or

·       Indicate by a "Φ" that no value is acceptable

If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).

The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression

(?, Cold, High, ?, ?, ?)

The most general hypothesis, that every day is a positive example, is represented by

(?, ?, ?, ?, ?, ?)

The most specific possible hypothesis, that no day is a positive example, is represented by

(Φ, Φ, Φ, Φ, Φ, Φ)
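To make the representation concrete, here is a minimal Python sketch (not part of the original text; the function name h_classifies and the tuple encoding are illustrative assumptions) of how such a conjunction of constraints classifies an instance:

```python
# A hypothesis is a vector of six constraints, one per attribute, in the
# order (Sky, AirTemp, Humidity, Wind, Water, Forecast):
#   "?"  -> any value is acceptable
#   "Φ"  -> no value is acceptable
#   else -> a single required value (e.g., "Warm")

def h_classifies(h, x):
    """Return 1 if instance x satisfies every constraint of hypothesis h, else 0."""
    for constraint, value in zip(h, x):
        if constraint == "Φ":                      # no value is acceptable
            return 0
        if constraint != "?" and constraint != value:
            return 0
    return 1

# "Aldo enjoys his favorite sport only on cold days with high humidity":
h = ("?", "Cold", "High", "?", "?", "?")
x = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
print(h_classifies(h, x))  # prints 1: x satisfies h, a positive classification
```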

Notation

·       The set of items over which the concept is defined is called the set of instances, which is denoted by X.

Example: X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

·       The concept or function to be learned is called the target concept, which is denoted by c. In general, c can be any Boolean-valued function defined over the instances X:

c : X → {0, 1}

Example: The target concept corresponds to the value of the attribute EnjoySport

(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).

·       Instances for which c(x) = 1 are called positive examples, or members of the target concept.

·       Instances for which c(x) = 0 are called negative examples, or non-members of the target concept.

·       The ordered pair (x, c(x)) denotes the training example consisting of the instance x and its target concept value c(x).

·       The symbol D denotes the set of available training examples.

 

The symbol H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Each hypothesis h in H represents a Boolean-valued function defined over X:

h : X → {0, 1}

The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
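As a small illustration of this goal (a hedged sketch, reusing h_classifies from above; the layout of D as a list of (x, c(x)) pairs is an assumption), the following code checks whether a hypothesis agrees with the target concept on every training example:

```python
# D: training examples as (x, c(x)) pairs, following the table above.
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), 1),
]

def consistent(h, D):
    """True if h(x) = c(x) for every training example (x, c(x)) in D."""
    return all(h_classifies(h, x) == c_x for x, c_x in D)

print(consistent(("Sunny", "Warm", "?", "Strong", "?", "?"), D))  # True
```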

___________________________________________________________________________

Given:

·       Instances X: Possible days, each described by the attributes

o   Sky (with possible values Sunny, Cloudy, and Rainy),

o   AirTemp (with values Warm and Cold),

o   Humidity (with values Normal and High),

o   Wind (with values Strong and Weak),

o   Water (with values Warm and Cool),

o   Forecast (with values Same and Change).

·       Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be "?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.

·       Target concept c: EnjoySport : X → {0, 1}

·       Training examples D: Positive and negative examples of the target function

Determine:

·       A hypothesis h in H such that h(x) = c(x) for all x in X.

-----------------------------------------------------------------------------------------------------------------

Table: The EnjoySport concept learning task.

 

The inductive learning hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

5. CONCEPT LEARNING

·       Learning involves acquiring general concepts from specific training examples. Example: People continually learn general concepts or categories such as "bird," "car," "situations in which I should study more to pass the exam," etc.

·       Each such concept can be viewed as describing some subset of objects or events defined over a larger set

·       Alternatively, each concept can be thought of as a Boolean-valued function defined over this larger set. (Example: A function defined over all animals, whose value is true for birds and false for other animals).

Definition: Concept learning is the task of inferring a Boolean-valued function from training examples of its input and output.

2. DESIGNING A LEARNING SYSTEM

The basic design issues and approaches to machine learning are illustrated by designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament.

1. Choosing the Training Experience

2. Choosing the Target Function

3. Choosing a Representation for the Target Function

4. Choosing a Function Approximation Algorithm



1. Estimating training values

2. Adjusting the weights

5. The Final Design

 

1. Choosing the Training Experience

       The first design choice is to choose the type of training experience from which the system will learn.

       The type of training experience available can have a significant impact on success or failure of the learner.

There are three attributes that affect the success or failure of the learner:

1.     Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.

For example, in the checkers game:

In learning to play checkers, the system might learn from direct training examples consisting of individual checkers board states and the correct move for each.

Alternatively, it might learn from indirect training examples consisting of the move sequences and final outcomes of various games played. In this case, information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost.

Here the learner faces an additional problem of credit assignment, or determining the degree to which each move in the sequence deserves credit or blame for the final outcome. Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves.

Hence, learning from direct training feedback is typically easier than learning from indirect feedback.

2.     The degree to which the learner controls the sequence of training examples

For example, in the checkers game:

The learner might depend on the teacher to select informative board states and to provide the correct move for each.

Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.

The learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.

 

3.     How well it represents the distribution of examples over which the final system performance P must be measured

For example, in the checkers game:

In the checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.

If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested.

In practice, it is often necessary to learn from a distribution of examples that is somewhat different from the one over which the final system will be evaluated.

 

2. Choosing the Target Function

The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.

Let’s consider a checkers-playing program that can generate the legal moves from any board state.

The program needs only to learn how to choose the best move from among these legal moves. Since we must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that chooses the best move for any given board state.

 

1. Let ChooseMove be the target function, with the notation

 

ChooseMove : B → M

which indicates that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.

ChooseMove is a natural choice for the target function in the checkers example, but it will turn out to be very difficult to learn given the kind of indirect training experience available to our system.

2. An alternative target function is an evaluation function that assigns a numerical score to any given board state.

Let the target function be V, with the notation

V : B → R

which denotes that V maps any legal board state from the set B to some real value. We intend for this target function V to assign higher scores to better board states. If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position.

 

Let us define the target value V(b) for an arbitrary board state b in B, as follows:

o   If b is a final board state that is won, then V(b) = 100

o   If b is a final board state that is lost, then V(b) = -100

o   If b is a final board state that is drawn, then V(b) = 0

o   If b is not a final state in the game, then V(b) = V(b'),

where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
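A minimal sketch of this recursive definition in Python, assuming hypothetical helper functions is_final, is_won, is_lost, and best_reachable_final that are not specified in the text:

```python
def true_target_value(b):
    """V(b) as defined above; the game helpers used here are hypothetical."""
    if is_final(b):
        if is_won(b):
            return 100
        if is_lost(b):
            return -100
        return 0                          # drawn final board state
    # Non-final state: V(b) = V(b'), where b' is the best final board
    # state reachable from b under optimal play to the end of the game.
    return true_target_value(best_reachable_final(b))
```

As the sketch makes plain, evaluating V(b) for a non-final state requires searching ahead to the end of the game, so this definition is not efficiently computable; the program therefore learns an operational approximation to V, written V̂.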

 

3. Choosing a Representation for the Target Function

Let’s choose a simple representation: for any given board state, the function V̂ will be calculated as a linear combination of the following board features:

       x1: the number of black pieces on the board

       x2: the number of red pieces on the board

       x3: the number of black kings on the board

       x4: the number of red kings on the board

       x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)

       x6: the number of red pieces threatened by black

Thus, the learning program will represent V̂(b) as a linear function of the form

V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

where:

·       w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.

·       Learned values for the weights w1 through w6 determine the relative importance of the various board features in determining the value of the board.

·       The weight w0 provides an additive constant to the board value.
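A hedged sketch of this linear representation in Python (the list-of-weights encoding and the particular weight values are illustrative assumptions, not learned values):

```python
def v_hat(weights, features):
    """Approximate board value: V̂(b) = w0 + w1*x1 + ... + w6*x6,
    where weights = [w0, ..., w6] and features = (x1, ..., x6) describe b."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Illustrative weights w0..w6; the feature vector matches the training
# example shown in the next section.
weights = [0.5, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]
print(v_hat(weights, (3, 0, 1, 0, 0, 0)))  # 0.5 + 1.0*3 + 3.0*1 = 6.5
```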

4. Choosing a Function Approximation Algorithm

In order to learn the target function V we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b. Each training example is an ordered pair of the form (b, Vtrain(b)).

For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100.

((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure

1. Derive training examples from the indirect training experience available to the learner

2. Adjust the weights wi to best fit these training examples

1. Estimating training values

A simple approach for estimating training values for intermediate board states is to assign the training value Vtrain(b) for any intermediate board state b to be V̂(Successor(b)), where:

       V̂ is the learner's current approximation to V

       Successor(b) denotes the next board state following b for which it is again the program's turn to move

Rule for estimating training values:

Vtrain(b) ← V̂(Successor(b))
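A one-function sketch of this estimation rule, reusing v_hat from the earlier sketch (features_of and successor are hypothetical helpers that extract the feature vector of a board and compute Successor(b)):

```python
def training_value(b, weights, features_of, successor):
    """Vtrain(b) <- V̂(Successor(b)): bootstrap the training value of an
    intermediate board b from the current approximation V̂ on b's successor."""
    return v_hat(weights, features_of(successor(b)))
```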

 

2. Adjusting the weights

We must specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}.

A first step is to define what we mean by the best fit to the training data.

One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis:

E ≡ Σ (Vtrain(b) − V̂(b))²

where the sum ranges over the set of training examples (b, Vtrain(b)).

Several algorithms are known for finding weights of a linear function that minimize E. One such algorithm is called the least mean squares, or LMS, training rule. For each observed training example it adjusts the weights a small amount in the direction that reduces the error on that training example.

LMS weight update rule: For each training example (b, Vtrain(b)):

       Use the current weights to calculate V̂(b)

       For each weight wi, update it as

wi ← wi + η (Vtrain(b) − V̂(b)) xi

Here η is a small constant (e.g., 0.1) that moderates the size of the weight update.

Working of weight update rule

       When the error (Vtrain(b) − V̂(b)) is zero, no weights are changed.

       When (Vtrain(b) − V̂(b)) is positive (i.e., when V̂(b) is too low), each weight is increased in proportion to the value of its corresponding feature. This raises the value of V̂(b), reducing the error.

       If the value of some feature xi is zero, its weight is not altered regardless of the error, so the only weights updated are those whose features actually occur in the training example board.
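A minimal sketch of one LMS step, reusing v_hat from the earlier sketch (treating w0 as having a constant feature x0 = 1, a standard convention the text does not spell out):

```python
def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: wi <- wi + eta * (Vtrain(b) - V̂(b)) * xi."""
    error = v_train - v_hat(weights, features)   # (Vtrain(b) - V̂(b))
    weights[0] += eta * error                    # w0 is updated with x0 = 1
    for i, x in enumerate(features):
        weights[i + 1] += eta * error * x        # unchanged wherever xi = 0
    return weights

# One update toward the training example above, where Vtrain(b) = +100:
weights = lms_update([0.5, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5],
                     (3, 0, 1, 0, 0, 0), 100)
```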

5. The Final Design

The final design of the checkers learning system can be described by four distinct program modules that represent the central components in many learning systems:

 

1.     The Performance System is the module that must solve the given performance task by using the learned target function(s). It takes an instance of a new problem (a new game) as input and produces a trace of its solution (the game history) as output.

2.     The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.

3.     The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function. It generalizes from the specific training examples, hypothesizing a general function that covers these examples and other cases beyond the training examples.

4.     The Experiment Generator takes as input the current hypothesis and outputs a new problem (i.e., initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.
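One way to picture how the four modules fit together is the following hedged sketch of the overall control loop (all function names are placeholders, not from the text):

```python
def learning_loop(performance_system, critic, generalizer, experiment_generator,
                  initial_hypothesis, iterations):
    """Each cycle: generate a practice problem, play it out, extract
    training examples from the trace, and refine the hypothesis."""
    hypothesis = initial_hypothesis
    for _ in range(iterations):
        problem = experiment_generator(hypothesis)        # new initial board
        trace = performance_system(problem, hypothesis)   # game history
        examples = critic(trace)                          # (b, Vtrain(b)) pairs
        hypothesis = generalizer(examples)                # updated V̂ / weights
    return hypothesis
```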

 

The sequence of design choices made for the checkers program (the type of training experience, the target function, its representation, and the learning algorithm) is summarized in the figure below.

[Figure: summary of the sequence of design choices for the checkers learning program]
