2. DESIGNING A LEARNING SYSTEM

 

2. DESIGNING A LEARNING SYSTEM

The basic design issues and approaches to machine learning are illustrated by designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament

1. Choosing the Training Experience

2. Choosing the Target Function

3. Choosing a Representation for the Target Function

4. Choosing a Function Approximation Algorithm



1. Estimating training values

2. Adjusting the weights

5. The Final Design

 

1. Choosing the Training Experience

       The first design choice is to choose the type of training experience from which the system will learn.

       The type of training experience available can have a significant impact on success or failure of the learner.

There are three attributes which impact on success or failure of the learner

1.     Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.

For example, in checkers game:

In learning to play checkers, the system might learn from direct training examples

consisting of individual checkers board states and the correct move for each.

Indirect training examples consisting of the move sequences and final outcomes of various games played. The information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost.

Here the learner faces an additional problem of credit assignment, or determining the degree to which each move in the sequence deserves credit or blame for the final outcome. Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves.

Hence, learning from direct training feedback is typically easier than learning from indirect feedback.

2.     The degree to which the learner controls the sequence of training examples

For example, in checkers game:

The learner might depends on the teacher to select informative board states and to provide the correct move for each.

Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.

The learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.

 

3.     How well it represents the distribution of examples over which the final system performance P must be measured

For example, in checkers game:

In checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.

If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested.

It is necessary to learn from a distribution of examples that is different from those on which the final system will be evaluated.

 

2. Choosing the Target Function

The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.

Let’s consider a checkers-playing program that can generate the legal moves from any board state.

The program needs only to learn how to choose the best move from among these legal moves. We must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that chooses the best move for any given board state.

 

1. Let ChooseMove be the target function and the notation is

 

ChooseMove : B→ M

which indicate that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.

ChooseMove is a choice for the target function in checkers example, but this function will turn out to be very difficult to learn given the kind of indirect training experience available to our system.

2. An alternative target function is an evaluation function that assigns a numerical score

to any given board state

Let the target function V and the notation

V : B → R

which denote that V maps any legal board state from the set B to some real value. Intend for this target function V to assign higher scores to better board states. If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position.

 

Let us define the target value V(b) for an arbitrary board state b in B, as follows:

o   If b is a final board state that is won, then V(b) = 100

o   If b is a final board state that is lost, then V(b) = -100

o   If b is a final board state that is drawn, then V(b) = 0

o   If b is a not a final state in the game, then V(b) = V(b' ),

Where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.

 

3. Choosing a Representation for the Target Function

Let’s choose a simple representation - for any given board state, the function c will be calculated as a linear combination of the following board features:

       xl: the number of black pieces on the board

       x2: the number of red pieces on the board

       x3: the number of black kings on the board

       x4: the number of red kings on the board

       x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)

       x6: the number of red pieces threatened by black

Thus, learning program will represent as a linear function of the form

Where,

·       w0 through w6 are numerical coefficients, or weights, to be chosen by the learning

·       algorithm.

·       Learned values for the weights w1 through w6 will determine the relative importance

·       of the various board features in determining the value of the board

·       The weight w0 will provide an additive constant to the board value

4. Choosing a Function Approximation Algorithm

In order to learn the target function f we require a set of training examples, each describing a

specific board state b and the training value Vtrain(b) for b.

Each training example is an ordered pair of the form (b, Vtrain(b)).

For instance, the following training example describes a board state b in which black has won

the game (note x2 = 0 indicates that red has no remaining pieces) and for which the  target  function value Vtrain(b) is therefore +100.

((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)

Function Approximation Procedure

1. Derive training examples from the indirect training experience available to the learner

2. Adjusts the weights wi to best fit these training examples

1. Estimating training values

A simple approach for estimating training values for intermediate board states is to

assign the training value of Vtrain(b) for any intermediate board state b to be

(Successor(b))

Where ,

     is the learner's current approximation to V

       Successor(b) denotes the next board state following b for which it is again the

       program's turn to move

Rule for estimating training values

Vtrain(b) ← (Successor(b))

 

2. Adjusting the weights

Specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}

A first step is to define what we mean by the bestfit to the training data.

One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis.

Several algorithms are known for finding weights of a linear function that minimize E. One such algorithm is called the least mean squares, or LMS training rule. For each observed training example it adjusts the weights a small amount in the direction that reduces the error on this training example

LMS weight update rule :- For each training example (b, Vtrain(b)) Use the current weights to calculate Vˆ (b)

For each weight wi, update it as

wi ← wi + ƞ (Vtrain (b) - Vˆ (b)) xi

Here ƞ is a small constant (e.g., 0.1) that moderates the size of the weight update.

Working of weight update rule

       When the error (Vtrain(b)- Vˆ (b)) is zero, no weights are changed.

       When (Vtrain(b) - Vˆ (b)) is positive (i.e., when Vˆ (b) is too low), then each weight is increased in proportion to the value of its corresponding feature. This will raise the value of Vˆ (b), reducing the error.

       If the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board.

5. The Final Design

The final design of the checkers learning system can be described by four distinct program modules that represent the central components in many learning systems

 

1.     The Performance System is the module that must solve the given performance task by

using the learned target function(s). It takes an instance of a new problem (new game)

as input and produces a trace of its solution (game history) as output.

2.     The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.

3.     The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function. It generalizes from the specific training examples, hypothesizing a general function that covers these examples and other cases beyond the training examples.

4.     The Experiment Generator takes as input the current hypothesis and outputs a new problem (i.e., initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.

 

The sequence of design choices made for the checkers program is summarized in below figure



Comments