2. DESIGNING A LEARNING SYSTEM
The basic design issues and approaches to machine learning are illustrated by designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament.
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design
1. Choosing the Training Experience
• The first design choice is to choose the type of training experience from which the system will learn.
• The type of training experience available can have a significant impact on the success or failure of the learner.
There are three attributes of the training experience that affect the success or failure of the learner:

1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
For example, in the checkers game:
In learning to play checkers, the system might learn from direct training examples consisting of individual checkers board states and the correct move for each.
Alternatively, it might learn from indirect training examples consisting of the move sequences and final outcomes of various games played. Here, the information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost.
The learner then faces an additional problem of credit assignment, or determining the degree to which each move in the sequence deserves credit or blame for the final outcome. Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves.
Hence, learning from direct training feedback is typically easier than learning from indirect feedback.
2. The degree to which the learner controls the sequence of training examples.
For example, in the checkers game:
The learner might depend on the teacher to select informative board states and to provide the correct move for each.
Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
Or the learner may have complete control over both the board states and (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
3. How well the training experience represents the distribution of examples over which the final system performance P must be measured.
For example, in the checkers game:
In the checkers learning scenario, the performance metric P is the percent of games the system wins in the world tournament.
If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested.
In practice, it is often necessary to learn from a distribution of examples that is somewhat different from the one on which the final system will ultimately be evaluated.
2. Choosing the Target Function
The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.
Let’s consider a checkers-playing program that can generate the legal moves from any board state. The program needs only to learn how to choose the best move from among these legal moves. Since the program must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that chooses the best move for any given board state.
1. Let ChooseMove be the target function, with the notation

ChooseMove : B → M

which indicates that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
ChooseMove is a natural choice for the target function in the checkers example, but this function will turn out to be very difficult to learn given the kind of indirect training experience available to our system.
2. An alternative target function is an evaluation function that assigns a numerical score to any given board state.
Let the target function be V, with the notation

V : B → R

which denotes that V maps any legal board state from the set B to some real value. We intend for this target function V to assign higher scores to better board states. If the system can successfully learn such a target function V, then it can easily use it to select the best move from any current board position, as sketched below.
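
To make this selection step concrete, here is a minimal Python sketch; V is the learned evaluation function, while legal_moves and apply_move are hypothetical stand-ins for a checkers move generator.

def best_move(board, V, legal_moves, apply_move):
    """Choose the legal move whose successor board state scores highest under V.

    V, legal_moves, and apply_move are hypothetical helpers: V(b) returns a
    numeric score, legal_moves(b) enumerates the moves available from b, and
    apply_move(b, m) returns the board state resulting from move m.
    """
    return max(legal_moves(board), key=lambda m: V(apply_move(board, m)))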
Let us define the target value V(b) for an arbitrary board state b in B, as follows:
• If b is a final board state that is won, then V(b) = 100
• If b is a final board state that is lost, then V(b) = -100
• If b is a final board state that is drawn, then V(b) = 0
• If b is not a final state in the game, then V(b) = V(b'),
where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game.
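
This definition is correct but nonoperational: for non-final states it requires looking ahead to the end of the game under optimal play. The final-state cases, however, translate directly into code; here is a minimal Python sketch, where is_final, is_won, and is_lost are hypothetical predicates on board states.

def target_value(b, is_final, is_won, is_lost):
    """Target value V(b) for final board states, following the definition above.

    is_final, is_won, and is_lost are hypothetical predicates on board states.
    """
    if is_final(b):
        if is_won(b):
            return 100
        if is_lost(b):
            return -100
        return 0  # a final state that is neither won nor lost is drawn
    # For non-final b the definition requires searching optimal play to the
    # end of the game, which is what makes it nonoperational to compute.
    raise NotImplementedError("V(b) for non-final states requires optimal-play search")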
3. Choosing a Representation for the Target Function
Let’s choose a simple representation: for any given board state, the function Vˆ will be calculated as a linear combination of the following board features:
• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
• x6: the number of red pieces threatened by black
Thus, our learning program will represent Vˆ(b) as a linear function of the form

Vˆ(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

where:
• w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
• Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board.
• The weight w0 will provide an additive constant to the board value.
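
As a concrete illustration, here is a minimal Python sketch of this linear representation; extract_features is a hypothetical helper that computes the feature values x1 through x6 from a board state.

def v_hat(board, weights, extract_features):
    """Evaluate Vˆ(b) = w0 + w1*x1 + ... + w6*x6 for a board state.

    weights is the list [w0, w1, ..., w6]; extract_features(board) is a
    hypothetical helper returning the feature values [x1, ..., x6].
    """
    xs = extract_features(board)
    return weights[0] + sum(w * x for w, x in zip(weights[1:], xs))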
4. Choosing a Function Approximation Algorithm
In order to learn the target function f we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b. Each training example is an ordered pair of the form (b, Vtrain(b)).
For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100:

((x1=3, x2=0, x3=1, x4=0, x5=0, x6=0), +100)
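
Under the feature representation above, this example could be encoded in Python as a simple pair of a feature tuple and a training value; the encoding itself is just one illustrative choice.

# Feature vector (x1, x2, x3, x4, x5, x6) paired with the training value Vtrain(b)
example = ((3, 0, 1, 0, 0, 0), +100)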
Function Approximation Procedure
1. Derive training examples from the indirect training experience available to the learner.
2. Adjust the weights wi to best fit these training examples.
1. Estimating training values

A simple approach for estimating training values for intermediate board states is to assign the training value Vtrain(b) for any intermediate board state b to be Vˆ(Successor(b)), where:
• Vˆ is the learner's current approximation to V
• Successor(b) denotes the next board state following b for which it is again the program's turn to move

Rule for estimating training values:

Vtrain(b) ← Vˆ(Successor(b))
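
A minimal sketch of this estimation step, assuming a game trace is available as the list of board states at which it was the program's turn to move (a hypothetical representation), with the final state assigned the true value from the game outcome:

def estimate_training_values(trace, v_hat, final_value):
    """Derive (b, Vtrain(b)) training pairs from one game trace.

    trace: successive board states at the program's turns (hypothetical format).
    v_hat: a callable implementing the learner's current approximation Vˆ.
    final_value: +100, -100, or 0, taken from the actual game outcome.
    """
    examples = [(b, v_hat(successor)) for b, successor in zip(trace, trace[1:])]
    examples.append((trace[-1], final_value))  # final state gets the known outcome
    return examples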
2. Adjusting the weights

The remaining task is to specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}.
A first step is to define what we mean by the best fit to the training data. One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis:

E = Σ (Vtrain(b) − Vˆ(b))², where the sum is over the set of training examples

Several algorithms are known for finding weights of a linear function that minimize E. One such algorithm is called the least mean squares, or LMS training rule. For each observed training example, it adjusts the weights a small amount in the direction that reduces the error on that training example.
LMS weight update rule: For each training example (b, Vtrain(b)):
• Use the current weights to calculate Vˆ(b)
• For each weight wi, update it as

wi ← wi + ƞ (Vtrain(b) − Vˆ(b)) xi

Here ƞ is a small constant (e.g., 0.1) that moderates the size of the weight update. (A sketch of one such update in code follows the list below.)
Working of the weight update rule:
• When the error (Vtrain(b) − Vˆ(b)) is zero, no weights are changed.
• When (Vtrain(b) − Vˆ(b)) is positive (i.e., when Vˆ(b) is too low), then each weight is increased in proportion to the value of its corresponding feature. This will raise the value of Vˆ(b), reducing the error.
• If the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board.
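
Here is a minimal Python sketch of one LMS update under the linear representation above; handling the intercept weight w0 by prepending a constant feature x0 = 1 is an assumption, consistent with w0 acting as an additive constant.

def lms_update(weights, features, v_train, eta=0.1):
    """Apply one LMS weight update for a training example (b, Vtrain(b)).

    weights: [w0, w1, ..., w6]; features: [x1, ..., x6] computed from board b;
    eta: the small constant ƞ that moderates the size of the update.
    """
    x = [1.0] + list(features)  # x0 = 1 so w0 contributes the additive constant
    error = v_train - sum(w * xi for w, xi in zip(weights, x))
    return [w + eta * error * xi for w, xi in zip(weights, x)]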
5. The Final Design

The final design of the checkers learning system can be described by four distinct program modules that represent the central components in many learning systems (a skeleton of how they interact is sketched after this list):
1. The Performance System is the module that must solve the given performance task by using the learned target function(s). It takes an instance of a new problem (a new game) as input and produces a trace of its solution (the game history) as output.
2. The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.
3. The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function. It generalizes from the specific training examples, hypothesizing a general function that covers these examples and other cases beyond them.
4. The Experiment Generator takes as input the current hypothesis and outputs a new problem (i.e., an initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.
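
A skeleton of how these four modules could fit together in one training loop; all four callables are hypothetical stand-ins for the modules described above.

def learning_loop(initial_hypothesis, n_games, performance_system, critic,
                  generalizer, experiment_generator):
    """Iterate the four modules of the final design for n_games games."""
    hypothesis = initial_hypothesis
    for _ in range(n_games):
        problem = experiment_generator(hypothesis)        # new initial board state
        trace = performance_system(problem, hypothesis)   # play a game, record its trace
        examples = critic(trace)                          # derive training examples
        hypothesis = generalizer(examples)                # generalize to a new hypothesis
    return hypothesis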
The sequence of design choices made for the checkers program is summarized in the figure below.