PLA: Perceptron Learning Algorithm (Adaptive Decision Boundary)
Recap the main idea:
Training and testing: Generalization
If there is no assumption on how the past (i.e., training data) is related to the
future (i.e., test data), prediction is impossible. The assumed relationship between past
and future observations is that both are sampled independently from the
same distribution (the observations are i.i.d. – independent and identically distributed).
Perceptron learning (Adaptive decision boundary)
Feature vector x = (x1, x2, …, xM) (M dimensions)
D = w0 + w1x1 + ... + wMxM
D >= 0: +1
D < 0: -1
w0 + w1x1 + ... + wMxM = 0 is a hyperplane that divides the feature space into two regions.
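As a small illustration of the decision rule above, here is a minimal Python sketch (the names discriminant and classify are mine, not part of the notes) that evaluates D for one feature vector and assigns the class by its sign:

def discriminant(w0, w, x):
    """Compute D = w0 + w1*x1 + ... + wM*xM for one feature vector x."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def classify(w0, w, x):
    """Return +1 if D >= 0, otherwise -1, matching the sign convention above."""
    return 1 if discriminant(w0, w, x) >= 0 else -1

The weights w0, w1, …, wM themselves are learned by the following procedure: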
1. Initialize the weights wi , i = 0, …, M, to zero or to small random values or
to some initial guesses. Choose positive constants c and k for the step size.
2. Choose the next sample point from the training set. Let the true class be d
(either +1 or -1).
3. Compute D = w0 + w1x1 + ... + wMxM. If sign(D) = d, no change is made to
the weights. [sign(D) returns 1 if D >= 0 and -1 otherwise.] If sign(D) != d,
then do the following updates:
wi = wi + cdxi for i = 1, …, M
w0 = w0 + cdk
4. Repeat steps 2 and 3 for each of the samples in the training set. Stop and
output the result if the termination condition is met. As long as the
termination condition is not satisfied, run through the entire training data set
repeatedly (a code sketch of the full procedure follows the list of conditions below). The termination condition is reached when
a correct classification (with zero error) is reached, or
a pre-determined maximum number of iterations is reached, or
the error rate ceases to decrease significantly (stagnation).
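Putting steps 1–4 together, a minimal Python sketch of the whole procedure might look as follows (assumptions: c and k are fixed positive constants, the weights start at zero unless initial guesses are supplied, and training stops after a pass with zero errors or after a fixed maximum number of passes; the name perceptron_train and the parameter max_passes are mine, not from the notes):

def perceptron_train(samples, c=1.0, k=1.0, max_passes=100, w0=0.0, w=None):
    """Perceptron learning. samples is a list of (x, d) pairs, where x is a
    list of M feature values and d is the true class (+1 or -1).
    Returns (w0, w) defining the discriminant D = w0 + w1*x1 + ... + wM*xM."""
    M = len(samples[0][0])
    w = list(w) if w is not None else [0.0] * M        # step 1: initial weights
    for _ in range(max_passes):                        # repeated passes (step 4)
        errors = 0
        for x, d in samples:                           # step 2: next sample
            D = w0 + sum(wi * xi for wi, xi in zip(w, x))   # step 3: compute D
            if (1 if D >= 0 else -1) != d:             # sign(D) != d: update
                w = [wi + c * d * xi for wi, xi in zip(w, x)]
                w0 = w0 + c * d * k
                errors += 1
        if errors == 0:                                # zero error on a full pass
            break
    return w0, w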
Is the decision boundary thus found optimal?
What if the classes are not linearly separable?
What is the difference between this approach and the least-squares method?
The problem boils down to finding a set of weights wi (assuming c and k are predetermined).
Example: Training set = Two points in one dimension: x = -1 (d = 1) and x = -4 (d
= -1). Sample size = 2.
Discriminant D = w0 + w1x
Initial choice w0 = 1, w1 = 2 (arbitrary)
First point: D = 1 + 2(-1) = -1, whose sign does not match the sign of the given d of
+1 for this point. So we need to update the weights (assuming c = k = 1):
w0 = w0 + cdk = 1 + (1)(1)(1) = 2, and w1 = w1 + cdx = 2 + (1)(1)(-1) = 1.
Thus, revised weights (first update): w0 = 2, w1 = 1
Now, test the second point with the revised weights:
Second point: D = 2 + 1(-4) = -2 whose sign matches that of the given d of -1 for
this point => no need to update the weights. This completes the first pass through
the training set.
Next, the second pass through the training set:
First point: D = 2 + 1(-1) = 1, which matches the sign of the true (given) d for this
point.
Second point: D = 2 + 1(-4) = -2, which matches the sign of the true (given) d for
this point.
The stopping condition is thus satisfied. Solution: 2 + x = 0 or x = -2.
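For reference, the hand calculation above can be reproduced with the perceptron_train sketch given earlier, starting from the same arbitrary initial weights w0 = 1, w1 = 2 and with c = k = 1 (the call below is illustrative only):

samples = [([-1], 1), ([-4], -1)]     # x = -1 with d = +1, x = -4 with d = -1
w0, w = perceptron_train(samples, c=1.0, k=1.0, w0=1.0, w=[2.0])
print(w0, w)                          # expected output: 2.0 [1.0], i.e. 2 + x = 0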
Algebraic proof that the weight update rule moves the decision boundary in
the right (correct) direction in each step:
D = w0 + ∑ wi xi (sum over i = 1, …, M)
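A sketch of how this argument typically proceeds, written in LaTeX and using the update rules from step 3 (this is my reconstruction, assuming the sample x was misclassified and that c and k are positive as required in step 1):

\[
D_{\text{new}} = (w_0 + c d k) + \sum_{i=1}^{M} (w_i + c d x_i)\, x_i
              = D_{\text{old}} + c d \left( k + \sum_{i=1}^{M} x_i^2 \right),
\]

and since c > 0 and k + ∑ xi^2 > 0, the change in D has the same sign as d, so each update pushes D toward the correct side of zero for that sample.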