Rosenblatt [Rose61] created many variations of the perceptron. One of the simplest was a single-layer network whose weights and biases could be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and learn from initially randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification. They are fast and reliable networks for the problems they can solve. In addition, an understanding of the operations of the perceptron provides a good basis for understanding more complex networks.

The discussion of perceptrons in this section is necessarily brief. For a more thorough discussion, see Chapter 4, “Perceptron Learning Rule,” of [HDB1996], which discusses the use of multiple layers of perceptrons to solve more difficult problems beyond the capability of one layer.

A perceptron neuron, which uses the hard-limit transfer function `hardlim`, is shown below.

Each external input is weighted with an appropriate weight *w*_{1j},
and the sum of the weighted inputs is sent to the hard-limit transfer
function, which also has an input of 1 transmitted to it through the
bias. The hard-limit transfer function, which returns a 0 or a 1,
is shown below.

The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a 0.
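As a minimal illustration outside the toolbox, the following plain-Python sketch implements the threshold rule just stated and the output of a single neuron. The helpers `hardlim` and `neuron_output` are hypothetical Python stand-ins for illustration only; this is not the toolbox's `hardlim` source.

```python
def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, otherwise 0."""
    return 1 if n >= 0 else 0


def neuron_output(w, p, b):
    """Output of one perceptron neuron: hardlim(w . p + b)."""
    n = sum(wj * pj for wj, pj in zip(w, p)) + b  # net input
    return hardlim(n)


print(hardlim(-0.5), hardlim(0), hardlim(2))  # 0 1 1
print(neuron_output([1, -0.8], [1, 2], 0))    # net input is about -0.6, so 0
```

Note that a net input of exactly 0 produces 1, matching the "equal to or greater than 0" condition.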

The hard-limit transfer function gives a perceptron the ability
to classify input vectors by dividing the input space into
two regions. Specifically, outputs will be 0 if the net input *n* is
less than 0, or 1 if the net input *n* is 0 or greater.
The following figure shows the input space of a two-input hard-limit
neuron with the weights *w*_{1,1} =
−1, *w*_{1,2} = 1 and
a bias *b* = 1.

Two classification regions are formed by the *decision boundary* line L at
**Wp** + *b* = 0. This line is
perpendicular to the weight matrix **W** and
shifted according to the bias *b*. Input vectors
above and to the left of the line L will result in a net input greater
than 0 and, therefore, cause the hard-limit neuron to output a 1.
Input vectors below and to the right of the line L cause the neuron
to output 0. You can pick weight and bias values to orient and move
the dividing line so as to classify the input space as desired.
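A small plain-Python sketch can check this geometry with the figure's values *w*_{1,1} = −1, *w*_{1,2} = 1 and *b* = 1. The helper `classify` is hypothetical and simply evaluates hardlim(**Wp** + *b*) for a few test points; setting the net input to zero shows that L is the line p2 = p1 − 1.

```python
# Values from the figure: w11 = -1, w12 = 1, b = 1.
W = [-1, 1]
b = 1


def classify(p):
    """Return hardlim(W p + b) for a two-element input p."""
    n = W[0] * p[0] + W[1] * p[1] + b
    return 1 if n >= 0 else 0


# -p1 + p2 + 1 = 0 is the boundary, i.e. the line p2 = p1 - 1.
for point in [(-1, 1), (0, 0), (2, 0), (1, 0)]:
    print(point, classify(point))
# (-1, 1) 1   above and to the left of L
# (0, 0) 1    the origin is on the "1" side because b = 1
# (2, 0) 0    below and to the right of L
# (1, 0) 1    on L itself, where n = 0 gives 1
```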

Hard-limit neurons without a bias will always have a classification line going through the origin. Adding a bias allows the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin. The bias allows the decision boundary to be shifted away from the origin, as shown in the plot above.

You might want to run the example program `nnd4db`.
With it you can move a decision boundary around, pick new inputs to
classify, and see how the repeated application of the learning rule
yields a network that does classify the input vectors properly.

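The perceptron network consists of a single layer of *S* perceptron
neurons connected to *R* inputs through a set of
weights *w*_{i,j}, as shown below
in two forms. As before, the network indices *i* and *j* indicate that *w*_{i,j} is the strength of the connection from the *j*th input to the *i*th neuron.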

The perceptron learning rule described shortly is capable of
training only a single layer. Thus only one-layer networks are considered
here. This restriction places limitations on the computation a perceptron
can perform. The types of problems that perceptrons are capable of
solving are discussed in Limitations and Cautions.

You can create a perceptron with the following:

```matlab
net = perceptron;
net = configure(net,P,T);
```

where input arguments are as follows:

- `P` is an R-by-Q matrix of Q input vectors of R elements each.
- `T` is an S-by-Q matrix of Q target vectors of S elements each.

Commonly, the `hardlim` function
is used in perceptrons, so it is the default.

The following commands create a perceptron network with a single one-element input vector with the values 0 and 2, and one neuron with outputs that can be either 0 or 1:

```matlab
P = [0 2];
T = [0 1];
net = perceptron;
net = configure(net,P,T);
```

You can see what network has been created by executing the following command:

```matlab
inputweights = net.inputweights{1,1}
```

which yields

```
inputweights = 
         delays: 0
        initFcn: 'initzero'
          learn: true
       learnFcn: 'learnp'
     learnParam: (none)
           size: [1 1]
      weightFcn: 'dotprod'
    weightParam: (none)
       userdata: (your custom info)
```

The default learning function is `learnp`,
which is discussed in Perceptron Learning Rule (learnp).
The weight function is `dotprod`, which computes the product of the
weight matrix and the input vector; the bias is then added to this
product to form the net input to the `hardlim` transfer function.

The default initialization function `initzero` is
used to set the initial values of the weights to zero.

Similarly,

```matlab
biases = net.biases{1}
```

gives

```
biases = 
       initFcn: 'initzero'
         learn: 1
      learnFcn: 'learnp'
    learnParam: []
          size: 1
      userdata: [1x1 struct]
```

You can see that the default initialization for the bias is also 0.

Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input/output pairs

$$\{{p}_{1},{t}_{1}\},\{{p}_{2},{t}_{2}\},\dots ,\{{p}_{Q},{t}_{Q}\}$$

where **p** is an input to the
network and **t** is the corresponding
correct (target) output. The objective is to reduce the error **e**, which is the difference **t** − **a** between the neuron response **a** and the target vector **t**.
The perceptron learning rule `learnp` calculates
desired changes to the perceptron's weights and biases, given an input
vector **p** and the associated error **e**. The target vector **t** must
contain values of either 0 or 1, because perceptrons (with `hardlim`
transfer functions) can only output
these values.

Each time `learnp` is executed,
the perceptron has a better chance of producing the correct outputs.
The perceptron rule is proven to converge on a solution in a finite
number of iterations if a solution exists.

If a bias is not used, `learnp` works
to find a solution by altering only the weight vector **w** to point toward input vectors to be classified
as 1 and away from vectors to be classified as 0. This results in
a decision boundary that is perpendicular to **w** and
that properly classifies the input vectors.

There are three conditions that can occur for a single neuron
once an input vector **p** is presented
and the network's response **a** is calculated:

**CASE 1.** If an input vector
is presented and the output of the neuron is correct (**a** = **t** and **e** = **t** – **a** = 0), then the weight vector **w** is not altered.

**CASE 2.** If the neuron output
is 0 and should have been 1 (**a** =
0 and **t** = 1, and **e** = **t** – **a** =
1), the input vector **p** is added to
the weight vector **w**. This makes the
weight vector point closer to the input vector, increasing the chance
that the input vector will be classified as a 1 in the future.

**CASE 3.** If the neuron output
is 1 and should have been 0 (**a** =
1 and **t** = 0, and **e** = **t** – **a** = –1), the input vector **p** is subtracted from the weight vector **w**. This makes the weight vector point farther
away from the input vector, increasing the chance that the input vector
will be classified as a 0 in the future.

The perceptron learning rule can be written more succinctly
in terms of the error **e** = **t** – **a** and
the change to be made to the weight vector Δ**w**:

**CASE 1.** If **e** =
0, then make a change Δ**w** equal
to 0.

**CASE 2.** If **e** =
1, then make a change Δ**w** equal
to **p**^{T}.

**CASE 3.** If **e** =
–1, then make a change Δ**w** equal
to –**p**^{T}.

All three cases can then be written with a single expression:

$$\Delta w=(t-a){p}^{T}=e{p}^{T}$$

You can get the expression for changes in a neuron's bias by noting that the bias is simply a weight that always has an input of 1:

$$\Delta b=(t-a)(1)=e$$
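The three cases and the bias rule can be checked with a short plain-Python sketch. The function `single_neuron_update` is a hypothetical helper (it is not `learnp`); it returns Δ**w** = *e* **p**^T and Δ*b* = *e* for one neuron.

```python
def single_neuron_update(t, a, p):
    """Return (dw, db) for one neuron, following dw = e p^T and db = e."""
    e = t - a                       # error is 0, 1, or -1
    dw = [e * pj for pj in p]       # zero vector, +p, or -p
    db = e                          # bias acts as a weight with constant input 1
    return dw, db


p = [2, -1]
print(single_neuron_update(1, 1, p))  # CASE 1, a = t:     ([0, 0], 0)
print(single_neuron_update(1, 0, p))  # CASE 2, a=0, t=1:  ([2, -1], 1)
print(single_neuron_update(0, 1, p))  # CASE 3, a=1, t=0:  ([-2, 1], -1)
```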

For the case of a layer of neurons you have

$$\Delta W=(t-a){(p)}^{T}=e{(p)}^{T}$$

and

$$\Delta b=(t-a)=e$$

The perceptron learning rule can be summarized as follows:

$${W}^{new}={W}^{old}+e{p}^{T}$$

and

$${b}^{new}={b}^{old}+e$$

where **e** = **t** – **a**.
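For a layer of *S* neurons the same summary applies row by row. The sketch below uses a hypothetical plain-Python helper, `perceptron_step`, that stores **W** as a list of *S* rows of length *R*; it performs one presentation of (**p**, **t**) and applies **W**^{new} = **W**^{old} + **e** **p**^T and **b**^{new} = **b**^{old} + **e**. It illustrates the formulas, not the toolbox code.

```python
def perceptron_step(W, b, p, t):
    """One presentation of (p, t) to a layer of S hard-limit neurons.

    W: S rows of R weights, b: S biases, p: R inputs, t: S targets (0 or 1).
    Returns (W_new, b_new, e).
    """
    # a_i = hardlim(row_i . p + b_i) for each neuron i
    a = [1 if sum(wij * pj for wij, pj in zip(row, p)) + bi >= 0 else 0
         for row, bi in zip(W, b)]
    e = [ti - ai for ti, ai in zip(t, a)]               # e = t - a
    W_new = [[wij + ei * pj for wij, pj in zip(row, p)]  # W + e p^T
             for row, ei in zip(W, e)]
    b_new = [bi + ei for bi, ei in zip(b, e)]            # b + e
    return W_new, b_new, e


# Two neurons, two inputs, all weights and biases starting at zero.
W_new, b_new, e = perceptron_step([[0, 0], [0, 0]], [0, 0], [2, 2], [0, 1])
print(W_new, b_new, e)  # [[-2, -2], [0, 0]] [-1, 0] [-1, 0]
```

Only the first neuron had a nonzero error, so only its row of **W** and its bias change.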

Now try a simple example. Start with a single neuron having an input vector with just two elements.

```matlab
net = perceptron;
net = configure(net,[0;0],0);
```

To simplify matters, set the bias equal to 0 and the weights to 1 and -0.8:

```matlab
net.b{1} = [0];
w = [1 -0.8];
net.IW{1,1} = w;
```

The input target pair is given by

```matlab
p = [1; 2];
t = [1];
```

You can compute the output and error with

```matlab
a = net(p)
a =
     0
e = t-a
e =
     1
```

and use the function `learnp` to
find the change in the weights.

```matlab
dw = learnp(w,p,[],[],[],[],e,[],[],[],[],[])
dw =
     1     2
```

The new weights, then, are obtained as

```matlab
w = w + dw
w =
    2.0000    1.2000
```
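The same arithmetic can be reproduced without the toolbox. This plain-Python sketch hard-codes the values from the example (weights [1 −0.8], bias 0, **p** = [1; 2], *t* = 1) and applies Δ**w** = *e* **p**^T by hand, so it mirrors what `learnp` returns here rather than calling it.

```python
# Values from the example: w = [1 -0.8], b = 0, p = [1; 2], t = 1.
w = [1.0, -0.8]
b = 0.0
p = [1.0, 2.0]
t = 1

n = w[0] * p[0] + w[1] * p[1] + b         # net input, about -0.6
a = 1 if n >= 0 else 0                     # hard-limit output: 0
e = t - a                                  # error: 1
dw = [e * pj for pj in p]                  # dw = e p^T = [1.0, 2.0]
w = [wj + dwj for wj, dwj in zip(w, dw)]   # new weights, about [2.0, 1.2]
print(a, e, dw, w)
```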

The process of finding new weights (and biases) can be repeated until there are no errors. Recall that the perceptron learning rule is guaranteed to converge in a finite number of steps for all problems that can be solved by a perceptron. These include all classification problems that are linearly separable. The objects to be classified in such cases can be separated by a single line (or, with more than two inputs, a plane or hyperplane).

You might want to try the example `nnd4pr`.
It allows you to pick new input vectors and apply the learning rule
to classify them.

If `sim` and `learnp` are used repeatedly to present inputs
to a perceptron, and to change the perceptron weights and biases according
to the error, the perceptron will eventually find weight and bias
values that solve the problem, given that the perceptron *can* solve
it. Each traversal through all the training input and target vectors
is called a *pass*.

The function `train` carries
out such a loop of calculation. In each pass the function `train`
proceeds through the specified sequence
of inputs, calculating the output, error, and network adjustment for
each input vector in the sequence as the inputs are presented.
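A minimal sketch of such a loop, under stated assumptions: `train_sequential` is a hypothetical plain-Python helper, not the toolbox's `train`. It updates after every presentation and, as an assumption of this sketch, stops after the first pass with no errors or after `max_passes` passes.

```python
def hardlim(n):
    return 1 if n >= 0 else 0


def train_sequential(P, T, w, b, max_passes=100):
    """Sequential perceptron training (hypothetical helper).

    P: list of input vectors, T: list of 0/1 targets, w: initial weights, b: initial bias.
    Returns (w, b, passes, converged).
    """
    for npass in range(1, max_passes + 1):
        errors = 0
        for p, t in zip(P, T):
            a = hardlim(sum(wj * pj for wj, pj in zip(w, p)) + b)
            e = t - a
            if e != 0:                       # CASE 2 or CASE 3: update now
                errors += 1
                w = [wj + e * pj for wj, pj in zip(w, p)]
                b = b + e
        if errors == 0:                      # a full pass with no errors
            return w, b, npass, True
    return w, b, max_passes, False


# The four-vector problem from the hand calculation, starting from zeros.
P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]
T = [0, 1, 0, 1]
w, b, passes, converged = train_sequential(P, T, [0, 0], 0)
print(w, b, passes, converged)  # [-2, -3] 1 3 True
```

The third pass is the first one with no errors, which is how the loop detects that the weights found during the second pass already solve the problem.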

Note that `train` does not
guarantee that the resulting network does its job. You must check
the new values of **W** and **b** by computing the network output for each
input vector to see if all targets are reached. If a network does
not perform successfully you can train it further by calling `train`
again with the new weights and biases
for more training passes, or you can analyze the problem to see if
it is a suitable problem for the perceptron. Problems that cannot
be solved by the perceptron network are discussed in Limitations and Cautions.

To illustrate the training procedure, work through a simple problem. Consider a one-neuron perceptron with a single vector input having two elements:

This network, and the problem you are about to consider, are simple enough that you can follow through what is done with hand calculations if you want. The problem discussed below follows that found in [HDB1996].

Suppose you have the following classification problem and would like to solve it with a single vector input, two-element perceptron network.

$$\left\{{p}_{1}=\left[\begin{array}{c}2\\ 2\end{array}\right],{t}_{1}=0\right\}\left\{{p}_{2}=\left[\begin{array}{c}1\\ -2\end{array}\right],{t}_{2}=1\right\}\left\{{p}_{3}=\left[\begin{array}{c}-2\\ 2\end{array}\right],{t}_{3}=0\right\}\left\{{p}_{4}=\left[\begin{array}{c}-1\\ 1\end{array}\right],{t}_{4}=1\right\}$$

Use the initial weights and bias shown below. Denote the variables at each
step of this calculation by using a number in parentheses after the
variable. Thus the initial values are **W**(0)
and *b*(0).

$$\begin{array}{cc}W(0)=\left[\begin{array}{cc}0& 0\end{array}\right]& b(0)=0\end{array}$$

Start by calculating the perceptron’s output *a* for
the first input vector **p**_{1},
using the initial weights and bias.

$$\begin{array}{c}a=hardlim(W(0){p}_{1}+b(0))\\ =hardlim\left(\left[\begin{array}{cc}0& 0\end{array}\right]\left[\begin{array}{l}2\\ 2\end{array}\right]+0\right)=hardlim(0)=1\end{array}$$

The output *a* does not equal the target value *t*_{1},
so use the perceptron rule to find the incremental changes to the
weights and biases based on the error.

$$\begin{array}{l}e={t}_{1}-a=0-1=-1\\ \Delta W=e{p}_{1}^{T}=(-1)\left[\begin{array}{cc}2& 2\end{array}\right]=\left[\begin{array}{cc}-2& -2\end{array}\right]\\ \Delta b=e=(-1)=-1\end{array}$$

You can calculate the new weights and bias using the perceptron update rules.

$$\begin{array}{l}{W}^{new}={W}^{old}+e{p}^{T}=\left[\begin{array}{cc}0& 0\end{array}\right]+\left[\begin{array}{cc}-2& -2\end{array}\right]=\left[\begin{array}{cc}-2& -2\end{array}\right]=W(1)\\ {b}^{new}={b}^{old}+e=0+(-1)=-1=b(1)\end{array}$$

Now present the next input vector, **p**_{2}.
The output is calculated below.

$$\begin{array}{c}a=hardlim(W(1){p}_{2}+b(1))\\ =hardlim\left(\left[\begin{array}{cc}-2& -2\end{array}\right]\left[\begin{array}{r}1\\ -2\end{array}\right]-1\right)=hardlim(1)=1\end{array}$$

On this occasion, the target is 1, so the error is zero. Thus
there are no changes in weights or bias, so **W**(2)
= **W**(1) = [−2 −2] and *b*(2)
= *b*(1) = −1.

You can continue in this fashion, presenting **p**_{3} next,
calculating an output and the error, and making changes in the weights
and bias, etc. After making one pass through all of the four inputs,
you get the values **W**(4) = [−3
−1] and *b*(4) = 0. To determine whether a
satisfactory solution is obtained, make one pass through all input
vectors to see if they all produce the desired target values. This
is not true for the second and third inputs, but the algorithm does converge
on the sixth presentation of an input. The final values are
**W**(6) = [−2 −3]
and *b*(6) = 1.
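The hand calculation can be double-checked with a short plain-Python trace that presents **p**_{1} through **p**_{4} twice in order, starting from **W**(0) = [0 0] and *b*(0) = 0, and records the error and the updated **W** and *b* after each presentation. This is an illustrative sketch, not toolbox code.

```python
P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]
T = [0, 1, 0, 1]
W, b = [0, 0], 0                          # W(0) and b(0)
history = []                              # (presentation k, e, W(k), b(k))

for k in range(8):                        # two passes over the four inputs
    p, t = P[k % 4], T[k % 4]
    a = 1 if W[0] * p[0] + W[1] * p[1] + b >= 0 else 0
    e = t - a
    W = [W[0] + e * p[0], W[1] + e * p[1]]
    b = b + e
    history.append((k + 1, e, W, b))

for row in history:
    print(row)
# Row 4 is (4, 1, [-3, -1], 0), row 6 is (6, 1, [-2, -3], 1),
# and rows 7 and 8 have e = 0.
```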

This concludes the hand calculation. Now, how can you do this
using the `train` function?

The following code defines a perceptron.

```matlab
net = perceptron;
```

Consider the application of a single input

```matlab
p = [2; 2];
```

having the target

```matlab
t = [0];
```

Set `epochs` to 1, so that `train` goes through the input vectors (only
one here) just one time.

```matlab
net.trainParam.epochs = 1;
net = train(net,p,t);
```

The new weights and bias are

```matlab
w = net.iw{1,1}, b = net.b{1}
w =
    -2    -2
b =
    -1
```

Thus, the initial weights and bias are 0, and after training on only the first vector, they have the values [−2 −2] and −1, just as you hand calculated.

Now apply the second input vector **p**_{2}.
The output is 1, as it will be until the weights and bias are changed,
but now the target is 1, the error will be 0, and the change will
be zero. You could proceed in this way, starting from the previous
result and applying a new input vector time after time. But you can
do this job automatically with `train`.

Apply `train` for one epoch,
a single pass through the sequence of all four input vectors. Start
with the network definition.

```matlab
net = perceptron;
net.trainParam.epochs = 1;
```

The input vectors and targets are

```matlab
p = [[2;2] [1;-2] [-2;2] [-1;1]]
t = [0 1 0 1]
```

Now train the network with

```matlab
net = train(net,p,t);
```

The new weights and bias are

```matlab
w = net.iw{1,1}, b = net.b{1}
w =
    -3    -1
b =
     0
```

This is the same result as you got previously by hand.

Finally, simulate the trained network for each of the inputs.

```matlab
a = net(p)
a =
     0     0     1     1
```

The outputs do not yet equal the targets, so you need to train the network for more than one pass. Try more epochs. This run gives a mean absolute error performance of 0 after two epochs:

```matlab
net.trainParam.epochs = 1000;
net = train(net,p,t);
```

Thus, the network was trained by the time the inputs were presented on the third epoch. (As you know from hand calculation, the network converges on the presentation of the sixth input vector. This occurs in the middle of the second epoch, but it takes the third epoch to detect the network convergence.) The final weights and bias are

```matlab
w = net.iw{1,1}, b = net.b{1}
w =
    -2    -3
b =
     1
```

The simulated output and errors for the various inputs are

```matlab
a = net(p)
a =
     0     1     0     1
error = a-t
error =
     0     0     0     0
```

This confirms that the training procedure is successful: the network converges and produces the correct target outputs for the four input vectors.

The default training function for networks created with `perceptron`
is `trainc`. (You can find this by executing `net.trainFcn`.)
This training function applies the perceptron learning rule in its
pure form, in that input vectors are applied individually,
in sequence, and corrections to the weights and bias are made after
each presentation of an input vector. Thus, perceptron training with `train`
will converge in a finite number
of steps unless the problem presented cannot be solved with a simple
perceptron.

The function `train` can
be used in various ways by other networks as well. Type `help train`
to read more about this basic function.

You might want to try various example programs. For instance, `demop1` illustrates
classification and training of a simple perceptron.

Perceptron networks should be trained with `adapt`, which presents the input vectors
to the network one at a time and makes corrections to the network
based on the results of each presentation. Use of `adapt` in this way guarantees that any linearly
separable problem is solved in a finite number of training presentations.

As noted in the previous pages, perceptrons can also be trained
with the function `train`. Commonly
when `train` is used for perceptrons,
it presents the inputs to the network in batches, and makes corrections
to the network based on the sum of all the individual corrections.
Unfortunately, there is no proof that such a training algorithm converges
for perceptrons. On that account the use of `train` for
perceptrons is not recommended.
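To make the difference concrete, here is a hypothetical plain-Python sketch of one batch-style pass: every output is computed with the same **W** and *b*, and the summed corrections are applied once at the end. It illustrates the idea described above and is not the toolbox's batch implementation. On the four-vector problem from the hand calculation, one batch pass from zero gives **W** = [0 −4] and *b* = −2, whereas one sequential pass gave **W**(4) = [−3 −1] and *b*(4) = 0.

```python
def batch_pass(P, T, w, b):
    """One batch pass (hypothetical helper): outputs use the same w and b,
    and the summed corrections are applied once at the end."""
    dw = [0] * len(w)
    db = 0
    for p, t in zip(P, T):
        a = 1 if sum(wj * pj for wj, pj in zip(w, p)) + b >= 0 else 0
        e = t - a
        dw = [d + e * pj for d, pj in zip(dw, p)]   # accumulate e p^T
        db += e                                      # accumulate e
    return [wj + d for wj, d in zip(w, dw)], b + db


P = [[2, 2], [1, -2], [-2, 2], [-1, 1]]
T = [0, 1, 0, 1]
print(batch_pass(P, T, [0, 0], 0))  # ([0, -4], -2)
```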

Perceptron networks have several limitations. First, the
output values of a perceptron can take on only one of two values (0
or 1) because of the hard-limit transfer function. Second, perceptrons
can only classify linearly separable sets of vectors. If a straight
line or a plane can be drawn to separate the input vectors into their
correct categories, the input vectors are linearly separable. If the
vectors are not linearly separable, learning will never reach a point
where all vectors are classified properly. However, it has been proven
that if the vectors are linearly separable, perceptrons trained adaptively
will always find a solution in finite time. You might want to try `demop6`.
It shows the difficulty of trying to classify input vectors that are
not linearly separable.
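A minimal plain-Python sketch makes the point with a classic non-separable case, the XOR pattern (an example chosen for this sketch, not taken from `demop6`). Because no line separates the two classes, every pass leaves at least one error, so the loop never reports convergence, however many passes are allowed.

```python
# XOR-style targets on the corners of the unit square: no single line
# separates the 1s from the 0s.
P = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 1, 1, 0]
w, b = [0, 0], 0
converged = False

for npass in range(1000):
    errors = 0
    for p, t in zip(P, T):
        a = 1 if w[0] * p[0] + w[1] * p[1] + b >= 0 else 0
        e = t - a
        if e != 0:
            errors += 1
            w = [w[0] + e * p[0], w[1] + e * p[1]]
            b = b + e
    if errors == 0:          # would mean every input is classified correctly
        converged = True
        break

print(converged, errors)     # converged stays False; errors is never 0
```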

It is only fair, however, to point out that networks with more than one perceptron can be used to solve more difficult problems. For instance, suppose that you have a set of four vectors that you would like to classify into distinct groups, and that two lines can be drawn to separate them. A two-neuron network can be found such that its two decision boundaries classify the inputs into four categories. For additional discussion about perceptrons and to examine more complex perceptron problems, see [HDB1996].

Long training times can be caused by the presence of an *outlier* input vector whose length
is much larger or smaller than the other input vectors. Applying the
perceptron learning rule involves adding and subtracting input vectors
from the current weights and biases in response to error. Thus, an
input vector with large elements can lead to changes in the weights
and biases that take a long time for a much smaller input vector to
overcome. You might want to try `demop4` to see how
an outlier affects the training.

By changing the perceptron learning rule slightly, you can make training times insensitive to extremely large or small outlier input vectors.

Here is the original rule for updating weights:

$$\Delta w=(t-a){p}^{T}=e{p}^{T}$$

As shown above, the larger an input vector **p**,
the larger its effect on the weight vector **w**.
Thus, if an input vector is much larger than other input vectors,
the smaller input vectors must be presented many times to have an
effect.

The solution is to normalize the rule so that the effect of each input vector on the weights is of the same magnitude:

$$\Delta w=(t-a)\frac{{p}^{T}}{\Vert p\Vert}=e\frac{{p}^{T}}{\Vert p\Vert}$$

The normalized perceptron
rule is implemented with the function `learnpn`,
which is called exactly like `learnp`.
The normalized perceptron rule function `learnpn` takes
slightly more time to execute, but reduces the number of epochs considerably
if there are outlier input vectors. You might try `demop5` to
see how this normalized training rule works.
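The difference between the original and normalized update rules can be seen in a plain-Python sketch. `dw_standard` and `dw_normalized` are hypothetical helpers that follow the two Δ**w** formulas of this section literally; they are not the source of `learnp` or `learnpn`. With an outlier **p** = [300; −400] and *e* = 1, the standard change has length 500, while the normalized change has length 1, the same as for a small input.

```python
import math


def dw_standard(e, p):
    """Original rule: dw = e * p^T."""
    return [e * pj for pj in p]


def dw_normalized(e, p):
    """Normalized rule: dw = e * p^T / ||p||. Assumes p is not the zero vector."""
    norm = math.sqrt(sum(pj * pj for pj in p))
    return [e * pj / norm for pj in p]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


small, outlier = [1, 1], [300, -400]
print(length(dw_standard(1, small)), length(dw_standard(1, outlier)))      # about 1.414, then 500.0
print(length(dw_normalized(1, small)), length(dw_normalized(1, outlier)))  # both about 1.0
```

Because every normalized correction has the same magnitude, a single large input can no longer swamp the corrections made by smaller ones.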