Here is a radial basis network with *R* inputs.

Notice that the expression for the net input of a `radbas` neuron is different from that of other neurons. Here the net input to the `radbas` transfer function is the vector distance between its weight vector **w** and the input vector **p**, multiplied by the bias *b*. (The `|| dist ||` box in this figure accepts the input vector **p** and the single-row input weight matrix, and produces the Euclidean distance between the two.)

The transfer function for a radial basis neuron is

$$\mathrm{radbas}(n)=e^{-n^{2}}$$
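A minimal plain-Python sketch of the formula above. The function name `radbas` mirrors the toolbox name, but this is an illustration of the math, not the toolbox implementation:

```python
import math


def radbas(n: float) -> float:
    """Radial basis transfer function: radbas(n) = exp(-n**2)."""
    return math.exp(-n * n)


print(radbas(0.0))                  # peak value 1.0 at n = 0
print(radbas(1.0))                  # exp(-1), about 0.368
print(radbas(-1.0) == radbas(1.0))  # symmetric in n: True
print(radbas(3.0) < radbas(1.0))    # decays as |n| grows: True
```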

Here is a plot of the `radbas` transfer function.

The radial basis function has a maximum of
1 when its input is 0. As the distance between **w** and **p** decreases, the output increases. Thus, a
radial basis neuron acts as a detector that produces 1 whenever the
input **p** is identical to its weight
vector **w**.

The bias *b* allows the sensitivity of the `radbas` neuron to be adjusted. For example, if a neuron had a bias of 0.1, it would output 0.5 for any input vector **p** at a vector distance of 8.326 (0.8326/*b*) from its weight vector **w**.
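To check the arithmetic, the following sketch (plain Python, with a hypothetical helper `neuron_output`) evaluates one neuron at the distance 0.8326/*b* for two biases. In both cases the output is about 0.5. The smaller bias simply moves that half-response point farther from the weight vector.

```python
import math


def radbas(n: float) -> float:
    return math.exp(-n * n)


def neuron_output(distance: float, bias: float) -> float:
    """One radbas neuron: its net input is the distance times the bias."""
    return radbas(distance * bias)


for b in (0.1, 1.0):
    half = 0.8326 / b  # distance at which the output falls to about 0.5
    print(f"bias={b}: output {neuron_output(half, b):.4f} at distance {half:.4f}")
```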

Radial basis networks consist of two layers: a hidden radial basis layer of $S^{1}$ neurons, and an output linear layer of $S^{2}$ neurons.

The `|| dist ||` box in this figure accepts the input vector **p** and the input weight matrix $\mathbf{IW}^{1,1}$, and produces a vector having $S^{1}$ elements. The elements are the distances between the input vector and the vectors ${}_{i}\mathbf{IW}^{1,1}$ formed from the rows of the input weight matrix.
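The `|| dist ||` box can be sketched in plain Python as one Euclidean distance per row of the weight matrix. The weights below are hypothetical, and `math.dist` requires Python 3.8 or later:

```python
import math


def dist_box(IW, p):
    """Euclidean distance from input vector p to each row of IW (one row per neuron)."""
    return [math.dist(row, p) for row in IW]


# Hypothetical first layer with S1 = 3 neurons and R = 2 inputs;
# each row is one neuron's weight vector.
IW11 = [[0.0, 0.0],
        [3.0, 4.0],
        [1.0, 1.0]]
p = [0.0, 0.0]
d = dist_box(IW11, p)
print(len(d), d)  # S1 = 3 distances: 0, 5, and sqrt(2)
```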

The bias vector $\mathbf{b}^{1}$ and the output of `|| dist ||` are combined with the MATLAB® operation `.*`, which does element-by-element multiplication.

The output of the first layer for a feedforward network `net` can be obtained with the following code:

```matlab
a{1} = radbas(netprod(dist(net.IW{1,1},p),net.b{1}))
```
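For readers without the toolbox, the same computation can be sketched in NumPy. This assumes NumPy is available, and the helper `first_layer` and the example weights are hypothetical. It handles a batch of input vectors stored as columns, the way `P` is laid out in the toolbox:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


def first_layer(IW, b, P):
    """Sketch of a1 = radbas(dist(IW, P) .* b) for a batch of inputs.

    IW: S1 x R weight matrix (one weight vector per row)
    b:  length-S1 bias vector
    P:  R x Q matrix, one input vector per column
    Returns the S1 x Q matrix of first-layer outputs.
    """
    # Pairwise Euclidean distances, row i of IW vs column q of P -> S1 x Q.
    D = np.linalg.norm(IW[:, :, None] - P[None, :, :], axis=1)
    # Element-by-element product with the bias (the role of netprod / .*).
    N = D * b[:, None]
    return radbas(N)


IW = np.array([[0.0, 0.0],
               [1.0, 0.0]])       # hypothetical S1 = 2, R = 2
b = np.full(2, 0.8326)            # bias for a spread of 1
P = np.array([[0.0, 1.0],
              [0.0, 0.0]])        # two input vectors as columns
print(first_layer(IW, b, P))
```

Each neuron outputs 1 for the input equal to its weight vector and about 0.5 for the input one unit away.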

Fortunately, you won't have to write such lines of code. All the details of designing this network are built into the design functions `newrbe` and `newrb`, and you can obtain their outputs with `sim`.

You can understand how this network behaves by following an input vector **p** through the network to the output **a**^{2}. If you present an input vector to such a network, each neuron in the radial basis layer will output a value according to how close the input vector is to that neuron's weight vector.

Thus, radial basis neurons with weight vectors quite different
from the input vector **p** have outputs
near zero. These small outputs have only a negligible effect on the
linear output neurons.

In contrast, a radial basis neuron with a weight vector close
to the input vector **p** produces a
value near 1. If a neuron has an output of 1, its output weights in
the second layer pass their values to the linear neurons in the second
layer.

In fact, if only one radial basis neuron had an output of 1, and all others had outputs of 0 (or very close to 0), the output of the linear layer would be the active neuron's output weights (plus the second-layer biases). This would, however, be an extreme case. Typically several neurons are firing at once, to varying degrees.
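To make the extreme case concrete, here is a small NumPy sketch of a full forward pass. The network (three neurons on a scalar input, one linear output) and all its weights are hypothetical. The second-layer bias is set to zero so that the output equals the active neuron's output weight:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


# Hypothetical network: scalar input (R = 1), three radbas neurons, one linear output.
centres = np.array([0.0, 5.0, 10.0])   # first-layer weights
b1 = np.full(3, 0.8326)                # first-layer biases (spread of 1)
LW = np.array([[2.0, -1.0, 7.0]])      # second-layer weights, S2 x S1
b2 = np.array([0.0])                   # second-layer bias, zero for clarity


def simulate(p):
    a1 = radbas(np.abs(centres - p) * b1)  # 1-D distance is |w - p|
    a2 = LW @ a1 + b2                      # linear output layer
    return a1, a2


a1, a2 = simulate(5.0)
print(np.round(a1, 6))  # only the neuron centred at 5 is active
print(a2)               # very close to that neuron's output weight, -1
```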

Now look in detail at how the first layer operates. Each neuron's weighted input is the distance between the input vector and its weight vector, calculated with `dist`. Each neuron's net input is the element-by-element product of its weighted input with its bias, calculated with `netprod`. Each neuron's output is its net input passed through `radbas`. If a neuron's weight vector is equal to the input vector (transposed), its weighted input is 0, its net input is 0, and its output is 1. If a neuron's weight vector is a distance of `spread` from the input vector, its weighted input is `spread`. With its bias set to 0.8326/`spread`, its net input is $\sqrt{-\ln 0.5}$ (or 0.8326), and therefore its output is 0.5.
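The constant 0.8326 comes from solving $\mathrm{radbas}(n) = 0.5$ (MATLAB's `log` is the natural logarithm). Setting the net input at distance `spread` equal to that value then fixes the bias:

$$
\mathrm{radbas}(n) = e^{-n^{2}} = 0.5
\;\Longrightarrow\; n = \sqrt{-\ln 0.5} = \sqrt{\ln 2} \approx 0.8326,
\qquad
n = b \cdot \mathrm{spread} = 0.8326
\;\Longrightarrow\; b = \frac{0.8326}{\mathrm{spread}}.
$$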

You can design radial basis networks with the function `newrbe`. This function can produce a network with zero error on training vectors. It is called in the following way:

```matlab
net = newrbe(P,T,SPREAD)
```

The function `newrbe` takes matrices of input vectors `P` and target vectors `T`, and a spread constant `SPREAD` for the radial basis layer, and returns a network with weights and biases such that the outputs are exactly `T` when the inputs are `P`.

The function `newrbe` creates as many `radbas` neurons as there are input vectors in `P`, and sets the first-layer weights to `P'`. Thus, there is a layer of `radbas` neurons in which each neuron acts as a detector for a different input vector. If there are *Q* input vectors, then there will be *Q* neurons.

Each bias in the first layer is set to 0.8326/`SPREAD`. This gives radial basis functions that cross 0.5 at weighted inputs of ±`SPREAD`. This determines the width of an area in the input space to which each neuron responds. If `SPREAD` is 4, then each `radbas` neuron will respond with 0.5 or more to any input vector within a vector distance of 4 from its weight vector. `SPREAD` should be large enough that neurons respond strongly to overlapping regions of the input space.
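The first-layer setup described above (weights `P'`, biases 0.8326/`SPREAD`) can be sketched in NumPy as follows. The helper `design_first_layer` and the data are hypothetical, and the bias uses the exact value $\sqrt{\ln 2}$ that 0.8326 rounds. With `SPREAD` = 4, the response is at least 0.5 out to distance 4 and below 0.5 beyond it:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


def design_first_layer(P, spread):
    """Sketch of the newrbe-style first layer (not the toolbox source).

    P: R x Q matrix of training inputs (one per column).
    Returns IW (Q x R, i.e. P') and a length-Q bias vector.
    """
    IW = P.T.copy()                            # one neuron per training vector
    bias = np.sqrt(np.log(2.0)) / spread       # exact form of 0.8326 / SPREAD
    return IW, np.full(P.shape[1], bias)


P = np.array([[0.0, 4.0, 10.0]])               # hypothetical Q = 3 scalar inputs
IW, b = design_first_layer(P, spread=4.0)
distances = np.array([0.0, 2.0, 4.0, 6.0])
print(IW.shape)                                # (3, 1): Q neurons, R inputs each
print(np.round(radbas(distances * b[0]), 4))   # 1.0, ~0.84, 0.5, ~0.21
```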

The second-layer weights $\mathbf{LW}^{2,1}$ (or in code, `LW{2,1}`) and biases $\mathbf{b}^{2}$ (or in code, `b{2}`) are found by simulating the first-layer outputs $\mathbf{a}^{1}$ (`A{1}`), and then solving the following linear expression:

`[LW{2,1} b{2}] * [A{1}; ones(1,Q)] = T`

You know the inputs to the second layer (`A{1}`) and the target (`T`), and the layer is linear. You can use the following code to calculate the weights and biases of the second layer to minimize the sum-squared error:

```matlab
Wb = T/[A{1}; ones(1,Q)]
```

Here `Wb` contains both weights and biases, with the biases in the last column. The sum-squared error is always 0, as explained below.
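A NumPy sketch of the same solve, under hypothetical data (five scalar inputs, one target row). MATLAB's `T/M` solves `X*M = T`. Transposing gives `M'*X' = T'`, which `numpy.linalg.lstsq` accepts directly. For this underdetermined system `lstsq` returns the minimum-norm solution, which may differ from the one MATLAB's `/` picks. Both fit the training data exactly:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


# Hypothetical training data: Q = 5 scalar inputs (R = 1) and targets (S2 = 1).
P = np.array([[-1.0, -0.5, 0.0, 0.5, 1.0]])
T = np.array([[0.3, -0.2, 0.9, 0.1, -0.7]])
Q = P.shape[1]
spread = 0.5

# First layer as described for newrbe: one neuron centred on each input.
b1 = np.sqrt(np.log(2.0)) / spread
A1 = radbas(np.abs(P.T - P) * b1)          # Q x Q outputs (1-D distances)

# Solve Wb * [A1; ones(1,Q)] = T via its transpose M' * Wb' = T'.
M = np.vstack([A1, np.ones((1, Q))])        # (Q+1) x Q
Wb = np.linalg.lstsq(M.T, T.T, rcond=None)[0].T
W2, b2 = Wb[:, :-1], Wb[:, -1:]             # biases are the last column

sse = float(np.sum((W2 @ A1 + b2 - T) ** 2))
print(sse)  # zero up to floating-point rounding
```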

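The problem has *Q* constraints (input/target pairs), and each output neuron has *Q* + 1 variables (the *Q* weights from the *Q* `radbas` neurons, and a bias). A linear problem with *Q* constraints and more than *Q* variables has an infinite number of zero-error solutions, provided the constraints are linearly independent. This holds for distinct input vectors, because the matrix of Gaussian `radbas` responses `A{1}` is then nonsingular.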
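The following NumPy sketch illustrates the non-uniqueness on hypothetical data with *Q* = 5 training pairs. It takes one exact solution and adds any multiple of a null-space direction. The direction comes from the SVD because there are *Q* + 1 unknowns but only *Q* constraints. Every resulting weight/bias vector still fits the targets:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


P = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical Q = 5 scalar inputs
T = np.array([0.3, -0.2, 0.9, 0.1, -0.7])   # one target per input
Q = P.size
b1 = np.sqrt(np.log(2.0)) / 0.5
A1 = radbas(np.abs(P[:, None] - P[None, :]) * b1)
M = np.vstack([A1, np.ones((1, Q))])       # (Q+1) x Q: Q constraints, Q+1 unknowns

x0 = np.linalg.lstsq(M.T, T, rcond=None)[0]   # one exact solution (length Q+1)

# The last right-singular vector of M' spans its null space: z @ M = 0.
_, _, Vt = np.linalg.svd(M.T)
z = Vt[-1]
for t in (0.0, 1.0, -3.5):
    x = x0 + t * z
    print(t, np.max(np.abs(x @ M - T)))   # residual stays ~0 for every t
```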

Thus, `newrbe` creates a network with zero error on training vectors. The only condition required is to make sure that `SPREAD` is large enough that the active input regions of the `radbas` neurons overlap enough so that several `radbas` neurons always have fairly large outputs at any given moment. This makes the network function smoother and results in better generalization for new input vectors occurring between input vectors used in the design. (However, `SPREAD` should not be so large that each neuron is effectively responding in the same large area of the input space.)

The drawback to `newrbe` is that it produces a network with as many hidden neurons as there are input vectors. For this reason, `newrbe` does not return an acceptable solution when many input vectors are needed to properly define a network, as is typically the case.

The function `newrb` iteratively creates a radial basis network one neuron at a time. Neurons are added to the network until the sum-squared error falls beneath an error goal or a maximum number of neurons has been reached. The call for this function is

```matlab
net = newrb(P,T,GOAL,SPREAD)
```

The function `newrb` takes matrices of input and target vectors `P` and `T`, and design parameters `GOAL` and `SPREAD`, and returns the desired network.

The design method of `newrb` is similar to that of `newrbe`. The difference is that `newrb` creates neurons one at a time. At each iteration, the input vector that lowers the network error the most is used to create a `radbas` neuron. The error of the new network is checked, and if it is low enough, `newrb` is finished. Otherwise the next neuron is added. This procedure is repeated until the error goal is met or the maximum number of neurons is reached.
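The loop described above can be illustrated with a simplified greedy forward selection in NumPy. This is a sketch in the spirit of `newrb`, not the toolbox algorithm. The helper `greedy_rbf` and the sine-shaped training data are hypothetical. At each step it tries every unused training input as a new centre, refits the linear layer by least squares, and keeps the best candidate:

```python
import numpy as np


def radbas(n):
    return np.exp(-n ** 2)


def greedy_rbf(P, T, goal, spread, max_neurons=None):
    """Simplified greedy design in the spirit of newrb (not the toolbox algorithm).

    P, T: 1-D arrays of Q scalar inputs and targets.
    Each iteration tries every unused training input as a new neuron centre,
    refits the linear layer by least squares, and keeps the candidate with the
    lowest sum-squared error. Stops at the error goal or the neuron limit.
    """
    Q = P.size
    if max_neurons is None:
        max_neurons = Q
    b1 = np.sqrt(np.log(2.0)) / spread
    # Row j holds the outputs of a neuron centred at P[j] for every input.
    A_all = radbas(np.abs(P[:, None] - P[None, :]) * b1)
    chosen = []
    sse = float(np.sum((T - T.mean()) ** 2))  # bias-only network
    while sse > goal and len(chosen) < max_neurons:
        best_err, best_j = None, None
        for j in range(Q):
            if j in chosen:
                continue
            M = np.vstack([A_all[chosen + [j]], np.ones((1, Q))])
            x = np.linalg.lstsq(M.T, T, rcond=None)[0]
            err = float(np.sum((x @ M - T) ** 2))
            if best_err is None or err < best_err:
                best_err, best_j = err, j
        chosen.append(best_j)
        sse = best_err
    return chosen, sse


P = np.linspace(-1.0, 1.0, 21)   # hypothetical inputs at intervals of 0.1
T = np.sin(np.pi * P)            # hypothetical targets
centres, sse = greedy_rbf(P, T, goal=0.02, spread=0.5)
print(len(centres), sse)
```

Refitting from scratch for every candidate keeps the sketch short. A production design would update the fit incrementally instead.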

As with `newrbe`, it is important that the spread parameter be large enough that the `radbas` neurons respond to overlapping regions of the input space, but not so large that all the neurons respond in essentially the same manner.

Why not always use a radial basis network instead of a standard feedforward network? Radial basis networks, even when designed efficiently with `newrb`, tend to have many times more neurons than a comparable feedforward network with `tansig` or `logsig` neurons in the hidden layer.

This is because sigmoid neurons can have outputs over a large region of the input space, while `radbas` neurons only respond to relatively small regions of the input space. The result is that the larger the input space (in terms of the number of inputs and the ranges those inputs vary over), the more `radbas` neurons are required.
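A rough NumPy illustration of this contrast, assuming a single scalar input on a hypothetical range of [-10, 10]. The `tansig` neuron has weight 1 and bias 0, and `tansig` is mathematically equal to `tanh`. The `radbas` neuron is centred at 0 with bias 1. The sketch measures the fraction of the range where each output has magnitude at least 0.5:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)   # hypothetical 1-D input range
tansig_out = np.tanh(x)              # tansig neuron: weight 1, bias 0
radbas_out = np.exp(-x ** 2)         # radbas neuron: centred at 0, bias 1

# Fraction of the range where each neuron's output magnitude is at least 0.5.
print(np.mean(np.abs(tansig_out) >= 0.5))  # most of the range
print(np.mean(radbas_out >= 0.5))          # only a narrow band around 0
```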

On the other hand, designing a radial basis network often takes much less time than training a sigmoid/linear network, and can sometimes result in fewer neurons being used, as can be seen in the next example.

The example `demorb1` shows how a radial basis network is used to fit a function. Here the problem is solved with only five neurons.

Examples `demorb3` and `demorb4` examine how the spread constant affects the design process for radial basis networks.

In `demorb3`, a radial basis network is designed to solve the same problem as in `demorb1`. However, this time the spread constant used is 0.01. Thus, each radial basis neuron returns 0.5 or lower for any input vector with a distance of 0.01 or more from its weight vector.

Because the training inputs occur at intervals of 0.1, no two radial basis neurons have a strong output for any given input.
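A quick plain-Python check of the numbers in this paragraph (spread 0.01, training inputs 0.1 apart). The helper `response` is hypothetical. A neuron barely responds to its neighbouring training input, and it hardly responds even to a point halfway between two training inputs. This is why the design does not generalize:

```python
import math

spread = 0.01                                 # value used in demorb3
b = math.sqrt(math.log(2.0)) / spread         # about 0.8326 / 0.01 = 83.26


def response(distance):
    """Output of a radbas neuron at the given distance from its weight vector."""
    return math.exp(-(b * distance) ** 2)


print(response(0.1))    # neighbouring training input: about 2**-100, effectively 0
print(response(0.05))   # halfway between two training inputs: about 2**-25
```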

`demorb3` showed that having too small a spread constant can result in a solution that does not generalize from the input/target vectors used in the design. Example `demorb4` shows the opposite problem. If the spread constant is too large, the radial basis neurons will output large values (near 1.0) for all the inputs used to design the network.

If all the radial basis neurons always output 1, any information presented to the network is lost. No matter what the input, the first layer outputs all 1s, so the second layer always produces the same output. The function `newrb` will attempt to find a network, but cannot because of numerical problems that arise in this situation.

The moral of the story is, choose a spread constant larger than the distance between adjacent input vectors, so as to get good generalization, but smaller than the distance across the whole input space.

For this problem, that would mean picking a spread constant greater than 0.1, the interval between inputs, and less than 2, the distance between the leftmost and rightmost inputs.
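The rule of thumb can be checked numerically with a plain-Python sketch. It compares a neuron's response to the adjacent training input (distance 0.1) and to the far end of the input range (distance 2) for spreads that are too small, reasonable, and too large. The helper `response` and the middle value 0.5 are illustrative choices. With the bias set to $\sqrt{\ln 2}/\text{spread}$, the response at distance $d$ equals $2^{-(d/\text{spread})^2}$:

```python
import math


def response(distance, spread):
    """radbas output at a given distance when the bias is sqrt(ln 2)/spread."""
    b = math.sqrt(math.log(2.0)) / spread
    return math.exp(-(b * distance) ** 2)


for spread in (0.01, 0.5, 100.0):
    near = response(0.1, spread)   # adjacent training input
    far = response(2.0, spread)    # opposite end of the input range
    print(f"spread={spread:>6}: neighbour={near:.3g}  far end={far:.3g}")
```

Only the middle spread gives a strong response to neighbours together with a weak response across the whole range.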