Choosing the best set of initial weights of a neural network to train on the full dataset
Mirko Job on 8 Aug 2019
Commented: Jonathan De Sousa on 30 Jan 2022
I am developing a neural network for pattern recognition in MATLAB.
Currently:
1) I divide my dataset into 6 folds (5 folds for CV + 1 fold for testing, which represents unseen data);
2) I choose 10 different numbers of hidden neurons;
3) I choose 10 different sets of initial weights (random);
4) For each choice of test fold (k):
- For each number of hidden neurons (i):
- - For each set of initial weights (j):
- - - I perform 5-fold CV (4 folds for training and 1 for early stopping), saving the average performance (R^2) on the training, validation, and test sets and the average number of training epochs across all cross-validation iterations (the [i,j,k] element of the result matrices);
5) Averaging across the 6 different choices of test fold (k) (10x10x6 -> 10x10), I obtain a general estimate of the different models across the entire dataset, treated as unseen data;
6) I choose the optimal number of hidden neurons as the value describing the model that performs best on average across the 10 different sets of initial weights (j);
7) I choose the number of training epochs as the average number of training epochs found across the ten sets of initial weights (j) and all possible choices of test set (k);
Now I have the number of hidden neurons and the number of epochs to train the final model on all the data.
My question is: how should I choose the initial set of weights? Should I again choose ten sets of initial weights and train 10 different networks with the previously defined parameters to find the best one? In that case (since I don't have validation and test sets), won't the resulting net be overfitted?
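For reference, the selection loop described above looks roughly like this simplified sketch (X and T are my feature and target matrices; the hidden-size grid, the fold assignment, and the use of patternnet with rng to fix each initialization are placeholders, and only the test-fold R^2 is computed for brevity):
% Simplified sketch of the model-selection loop described above (not the actual project code).
% X is features-by-samples, T is targets-by-samples; all other names are placeholders.
hiddenSizes = round(linspace(2, 20, 10));        % 10 candidate hidden layer sizes (placeholder grid)
nInit = 10;  nOuter = 6;  nInner = 5;
nSamples  = size(X, 2);
outerFold = mod(randperm(nSamples), nOuter) + 1; % random assignment to the 6 outer folds
R2test     = zeros(numel(hiddenSizes), nInit, nOuter);
epochsUsed = zeros(numel(hiddenSizes), nInit, nOuter);
for k = 1:nOuter                                 % outer fold k used as unseen test data
    testMask  = (outerFold == k);
    Xdev = X(:, ~testMask);   Tdev = T(:, ~testMask);
    innerFold = mod(randperm(size(Xdev, 2)), nInner) + 1;
    for i = 1:numel(hiddenSizes)                 % hidden layer size i
        for j = 1:nInit                          % initial weight set j
            r2 = 0;  ep = 0;
            for m = 1:nInner                     % inner CV: 4 folds train, 1 fold early stopping
                rng(j);                          % reproduce the j-th random initialization
                net = patternnet(hiddenSizes(i));
                net.divideFcn = 'divideind';     % fix the train/validation split explicitly
                net.divideParam.trainInd = find(innerFold ~= m);
                net.divideParam.valInd   = find(innerFold == m);
                net.divideParam.testInd  = [];
                [net, tr] = train(net, Xdev, Tdev);
                Yhat = net(X(:, testMask));      % evaluate on the held-out outer fold
                Ttst = T(:, testMask);
                r2   = r2 + 1 - sum((Ttst(:) - Yhat(:)).^2) / sum((Ttst(:) - mean(Ttst(:))).^2);
                ep   = ep + tr.best_epoch;
            end
            R2test(i, j, k)     = r2 / nInner;   % average test R^2 over the inner folds
            epochsUsed(i, j, k) = ep / nInner;   % average epochs at the early-stopping point
        end
    end
end
meanR2 = mean(mean(R2test, 3), 2);               % average over test folds (k) and weight sets (j)
[~, best]   = max(meanR2);
bestHidden  = hiddenSizes(best);                 % chosen number of hidden neurons
finalEpochs = round(mean(epochsUsed(best, :, :), 'all'));   % chosen number of training epochs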
Accepted Answer
Sourav Bairagya on 12 Aug 2019
The simplest way to initialize weights and biases is to set them to small uniform random values, which works well for neural networks with a single hidden layer. But when the number of hidden layers is more than one, you can use a good initialization scheme such as "Glorot (also known as Xavier) initialization".
Since we don't know anything about the dataset beforehand, one good way is to assign the weights from a Gaussian distribution with zero mean and some finite variance. With each passing layer, the variance is expected to remain the same. This helps keep the signal from exploding to a high value or vanishing to zero; in other words, it keeps the variance roughly equal between the input and output of each hidden layer and keeps training stable.
According to the Glorot/Xavier initialization scheme, the weights of each hidden layer are initialized as follows (MATLAB sketch, where numIn and numOut are the layer's fan-in and fan-out):
% Glorot (Xavier) initialization of one layer's weight matrix;
% repeat for the weight matrix of each hidden layer.
variance = 2/(numIn + numOut);            % numIn: inputs to the layer, numOut: outputs of the layer
stddev   = sqrt(variance);
weights  = stddev*randn(numOut, numIn);   % zero-mean Gaussian with the chosen standard deviation
You can try this approach in your model to initialize the weights prior to training. Since weight initialization does not depend on the dataset, there is no need to choose ten sets of initial weights again and train ten different networks with the previously defined parameters to find the best one.
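If you are training a shallow network such as patternnet, a rough sketch of applying this initialization could look as follows (patternnet itself, the variables X, T and hiddenSize, and touching only the input-to-hidden weights are assumptions; adapt them to your model):
% X: inputs (features x samples), T: targets, hiddenSize: chosen number of hidden neurons
net = patternnet(hiddenSize);
net = configure(net, X, T);                       % size the weight matrices from the data
[fanOut, fanIn] = size(net.IW{1,1});              % hidden layer fan-out and fan-in
net.IW{1,1} = sqrt(2/(fanIn + fanOut))*randn(fanOut, fanIn);   % Glorot-style Gaussian weights
% the same idea applies to net.LW{2,1} (hidden-to-output weights) and the biases in net.b
net = train(net, X, T);                           % training starts from these weights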
You can also use "fullyConnectedLayer" from the Deep Learning Toolbox, where the default weights initializer is 'glorot'. For more information, see the fullyConnectedLayer documentation.
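For example, a minimal sketch of such a layer (the output size 20 is just a placeholder) could be:
% fullyConnectedLayer uses the 'glorot' weights initializer by default;
% it is stated explicitly here only for clarity.
fcLayer = fullyConnectedLayer(20, 'WeightsInitializer', 'glorot');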
1 Comment
Jonathan De Sousa on 30 Jan 2022
The Glorot initialisation scheme does not actually use a Gaussian distribution; the weights are instead sampled from a uniform distribution. Have a look at: https://uk.mathworks.com/help/deeplearning/ug/initialize-learnable-parameters-for-custom-training-loop.html#mw_1bd0f2c3-c7df-4841-89ce-a7574d2db8d9
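For reference, a minimal sketch of that uniform variant (using the same fanIn/fanOut names as above) would be:
% Glorot uniform: draw weights from U(-bound, bound) with bound = sqrt(6/(fanIn + fanOut))
bound = sqrt(6/(fanIn + fanOut));
weights = 2*bound*rand(fanOut, fanIn) - bound;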