How to choose the most significant variables from 57 candidate variables for neural network input?

I am going to use a neural network to predict land value. I have identified 57 variables that may affect the land value. All of them are distances: to various central business districts, to schools, to hospitals, and to the main road. How can I choose the most significant variables as neural network inputs, with the land value as the target?
I have been experimenting by feeding the variables one at a time as the sole input, with the land value as the target, using the toolbox's neural network fitting function, but none of the variables gives an R-squared higher than 50%.
Can anyone tell me what method I can use to select the most significant variables?

Accepted Answer

Greg Heath on 25 Jul 2012
Edited: Greg Heath on 6 Mar 2017
Backward stepwise (NOT stagewise) search has worked well for me in the task of classifying stars using stellar spectra. (Backward stagewise search also considers putting back variables that have been previously rejected.)
To rank the variables:
1. Train the model using all variables.
2. While the number of active variables exceeds 1:
   a. Find the variable that yields the best performance when it is replaced by its mean value and the model is retrained.
   b. Make the replacement permanent and reduce the count of active variables by 1.
   end
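The ranking loop above could be sketched roughly as follows. This is only a minimal illustration, not tested on real land-value data: the data are synthetic, the 6-variable problem size and the 5 hidden nodes are arbitrary assumptions, and performance is measured with PERFORM on all of the data. It needs the Neural Network (now Deep Learning) Toolbox.

```matlab
% Backward ranking by mean replacement (sketch with synthetic placeholder data).
rng(0);                                   % reproducible data and initial weights
nVars = 6;  nObs = 300;                   % hypothetical sizes; the question has 57 variables
X = rand(nVars, nObs);                    % fitnet expects variables in rows, samples in columns
t = 3*X(1,:) - 2*X(4,:).^2 + 0.05*randn(1, nObs);   % only variables 1 and 4 matter here

xMean   = mean(X, 2);                     % replacement value for each variable
active  = true(nVars, 1);                 % variables not yet replaced
removed = zeros(1, nVars - 1);            % removal order, least important first
Xcur    = X;

for step = 1:nVars-1
    candidates   = find(active);
    mseIfRemoved = zeros(size(candidates));
    for k = 1:numel(candidates)
        v = candidates(k);
        Xtry = Xcur;
        Xtry(v, :) = xMean(v);            % replace variable v by its mean value
        net = fitnet(5);                  % 5 hidden nodes: an arbitrary choice
        net.trainParam.showWindow = false;
        net = train(net, Xtry, t);        % retrain from scratch
        mseIfRemoved(k) = perform(net, t, net(Xtry));
    end
    [~, best] = min(mseIfRemoved);        % the replacement that hurts least
    v = candidates(best);
    Xcur(v, :) = xMean(v);                % make the replacement permanent
    active(v)  = false;
    removed(step) = v;
end

ranking = [find(active)', fliplr(removed)];   % most important variable first
disp(ranking)
```

Each pass retrains one net per active variable, so with 57 variables this amounts to well over a thousand trainings. Because of the random weight initialization and data division, averaging several trainings per candidate would probably give a more stable ranking.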
Comments:
1. Sometimes I get a preliminary feel for the significant variables by using STEPWISEFIT in the backward and forward "stagewise" modes to construct 1st and 2nd order polynomial models.
2. Some variations:
   a. Use a stagewise search with the NN model (time consuming).
   b. Use a forward search (tends to be inferior with correlated variables).
   c. Either delete the variable row or fill it with zeros (prolongs retraining).
   d. Replace the variable row with a random reordering (may need multiple, e.g., 20?, reorderings to get consistent results).
3. The best variation depends on the number of data points, the number of variables, the degree of correlation among the variables, and the complexity of the I/O relationship.
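Variation 2d (random reordering) might look something like the sketch below. Note the assumption: here the net is trained once and only re-evaluated on the shuffled inputs, without retraining, which may not be exactly what was intended above. The data, the problem size and the 5 hidden nodes are again made up for illustration.

```matlab
% Random-reordering check of each input row (sketch with synthetic placeholder data).
rng(1);
nVars = 6;  nObs = 300;                   % hypothetical sizes
X = rand(nVars, nObs);                    % variables in rows, samples in columns
t = 3*X(1,:) - 2*X(4,:).^2 + 0.05*randn(1, nObs);

net = fitnet(5);                          % 5 hidden nodes: an arbitrary choice
net.trainParam.showWindow = false;
net = train(net, X, t);
mse0 = perform(net, t, net(X));           % baseline MSE with all rows intact

nShuffles   = 20;                         % the "e.g., 20?" reorderings
mseIncrease = zeros(nVars, 1);
for v = 1:nVars
    mseShuffled = zeros(nShuffles, 1);
    for s = 1:nShuffles
        Xs = X;
        Xs(v, :) = X(v, randperm(nObs));  % randomly reorder row v only
        mseShuffled(s) = perform(net, t, net(Xs));
    end
    mseIncrease(v) = mean(mseShuffled) - mse0;   % average damage from scrambling v
end

[~, order] = sort(mseIncrease, 'descend');       % largest damage first
disp(order')
```

Averaging over the reorderings is what smooths out the run-to-run scatter; a single shuffle per variable can give a misleading order.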
The misnamed MATLAB functions STEPWISE (GUI) and STEPWISEFIT actually do both forward and backward stagewise searches on models that are linear in their coefficients (e.g., polynomials). Therefore they cannot be used for NN models.
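For the preliminary look with STEPWISEFIT mentioned in comment 1, something along these lines should work. It is a sketch under assumptions: synthetic data, only pure squares as the 2nd-order terms (no cross products), and the default entry and removal p-values. It needs the Statistics Toolbox.

```matlab
% Preliminary screening with STEPWISEFIT on a polynomial model (sketch, synthetic data).
rng(2);
nObs = 300;  nVars = 6;                   % hypothetical sizes
X = rand(nObs, nVars);                    % stepwisefit expects samples in rows
y = 3*X(:,1) - 2*X(:,4).^2 + 0.05*randn(nObs, 1);

terms = [X, X.^2];                        % 1st-order terms, then pure squares

% Forward: start with no terms in the model (the default).
[~, ~, ~, inFwd] = stepwisefit(terms, y, 'display', 'off');

% Backward: start with every term in the model.
[~, ~, ~, inBwd] = stepwisefit(terms, y, ...
    'inmodel', true(1, 2*nVars), 'display', 'off');

% Column k > nVars is the square of variable k - nVars.
disp(find(inFwd))
disp(find(inBwd))
```

Variables that survive in both directions are reasonable first candidates for the NN inputs, but this only reflects what a polynomial model can see.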
However, there is a new MATLAB function, SEQUENTIALFS, that performs both forward and backward stepwise (NOT stagewise) searches on nonlinear models represented by function handles (i.e., @). Its usefulness for the NN models in the NN Toolbox seems to be unexplored.
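One way SEQUENTIALFS could be wired to a FITNET model is sketched below. I have not evaluated how well it works in practice. The helper name nnCriterion, the synthetic data, the 3-fold partition and the 5 hidden nodes are all assumptions for illustration. Local functions in a script need R2016b or later; otherwise save nnCriterion in its own file.

```matlab
% Backward search with SEQUENTIALFS around a FITNET model (sketch, synthetic data).
rng(3);
nObs = 200;  nVars = 5;                   % hypothetical sizes, kept small for speed
X = rand(nObs, nVars);                    % sequentialfs expects samples in rows
y = 3*X(:,1) - 2*X(:,4).^2 + 0.05*randn(nObs, 1);

cvp = cvpartition(nObs, 'KFold', 3);      % 3-fold cross-validation
inmodel = sequentialfs(@nnCriterion, X, y, ...
    'cv', cvp, 'direction', 'backward');
disp(find(inmodel))                       % indices of the selected variables

function crit = nnCriterion(Xtrain, ytrain, Xtest, ytest)
    % Hypothetical criterion: train a small net, return the SUM of squared
    % test errors (sequentialfs divides by the number of test observations).
    net = fitnet(5);                      % 5 hidden nodes: an arbitrary choice
    net.trainParam.showWindow = false;
    net = train(net, Xtrain', ytrain');   % fitnet wants samples in columns
    crit = sum((ytest' - net(Xtest')).^2);
end
```

Since every candidate subset means one training per fold, expect this to be slow for 57 variables.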

More Answers (1)

vijay on 24 Jul 2012
I suggest you train the ANN with all 57 variables as inputs and one variable (the land value) as the target output. Once you are satisfied with the value of the MSE, note it down.
Then, in a for loop, leave out each variable in turn, train the network each time, test it, and note down the MSE.
The variable whose omission gives the highest MSE is the most significant input variable. Please keep the same training and testing data sets that you used during your initial ANN model development.
Another method uses partial derivatives, but I am not very familiar with it.
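A rough sketch of that loop is given here, with the split held fixed through 'divideind' so that every net sees the same training and test samples. The data, the 70/15/15 split and the 5 hidden nodes are assumptions for illustration only.

```matlab
% Leave-one-variable-out comparison on a fixed data split (sketch, synthetic data).
rng(4);
nVars = 6;  nObs = 300;                   % hypothetical sizes
X = rand(nVars, nObs);                    % variables in rows, samples in columns
t = 3*X(1,:) - 2*X(4,:).^2 + 0.05*randn(1, nObs);

idx      = randperm(nObs);                % one fixed split reused for every net
trainInd = idx(1:210);
valInd   = idx(211:255);
testInd  = idx(256:300);

testMSE = zeros(nVars + 1, 1);            % entry 1 is the all-variable baseline
for v = 0:nVars
    keep = setdiff(1:nVars, v);           % v = 0 keeps every variable
    net = fitnet(5);                      % 5 hidden nodes: an arbitrary choice
    net.divideFcn = 'divideind';
    net.divideParam.trainInd = trainInd;
    net.divideParam.valInd   = valInd;
    net.divideParam.testInd  = testInd;
    net.trainParam.showWindow = false;
    net = train(net, X(keep, :), t);
    yTest = net(X(keep, testInd));
    testMSE(v + 1) = mean((t(testInd) - yTest).^2);
end

mseIncrease = testMSE(2:end) - testMSE(1);       % damage from leaving each one out
[~, order]  = sort(mseIncrease, 'descend');      % most significant first
disp(order')
```

With strongly correlated inputs (several distances to nearby places, for example) leaving out one variable may barely change the MSE even if the group as a whole matters, so the differences should be read with some care.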
Vijay
