How to select the number of samples to train a Machine Learning algorithm?

I am working with a dataset of 12,000 samples covering about 5 years of an industrial process.
It is likely that during this time the plant has undergone changes (equipment, performance degradation, chemical products).
Is there a tool for identifying the best subset of this data? In my view, a temporal cut in the data could improve the quality of the resulting models.
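One way to check whether a temporal cut helps is to hold out the most recent block of samples and compare models trained on progressively longer windows of the data preceding it. The following is only a minimal sketch of that idea in plain Python: the data is synthetic (a single input whose relationship to the target changes at sample 8000), the model is a one-variable least-squares line, and the names `fit_line`, `mse` and `score_temporal_cuts` are hypothetical helpers, not part of any toolbox.

```python
import random

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, mean_y - a * mean_x

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def score_temporal_cuts(xs, ys, window_sizes, n_holdout):
    """Train on the last `w` samples before the hold-out block and
    score each window on the hold-out block (the newest samples)."""
    x_train, y_train = xs[:-n_holdout], ys[:-n_holdout]
    x_test, y_test = xs[-n_holdout:], ys[-n_holdout:]
    scores = {}
    for w in window_sizes:
        model = fit_line(x_train[-w:], y_train[-w:])
        scores[w] = mse(model, x_test, y_test)
    return scores

# Synthetic stand-in for the plant data (assumption): samples are in time
# order and the slope changes at sample 8000 to imitate a process change.
rng = random.Random(0)
n_samples = 12000
xs = [rng.uniform(0.0, 1.0) for _ in range(n_samples)]
ys = [(2.0 if i < 8000 else 5.0) * x + rng.gauss(0.0, 0.1)
      for i, x in enumerate(xs)]

scores = score_temporal_cuts(xs, ys, [1000, 2000, 3000, 6000, 11000],
                             n_holdout=1000)
best_window = min(scores, key=scores.get)
for w in sorted(scores):
    print(f"window={w:5d}  hold-out MSE={scores[w]:.4f}")
print("best window:", best_window)
```

On this synthetic series the windows that stay inside the most recent regime score best, and the windows reaching back past the change score worse. With real data the same loop can wrap any of the four model types, and the hold-out block should always be the newest data so that the comparison reflects current plant behavior.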
  3 Comments
Jose Marques on 31 Jan 2019
Thanks for the comment!
The dataset has 426 inputs (I am using techniques for feature selection too).
I am using four algorithms to create the models: Regression Tree, Bagged Trees, SVM and Neural Networks.
Greg Heath on 4 Feb 2019
As a common-sense rule of thumb, I try to use at least 10 to 30 times as many training points as unknown parameters to be estimated.
In addition, I use 10 to 20 sets of random initial weights.
I assume, of course, that you have examined plots of the data to initialize your common sense.
Hope this helps
Greg
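To make the rule of thumb concrete, here is a rough sketch of how it limits the size of a neural network for the numbers mentioned in this thread (12,000 samples, 426 inputs). It assumes a network with a single hidden layer, one output, bias terms on both layers, and that all 12,000 samples are used for training; none of that is stated above. The 20-input case is a hypothetical result of feature selection, and the function names are illustrative only.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a single-hidden-layer network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def max_hidden_nodes(n_train, n_inputs, n_outputs, ratio):
    """Largest hidden-layer size for which
    n_train >= ratio * (number of unknown parameters)."""
    n_hidden = 0
    while mlp_weight_count(n_inputs, n_hidden + 1, n_outputs) * ratio <= n_train:
        n_hidden += 1
    return n_hidden

n_train = 12000  # assumption: every sample is used for training

for n_inputs in (426, 20):  # 20 inputs is a hypothetical post-selection count
    for ratio in (10, 30):
        h = max_hidden_nodes(n_train, n_inputs, n_outputs=1, ratio=ratio)
        print(f"inputs={n_inputs:3d}  ratio={ratio:2d}  max hidden nodes={h}")
```

Under these assumptions, all 426 inputs leave room for at most 2 hidden nodes at a ratio of 10 and none at a ratio of 30, while 20 inputs allow 54 and 18 respectively. That suggests feature selection matters at least as much as the choice of time window here.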


Answers (1)

BERGHOUT Tarek on 3 Feb 2019
You can use deep belief networks; they are well suited to feature selection and mapping. Train your network on chunks of data drawn by randomly choosing (input, target) pairs, and at the same time pay attention to your approximation function: you must keep your error function at its local minimum. Deep belief nets rely on a set of stacked auto-encoders, which allows all the parameters of the network to be tuned with a small amount of training data.
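The "randomly chosen (input, target) pairs" part can be sketched independently of the network itself. This is only an illustration in plain Python of drawing random chunks while keeping each input paired with its target; `random_chunks` is a hypothetical helper and the toy data is made up.

```python
import random

def random_chunks(inputs, targets, chunk_size, seed=None):
    """Yield chunks of randomly chosen (inputs, targets) pairs.

    Every sample appears exactly once per pass, and each input
    stays aligned with its own target.
    """
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    order = list(range(len(inputs)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), chunk_size):
        idx = order[start:start + chunk_size]
        yield [inputs[i] for i in idx], [targets[i] for i in idx]

# Toy data (assumption): target is simply twice the input.
toy_inputs = list(range(10))
toy_targets = [2 * x for x in toy_inputs]

chunks = list(random_chunks(toy_inputs, toy_targets, chunk_size=4, seed=1))
for chunk_inputs, chunk_targets in chunks:
    print(chunk_inputs, chunk_targets)
```

Each chunk would then be passed to one training step of whatever model is being fitted. Note that random chunking discards the time order of the samples, so it does not by itself address changes in the plant over the 5 years.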
