Machine learning and data normalization - how data should(?) be normalized.
Hello, I have a general question about data normalization for classification algorithms: if I have a training set and a testing set, should I normalize them separately or combine them for the normalization step? And what if I later want to use this classifier on a completely new batch of data? Should I keep the extreme values of each feature so I can use them for normalization?
My second question: is normalization really necessary? Does SVM need it?
Thank you in advance for any help. Cheers, Michael
BERGHOUT Tarek on 3 Feb 2019
1- You can normalize them separately or together, but the best way is to do the normalization inside the training function; if you put the normalization function inside the training function, you can reuse it for any dataset afterwards.
2- Yes, normalization is necessary if and only if the activation functions of your training model are bounded; otherwise you don't have to normalize the data.
As for SVM, if the kernel function is bounded you must normalize your data.
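One possible reading of "normalization inside the training function" is sketched below in Python: the scaling parameters (the per-feature minimum and maximum the question asks about) are learned once from the training set and stored, then reused unchanged for the test set or any later data. The class name `MinMaxScaler`, its `fit`/`transform` methods, and the sample numbers are hypothetical and chosen only for illustration; this is not code from the thread.

```python
class MinMaxScaler:
    """Hypothetical helper: learns per-feature min/max from training rows only."""

    def fit(self, rows):
        # Transpose rows into columns so each feature is handled separately.
        columns = list(zip(*rows))
        self.mins = [min(c) for c in columns]
        self.maxs = [max(c) for c in columns]
        return self

    def transform(self, rows):
        # Apply the stored training min/max to any dataset.
        scaled = []
        for row in rows:
            out = []
            for x, lo, hi in zip(row, self.mins, self.maxs):
                span = hi - lo
                # A constant feature has zero span; map it to 0.0 (an assumption).
                out.append((x - lo) / span if span else 0.0)
            scaled.append(out)
        return scaled


# Made-up example data: three training samples, one new sample.
train = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
new_data = [[2.0, 40.0]]

scaler = MinMaxScaler().fit(train)          # parameters come from training data
train_scaled = scaler.transform(train)      # values fall in [0, 1]
new_scaled = scaler.transform(new_data)     # may fall outside [0, 1]
```

Because the new sample is scaled with the stored training extremes, a value beyond the training maximum maps above 1; whether to clip such values is a separate design choice.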
Mostafa Nakhaei on 18 Oct 2019
Please note that the best practice in machine learning is to keep the distributions of the training and testing data the same. So, if you want to normalize your data, it is good to do the normalization on the whole dataset first and then split it. That way, your testing and training sets will have the same distribution. The common error is to split the data and then normalize each part individually.
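To make the ordering concrete, here is a small Python sketch that computes z-score parameters in two ways: from the whole dataset before splitting, as described above, and from the training part only. The data and the helper names `zscore_params` and `zscore` are made up for illustration. One thing worth noticing is that in the first case the test samples contribute to the mean and standard deviation, so the same test value ends up with a different scaled value depending on the ordering.

```python
from statistics import mean, pstdev


def zscore_params(values):
    """Return (mean, population standard deviation) of a feature."""
    return mean(values), pstdev(values)


def zscore(values, mu, sigma):
    """Standardize values with the given mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


# Made-up single feature; the last sample is held out as the test set.
feature = [1.0, 2.0, 3.0, 4.0, 10.0]
train, test = feature[:4], feature[4:]

# Ordering A: parameters from the whole dataset, then split.
mu_all, sigma_all = zscore_params(feature)
test_a = zscore(test, mu_all, sigma_all)

# Ordering B: parameters from the training part only, applied to the test part.
mu_train, sigma_train = zscore_params(train)
test_b = zscore(test, mu_train, sigma_train)

print("whole-dataset parameters:", mu_all, sigma_all, "->", test_a)
print("training-only parameters:", mu_train, sigma_train, "->", test_b)
```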