How to deal with imbalanced dataset classification by support vector machine

I have a dataset that is heavily skewed toward one class. Training a support vector machine (SVM) with either fitcsvm.m or fitcecoc.m does not give satisfactory results: the accuracy for the majority class is above 90%, but for the class with far fewer samples it is barely 70%. Is there any way to improve SVM training, or are there other methods for handling imbalanced training data?

Accepted Answer

Aditya Mittal on 21 Apr 2020
Hi,
There are several ways to balance the dataset before fitting the classifier, which can improve the results:
  • Undersampling: Remove redundant or repeated points from the majority class and keep only a useful subset, so that the class sizes are closer to balanced.
  • Oversampling: Collect more data points for the minority class, or replicate some of the existing minority points to increase their count.
  • Generating data: Create synthetic data for the minority class to balance the dataset. This can be done with the SMOTE method. A File Exchange implementation is available here:
    https://www.mathworks.com/matlabcentral/fileexchange/38830-smote-synthetic-minority-over-sampling-technique
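As a rough illustration of the first two options, the sketch below does plain random undersampling and random oversampling with base MATLAB indexing. The toy data, the labels 0 (majority) and 1 (minority), and all variable names are assumptions made for this example; SMOTE itself is not shown.

```matlab
% Hypothetical toy data: 200 majority rows (label 0), 20 minority rows (label 1).
rng(0);                                   % fixed seed so the resampling is repeatable
X = [randn(200, 2); randn(20, 2) + 2];
y = [zeros(200, 1); ones(20, 1)];

majIdx = find(y == 0);                    % row indices of the majority class
minIdx = find(y == 1);                    % row indices of the minority class

% Random undersampling: keep only as many majority rows as there are minority rows.
keepMaj  = majIdx(randperm(numel(majIdx), numel(minIdx)));
underIdx = [keepMaj; minIdx];
Xunder   = X(underIdx, :);
yunder   = y(underIdx);

% Random oversampling: draw minority rows with replacement up to the majority count.
extraMin = minIdx(randi(numel(minIdx), numel(majIdx), 1));
overIdx  = [majIdx; extraMin];
Xover    = X(overIdx, :);
yover    = y(overIdx);
```

Either balanced set can then be passed to fitcsvm in place of the original data. Resample only the training portion; the test set should keep its original class proportions so the reported metrics stay realistic.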
Results vary from problem to problem. Also, accuracy is not always the best performance metric for imbalanced data, because a classifier can reach high accuracy while still misclassifying most of the minority class. You should therefore look at other metrics that give better insight:
  • Confusion matrix
  • Precision
  • Recall
  • F1 score
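To make these metrics concrete, here is a small worked example in base MATLAB. The label vectors are made up for illustration, and class 1 is assumed to be the minority (positive) class.

```matlab
% Hypothetical true and predicted labels; 1 = minority (positive), 0 = majority.
yTrue = [1 1 1 1 0 0 0 0 0 0]';
yPred = [1 1 0 0 0 0 0 0 0 1]';

TP = sum(yTrue == 1 & yPred == 1);        % minority correctly detected
FN = sum(yTrue == 1 & yPred == 0);        % minority missed
FP = sum(yTrue == 0 & yPred == 1);        % majority flagged as minority
TN = sum(yTrue == 0 & yPred == 0);        % majority correctly rejected

C = [TN FP; FN TP];                       % rows = true class, columns = predicted class

accuracy  = (TP + TN) / numel(yTrue);
precision = TP / (TP + FP);
recall    = TP / (TP + FN);
f1        = 2 * precision * recall / (precision + recall);

fprintf('accuracy %.2f, precision %.2f, recall %.2f, F1 %.2f\n', ...
    accuracy, precision, recall, f1);
% prints: accuracy 0.70, precision 0.67, recall 0.50, F1 0.57
```

Here the accuracy is 0.70, yet only half of the minority samples are detected (recall 0.50), which is the kind of gap that accuracy alone hides. With the Statistics and Machine Learning Toolbox, confusionmat(yTrue, yPred) should return the same matrix as C.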
You can also try other machine learning models, such as hybrid or ensemble algorithms (e.g. AdaBoost) or deep learning models, which may give better results.
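As one possible starting point for the ensemble suggestion, the sketch below trains a boosted tree ensemble with fitcensemble and checks recall on the minority class. It assumes the Statistics and Machine Learning Toolbox is installed; the toy data, the 30% hold-out split, and the hyperparameter values are arbitrary choices for illustration, not tuned recommendations.

```matlab
% Hypothetical toy data: 200 majority rows (label 0), 20 minority rows (label 1).
rng(0);
X = [randn(200, 2); randn(20, 2) + 2];
y = [zeros(200, 1); ones(20, 1)];

% Hold out 30% for testing; passing the labels keeps the class proportions in both parts.
cv = cvpartition(y, 'HoldOut', 0.3);
Xtrain = X(training(cv), :);   ytrain = y(training(cv));
Xtest  = X(test(cv), :);       ytest  = y(test(cv));

% AdaBoost (binary variant) over shallow decision trees.
mdl = fitcensemble(Xtrain, ytrain, ...
    'Method', 'AdaBoostM1', ...
    'NumLearningCycles', 50, ...
    'Learners', templateTree('MaxNumSplits', 5));

yPred = predict(mdl, Xtest);

% Recall on the minority class, the quantity that accuracy tends to hide.
minorityRecall = sum(yPred == 1 & ytest == 1) / sum(ytest == 1);
```

Whether this beats an SVM depends on the data, so compare the models using the metrics listed above rather than accuracy alone.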