How to deal with imbalanced dataset classification by support vector machine

Question

Yuzhen Lu on 17 Apr 2020


https://nl.mathworks.com/matlabcentral/answers/518564-how-to-deal-with-imbalanced-dataset-classification-by-support-vector-machine

Commented: Esmeralda Ruiz Pujadas on 22 Mar 2023

I have a dataset that is heavily skewed toward one class. Training a support vector machine (SVM) with either fitcsvm.m or fitcecoc.m does not give satisfactory results: the accuracy for the class with more samples is above 90%, but for the class with far fewer samples it is barely 70%. Is there any way to improve SVM training, or are there other methods for handling imbalanced training data?


Accepted Answer

Aditya Mittal on 21 Apr 2020


https://nl.mathworks.com/matlabcentral/answers/518564-how-to-deal-with-imbalanced-dataset-classification-by-support-vector-machine#answer_427263

Hi,

There are several ways to balance the dataset before fitting the classifier, which can lead to better results:

Undersampling: remove redundant or repeated samples from the majority class and keep only a representative subset of the useful points, so that the classes become more balanced.
Oversampling: collect more data points for the minority class, or replicate some of the existing minority-class samples to increase its size.
Generating data: generate synthetic samples for the minority class to balance the data, for example with the SMOTE (Synthetic Minority Over-sampling Technique) method. A SMOTE implementation is available here:
https://www.mathworks.com/matlabcentral/fileexchange/38830-smote-synthetic-minority-over-sampling-technique
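As a rough illustration of the three strategies listed above, here is a minimal sketch in plain Python. The thread itself contains no code, so the language, the function names `random_undersample`, `random_oversample` and `smote_like`, and the toy data are all hypothetical. It assumes a binary problem with feature vectors stored as lists of floats; `smote_like` is a simplified SMOTE-style interpolation, not the File Exchange implementation linked above.

```python
import random

# Hypothetical resampling sketches for a binary problem.
# X is a list of feature vectors (lists of floats), y the matching list of labels.

def class_indices(y, label):
    """Indices of all samples carrying `label`."""
    return [i for i, lab in enumerate(y) if lab == label]

def random_undersample(X, y, majority, minority, seed=0):
    """Keep all minority samples plus an equally sized random subset of the majority."""
    rng = random.Random(seed)
    mino = class_indices(y, minority)
    keep = sorted(mino + rng.sample(class_indices(y, majority), len(mino)))
    return [X[i] for i in keep], [y[i] for i in keep]

def random_oversample(X, y, majority, minority, seed=0):
    """Replicate randomly chosen minority samples until both classes are equal in size."""
    rng = random.Random(seed)
    mino = class_indices(y, minority)
    n_extra = len(class_indices(y, majority)) - len(mino)
    idx = list(range(len(y))) + [rng.choice(mino) for _ in range(n_extra)]
    return [X[i] for i in idx], [y[i] for i in idx]

def smote_like(X, y, minority, n_new, k=3, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbors (Euclidean)."""
    rng = random.Random(seed)
    pts = [X[i] for i in class_indices(y, minority)]
    synthetic = []
    for _ in range(n_new):
        a = rng.randrange(len(pts))
        p = pts[a]
        # Sort the other minority points by squared distance to p; keep the k closest.
        others = [pts[j] for j in range(len(pts)) if j != a]
        neighbors = sorted(others, key=lambda q: sum((u - v) ** 2 for u, v in zip(p, q)))[:k]
        q = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([u + gap * (v - u) for u, v in zip(p, q)])
    return X + synthetic, y + [minority] * n_new

# Hypothetical toy data: 7 majority samples (label 0) and 3 minority samples (label 1).
X = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8], [0.8, 1.0], [1.1, 1.2],
     [0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
y = [0] * 7 + [1] * 3

Xu, yu = random_undersample(X, y, majority=0, minority=1)
Xo, yo = random_oversample(X, y, majority=0, minority=1)
Xs, ys = smote_like(X, y, minority=1, n_new=4)
print(yu.count(0), yu.count(1))  # 3 3
print(yo.count(0), yo.count(1))  # 7 7
print(ys.count(0), ys.count(1))  # 7 7
```

Undersampling throws away majority-class information, while plain replication can encourage overfitting to the repeated minority points; SMOTE-style interpolation is one way to soften the latter.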

Results vary from problem to problem. Also, accuracy is not always the best performance metric for imbalanced data, because a classifier can reach high accuracy simply by favoring the majority class. Therefore, try other performance metrics that give better insight, such as:

Confusion matrix
Precision
Recall
F1 score
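To show why these metrics are more informative than overall accuracy, here is a minimal sketch in plain Python (the function name `binary_metrics` and the toy labels are hypothetical). It computes the confusion-matrix counts and precision, recall and F1 for one class treated as "positive". The per-class accuracies quoted in the question (above 90% and about 70%) correspond to the recall obtained when each class in turn is taken as the positive class.

```python
def binary_metrics(y_true, y_pred, positive):
    """Confusion-matrix counts plus accuracy, precision, recall and F1 for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # set to 0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # per-class accuracy of the positive class
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical data: 95 majority samples (label 0) and 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5
# A useless classifier that always predicts the majority class.
y_pred = [0] * 100
m = binary_metrics(y_true, y_pred, positive=1)
print(m["accuracy"], m["recall"], m["f1"])  # 0.95 0.0 0.0
```

The 95% accuracy hides the fact that not a single minority sample is detected, which recall and F1 expose immediately.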

You can also try other models, such as hybrid or ensemble machine learning algorithms (e.g. AdaBoost) or deep learning models, which may give better results.
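AdaBoost is named above as one ensemble option. The following is a minimal textbook sketch of discrete AdaBoost with one-feature threshold "stumps" as weak learners, written in plain Python with hypothetical names and toy data; in MATLAB one would normally use `fitcensemble` (for example with `'Method','AdaBoostM1'`) rather than hand-rolled code. Plain AdaBoost is not specific to imbalanced data; the relevant mechanism is that each round increases the weight of misclassified samples, which on skewed data are often minority-class samples.

```python
import math

# Hypothetical minimal discrete AdaBoost; labels must be +1 / -1.

def stump_predict(stump, x):
    """stump = (feature, threshold, polarity): predict polarity if x[feature] >= threshold."""
    f, thr, pol = stump
    return pol if x[f] >= thr else -pol

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                stump = (f, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified samples (y * h(x) = -1) gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize to a distribution
    return model

def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Hypothetical imbalanced toy data: 6 negatives, 2 positives, one feature.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.9]]
y = [-1] * 6 + [1] * 2
model = adaboost(X, y, rounds=5)
print([adaboost_predict(model, x) for x in ([0.15], [0.8])])  # [-1, 1]
```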


Kenta on 11 Jul 2020

The answer from Dr. Aditya Mittal is very informative. An example of oversampling is posted here; I hope it helps:

https://jp.mathworks.com/matlabcentral/fileexchange/78020-oversampling-for-deep-learning-classification-example

Esmeralda Ruiz Pujadas on 22 Mar 2023

You cannot apply those methods directly to the whole dataset, because resampling before splitting contaminates the validation data. SVM is also different from deep learning: with SVM you cannot directly specify the validation data...
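The point about validation can be made concrete: split off the validation or test data first, and resample only the training part (inside every fold, if cross-validation is used). Otherwise copies or interpolations of test samples end up in the training set and the evaluation becomes optimistic. A minimal sketch in plain Python, with hypothetical function names `stratified_split` and `oversample_indices` and toy labels; it works on indices, so the balanced training indices can then be used to fit any classifier.

```python
import random

def stratified_split(y, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), splitting each class separately so both keep its ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)

def oversample_indices(idx, y, seed=0):
    """Replicate minority indices, drawn only from `idx`, until all classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for i in idx:
        by_class.setdefault(y[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    out = []
    for members in by_class.values():
        out += members + [rng.choice(members) for _ in range(target - len(members))]
    return out

# Hypothetical labels: 20 majority (0) and 8 minority (1) samples.
y = [0] * 20 + [1] * 8
train_idx, test_idx = stratified_split(y)      # 1. split first
train_bal = oversample_indices(train_idx, y)   # 2. resample the training part only
print(len(train_idx), len(test_idx), len(train_bal))  # 21 7 30
```

The test indices are never touched, so the held-out data keeps the real class ratio and contains no duplicates of training samples.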

