Why does the SMO solver with fitcsvm takes longer than L1-QP solver to converge with larger BoxConstraint (C) values?

3 views (last 30 days)

Yildirim Kocoglu on 12 Mar 2021

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/770413-why-does-the-smo-solver-with-fitcsvm-takes-longer-than-l1-qp-solver-to-converge-with-larger-boxconst

Hello everyone,

I'm working on a mini project for a class where we have been asked to implement L1-QP (solver) and SMO (solver) in fitcsvm and compare their convergence time with increasing number of examples. The problem itself is not linearly separable but, we are using the Soft Margin SVM so, it does converge to a solution with fitcsvm.

Below is the results I get: Y-axis is the average time it took for each method (3 methods in green,blue and red lines) to converge to a solution. X-axis is the number of samples starting from 100-1000 in increments of 100 (100,200,300,...,1000). We sample from the same distribution to increase the size of our dataset. The blue line is fitcsvm('Solver'=L1-QP) and the green line is fitcsvm('Solver'=SMO). The red line is my own implementation of SVM using quadprog only (without any fitcsvm) which I confirmed the validity using a simple dataset. The graph on the left has a C value ('Boxconstraint') of 0.1 and the right graph has a C value of 100. As it can be seen from both graphs, the SMO outperforms the other methods when C=0.1 (small value) and on the right graph, SMO starts becoming less efficient than the other methods when C=100. I came across an explanation in the documentation that says:

Sparsity in support vectors is a desirable property of an SVM classifier. To decrease the number of support vectors, set BoxConstraint to a large value. This action increases the training time.

The above kind of explains why the one on the right might happen but, it does not say if it is the case for both SMO and quadprog or just SMO. If the statement is true only for SMO, is this the norm? What I mean is: Is this what you would expect in all or most cases in the actual SMO implementation (not just Matlab implementation of SMO)? If this is expected and the norm, can someone please explain why this happens? An equation from the SMO implementation during the explanation will highly be appreciated. Also, I still can't explain why the SMO (green line) is suddenly faster when C=100 (right graph), when dataset sample size is inreased at certain points and not at other points (ups and downs). Thank you.