Interpretation of plot: classification loss vs. number of features and number of observations

Hi everyone,
I have 2 classes and 30 probands, each of whom has "created" 6 observations (3 per class), and I want to use fitctree for a "simple" classification algorithm.
To test the system I have created bar3 plots with N probands from 1 to 20 (6 to 120 observations in the training set) and N features from 1 to 10.
As a first step I test each of my 41 features individually by training fitctree on that single feature and calculating the loss of that model. That way I can rank my features and sort them by loss; choosing n features for the plot means taking the n best features from that ranking.
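The per-feature ranking step described above might look roughly like this (a minimal sketch, not the original code; X, Y, and all variable names are placeholders, assuming an N-by-41 predictor matrix X and an N-by-1 label vector Y):

```matlab
% Rank features by the loss of a tree trained on each single feature.
nFeatures = size(X, 2);
featLoss  = zeros(nFeatures, 1);
for f = 1:nFeatures
    tree = fitctree(X(:, f), Y);            % train on one feature only
    featLoss(f) = cvloss(tree, 'KFold', 5); % 5-fold cross-validated loss
end
[~, rank] = sort(featLoss);   % best (lowest-loss) features first
bestN = rank(1:10);           % e.g. the 10 best-ranked features
```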
So my questions here are: How do I properly interpret my plots? Am I making mistakes, and if so, what kind of mistakes might I have made?
Especially: how do I explain the zero losses for 4-6 probands? I really don't understand how there can be so many zero losses in that area of the plot.
Plot when loss is calculated with
cvloss(tree, 'KFold', 5);
Plot when loss is calculated with
loss(tree, X, Y);
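For context, the two calls measure different things: loss(tree, X, Y) evaluated on the training data is the resubstitution loss, which is close to zero for a fully grown tree by construction, while cvloss estimates the error on held-out folds. A hedged sketch of the comparison (X and Y are placeholders for the training data):

```matlab
% Resubstitution loss vs. cross-validated loss for the same tree.
tree     = fitctree(X, Y);
resubErr = loss(tree, X, Y);         % training error; ~0 for a deep tree
cvErr    = cvloss(tree, 'KFold', 5); % out-of-fold error; usually larger
```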
Thanks a lot in advance!
Some additional info:
The probands wear sensors from which I use the acceleration data. From that data I calculate a set of parameters on each axis of each of the (3) sensors.
  1 Comment
asaad sellmann on 12 Oct 2020
Edited: asaad sellmann on 12 Oct 2020
Using the Classification Learner with 20 out of 30 probands I get prediction accuracies of 95.8%. Exporting the model and then using the prediction function on my remaining 10 probands leaves me with 95.1%.
I am very uncertain whether this is plausible. Maybe you could give me some hints on how to test for overfitting. Or am I too suspicious, and does the fact that my probands were healthy subjects who were instructed right before the data was recorded simply lead to such a "good" dataset?
-----
After a few iterations with fewer features I even get to 100% prediction accuracy with a quadratic SVM...


Answers (1)

Aditya Patil on 17 Nov 2020
As per my understanding, when you plot losses for each feature individually, you get zero loss for some of them.
By default, the trees are grown to full depth, hence it's possible that some of the variables give zero loss. This generally suggests either that you have no noise in that variable, or that the data quality is not good (e.g., no data close to the decision boundary).
Also note that you can't get feature importance by testing the features separately. A feature's importance depends upon what other features are used. For example, if x1 and x2 are two features, and x2 = f(x1), then using both x1 and x2 is not useful. You should use one of the many feature reduction/selection algorithms available to decide importance.
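One such option (a sketch of one possible setup, not a recommendation of these exact parameters) is sequential forward selection with sequentialfs, which evaluates feature subsets jointly rather than one feature at a time; X and Y are placeholders for the full predictor matrix and labels:

```matlab
% Sequential forward feature selection with a tree as the criterion.
c = cvpartition(Y, 'KFold', 5);
critFun = @(Xtr, Ytr, Xte, Yte) ...
    sum(Yte ~= predict(fitctree(Xtr, Ytr), Xte)); % misclassification count
selected  = sequentialfs(critFun, X, Y, 'cv', c); % logical mask of chosen features
chosenIdx = find(selected);
```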
  2 Comments
asaad sellmann on 17 Nov 2020
Well, thanks a lot for your answer!
For example, if x1 and x2 are two features, and x2 = f(x1), then using both x1 and x2 is not useful
My features are mathematically independent of each other. They are all calculated directly from the raw acceleration readings...
...that either you have no noise in that variable, or that the data quality is not good...
...not being close to decision boundaries might very well be possible. A (maybe) rookie mistake I made is that I did not select probands randomly for the calculation. They were always taken in the same order. I will now change that and see how it affects my graphs.
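To make sure whole probands (not just single observations) are split randomly, one option is to partition on the subject IDs, e.g. (a sketch; probandID is an assumed N-by-1 vector mapping each observation to its proband, and X, Y are placeholders):

```matlab
% Hold out whole probands at random so no subject appears in both sets.
ids      = unique(probandID);
shuffled = ids(randperm(numel(ids)));
testIDs  = shuffled(1:10);                 % e.g. 10 probands for testing
isTest   = ismember(probandID, testIDs);
tree     = fitctree(X(~isTest, :), Y(~isTest));
testErr  = loss(tree, X(isTest, :), Y(isTest));
```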
Thanks again and I'll report my findings here
asaad sellmann on 18 Nov 2020
Edited: asaad sellmann on 19 Nov 2020
So after calculating cvloss with random combinations of proband data I got the graph you can see above. One more mistake is that "number of features" does not refer to a random combination of features (which also wouldn't really be useful), but to the first; the first and second; the first, second and third; ... features from the ranked list. That means every combination contains the features of the combinations before it.
I'll now dig deeper into my feature selection/ranking procedure, before bothering you guys again.
Do you have any suggestions for a binary problem with parameter features calculated from raw acceleration data? I always feel very unsure when using functions like rankfeatures, predictorImportance, etc...
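For what it's worth, predictorImportance on a single tree grown on all features gives a joint ranking rather than a one-feature-at-a-time one (a sketch, with X and Y as placeholders for the predictors and labels):

```matlab
% Joint importance ranking from one tree grown on all features.
tree = fitctree(X, Y);
imp  = predictorImportance(tree);  % one importance value per feature
[~, order] = sort(imp, 'descend'); % most important features first
```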
Thanks a lot for your advice and ideas!

