resampling an unbalanced dataset
Hi, I have a dataset which has 2 classes (churn='False.' and churn='True.'). It is unbalanced because 700 of the 5000 samples are churn='False.' Is there a way to balance that distribution? Thank you in advance.
Accepted Answer
Image Analyst
on 3 Jan 2015
Throw out all but 700 items where churn = true??? Then you'd have 700 false ones and 700 true ones. If not, then tell us in more detail what "balance" means to you.
3 Comments
Image Analyst
on 3 Jan 2015
Uh, sure, if that's what you want. If it's in a table, you can automate it somewhat, like
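% Assumes t.churn is a logical column. If it holds text such as 'True.',
% convert it first, for example: t.churn = strcmp(t.churn, 'True.');
% Find out which rows are true.
trueRows = find(t.churn);
% Take only the first 700 (or all of them if there are fewer than 700):
trueRows = trueRows(1:min([length(trueRows), 700]));
% Find out which rows are false - we want to keep all those.
falseRows = find(~t.churn);
% Combine the false and true rows into one list of indexes.
% find() returns column vectors here, so concatenate vertically.
rowsToExtract = sort([falseRows; trueRows]);
% Now extract only the first 700 true, but all the false.
% Tables need both a row and a column index.
t = t(rowsToExtract, :);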
or something like that. You might have to debug it some.
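Taking the first 700 true rows could bias the result if the table is sorted in some way, so you might prefer to pick the rows at random instead. Here is a minimal sketch of that idea (random undersampling of the larger class), not a definitive solution. The example table, the `id` column, and the names `keepTrue`, `keepFalse`, and `balanced` are all made up for illustration, and it assumes `churn` is a logical column.

```matlab
% Hypothetical stand-in for the real data: 5000 rows, 700 with churn == false.
rng(0);                                   % fixed seed so the draw is repeatable
churn = [false(700, 1); true(4300, 1)];
t = table((1:5000)', churn, 'VariableNames', {'id', 'churn'});

% Row indexes of each class.
trueRows  = find(t.churn);
falseRows = find(~t.churn);

% Size of the smaller class; both classes get cut down to this many rows.
n = min(numel(trueRows), numel(falseRows));

% Draw n rows from each class at random, without replacement.
keepTrue  = trueRows(randperm(numel(trueRows), n));
keepFalse = falseRows(randperm(numel(falseRows), n));

% Extract the sampled rows, keeping them in their original order.
balanced = t(sort([keepTrue; keepFalse]), :);
```

With the example data above, `balanced` ends up with the same number of rows in each class. Drop the `rng(0)` line if you want a different draw each time.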