How do you group categorical variables in order to create a boxplot?

I have a set of data which has multiple categorical variables which need to be grouped together for analysis.
For example, I have a set of data with four categorical variables: Microtopography, Structure, Burn Severity, and Canopy.
I want to treat each combination of these four variables as one "group". For example, A = MicrotA, StructA, BurnA, and CanoA would be one group, compared against B = MicrotB, StructB, BurnB, and CanoB. I would then like to plot these newly created groups (A and B) on the x-axis of a boxplot. Each combination of my original categorical variables is a unique, non-numerical value. The y-axis will be a numerical variable.
I've looked into creating an index using string comparison, but for some reason I cannot get it to work. I'm using data from a table that I imported from a Microsoft Access database.

Accepted Answer

dbmn on 17 Mar 2017
I'm not entirely sure what you want to do, but this small example might help you get started:
% create some x data to experiment (you already have these)
x_data = categorical({'a', 'b', 'c', 'a', 'd', 'f', 'b', 'a', 'c', 'f', 'd'});
% create some y data to experiment (you already have these)
y_data = rand(size(x_data));
% Get a logical index of the elements that belong to your new master category (a, b, or c)
group_index = x_data == 'a' | x_data == 'b' | x_data == 'c';
Now the question is what you want to do with that info. One possibility is to create a new vector (that can be added to the data vectors that you already have).
% Get all the y data
y_data_new = y_data(group_index);
% And then create some dummy x data
x_data_new = categorical(repmat({'your group name'}, size(y_data_new)));
% Add those things to an existing vector, or do whatever you need
x_data = [x_data, x_data_new];
y_data = [y_data, y_data_new];
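If the goal is one box per combination of several categorical variables, another option is to join the category names into a single grouping variable. The following is only a minimal sketch: the table `T`, its values, and the column `Y` are made up for illustration (the names `MICROT` and `CANOPY` are borrowed from the comment below), and `boxplot` requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for the imported table; all values are made up.
MICROT = categorical({'HU';'HO';'HU';'HO';'HU';'HO';'HU';'HO'});
CANOPY = categorical({'OP';'OP';'CL';'CL';'OP';'OP';'CL';'CL'});
Y      = [1.2; 2.3; 0.8; 1.9; 1.4; 2.1; 0.7; 2.0];
T = table(MICROT, CANOPY, Y);

% Join the category names row by row, e.g. 'HU_OP', and make the result
% a categorical so that each unique combination becomes one group.
combo = categorical(strcat(cellstr(T.MICROT), '_', cellstr(T.CANOPY)));

% One box per combination on the x-axis, numeric variable on the y-axis.
boxplot(T.Y, combo);
ylabel('Y');
```

The same pattern should extend to four variables by passing more `cellstr(...)` arguments to `strcat`. Alternatively, `boxplot` accepts a cell array of grouping variables, such as `boxplot(T.Y, {T.MICROT, T.CANOPY})`, which also groups by each combination.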
  2 Comments
Jessica Ritz on 21 Mar 2017
I ended up doing it a different way, using string comparisons, but this would have also worked. Thanks for the help.
For other people's reference:
ind1 = strcmp(PRS_t.CANOPY,'OP') & strcmp(PRS_t.MICROT,'HU');
% NumericalVariable is a placeholder for the numeric data being plotted
h1 = boxplot(NumericalVariable,'Notch','on','Labels',{'ind1','ind2'});
Peter Perkins on 26 Apr 2017
If PRS_t.CANOPY is a categorical, you do not need strcmp. Just use ==, as in dbmn's suggestion.
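As a rough illustration of that suggestion, the sketch below builds the two indices with `==` and plots only the rows that fall into either group. The data values, the second index `ind2`, and the labels `A` and `B` are assumptions made for the example and are not taken from the original data.

```matlab
% Hypothetical data; in the thread these would be columns of PRS_t.
CANOPY = categorical({'OP';'OP';'CL';'CL';'OP';'CL'});
MICROT = categorical({'HU';'HO';'HU';'HO';'HU';'HO'});
Y      = [1.2; 2.3; 0.8; 1.9; 1.4; 2.0];

% Logical indices built with == on categoricals (no strcmp needed).
ind1 = CANOPY == 'OP' & MICROT == 'HU';   % assumed definition of group A
ind2 = CANOPY == 'CL' & MICROT == 'HO';   % assumed definition of group B

% Give each selected row its group label and drop rows in neither group.
labels = cell(size(Y));
labels(ind1) = {'A'};
labels(ind2) = {'B'};
keep = ind1 | ind2;

% Two boxes, A and B, on the x-axis.
boxplot(Y(keep), categorical(labels(keep)), 'Notch', 'on');
```

With so few points per group the notches are not meaningful; they are kept only to mirror the call in the comment above.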
