How to group data within a column by specific text within that column
I have a dataset of about 260,000 data points. One of the columns, "species_name", contains various species names. How can I group this data by specific species names (and therefore group the data in the other columns of the dataset, size for example, by species)?
2 Comments
Adam Danz
on 7 Feb 2021
Are you just trying to index the table?
load fisheriris
T = table(categorical(species), meas(:,1),meas(:,2),meas(:,3),meas(:,4));
T.Properties.VariableNames{1} = 'Species'
T(T.Species=='virginica',:)
Answers (2)
dpb
on 6 Feb 2021
A sample dataset always helps, but it would probably be good to convert the species names to a categorical variable first (although that is not mandatory).
Then use grouping variables; see
doc findgroups
doc splitapply
if you are keeping the data in an array, or look at
doc rowfun
for a table or timetable.
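As a rough sketch of the findgroups/splitapply route for data held in plain arrays: the values below are made up, and the variable name sz is only a stand-in for the real size column (it also avoids shadowing the built-in size function).

```matlab
% Hypothetical stand-in data; the real column names and values are assumptions.
species_name = categorical({'bat';'crab';'bat';'star';'crab';'bat'});
sz = [1; 2; 3; 4; 6; 5];

% findgroups gives every row an integer group number and also returns
% the label that belongs to each group number.
[G, names] = findgroups(species_name);

% splitapply calls the function once per group and stacks the results.
groupMean = splitapply(@mean, sz, G);

% Collect labels and results side by side for display.
result = table(names, groupMean, 'VariableNames', {'Species','MeanSize'})
```

The same G can be reused with splitapply for any other column or function, so the original data never has to be split into per-species arrays.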
2 Comments
dpb
on 7 Feb 2021
Well, w/o something to work with, it's harder to guess...attach the table or .mat file with the data, or a short text listing of enough to illustrate.
Then, give us a precise definition of the problem to be solved.
Also, show us what you have tried and where you had a problem.
As I've pointed out in several related questions recently, you rarely need to actually split the data into separate arrays; instead of duplicating data you already have, use grouping variables and process as wanted.
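To illustrate processing in place rather than splitting, here is a minimal sketch using groupsummary (available in R2018a and later). The table contents are invented, and the variable names species_name and sz are assumptions standing in for the real ones.

```matlab
% Hypothetical table; the real data and variable names are assumptions.
species_name = categorical({'bat';'crab';'bat';'star';'crab';'bat'});
sz = [1; 2; 3; 4; 6; 5];
T = table(species_name, sz);

% One call computes several statistics per species without creating
% any per-species copies of the data.
S = groupsummary(T, 'species_name', {'mean','max'}, 'sz')
```

The result has one row per species, with a GroupCount column plus mean_sz and max_sz.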
dpb
on 7 Feb 2021
Illustration with faked data...
tmp=categorical({'star','bat','crab'}); % the categorical variable categories
t=table(tmp(randi(3,[20,1])).',randn(20,1),'VariableNames',{'Species','Size'}); % make up some data
>> head(t) % show what first little bit looks like...
ans =
  8×2 table
    Species      Size
    _______    ________
    bat        -0.65863
    crab        -1.2834
    crab        0.23872
    bat          1.5475
    star         0.1869
    star        -1.8809
    crab        0.40569
    bat         0.64618
>> summary(t) % summary statistics on the table
Variables:
    Species: 20×1 categorical
        Values:
            bat       6
            crab      9
            star      5
    Size: 20×1 double
        Values:
            Min       -1.8809
            Median    0.21281
            Max        1.5967
>> rowfun(@mean,t,'GroupingVariables','Species', ...
'InputVariables','Size','OutputVariableNames','GroupMean') % group means
ans =
  3×3 table
    Species    GroupCount    GroupMean
    _______    __________    _________
    bat            6           0.42427
    crab           9           0.10477
    star           5          -0.46693
>>
You can apply whatever function you want in the same way...
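For instance, swapping in a different function only changes the handle passed to rowfun. This is a small sketch with fixed, made-up data (so the result is reproducible) that computes the range of Size per species; the output name GroupRange is just a label chosen for illustration.

```matlab
% Hypothetical fixed data in the same layout as the table above.
Species = categorical({'bat';'crab';'bat';'star';'crab';'bat'});
Size = [1; 2; 3; 4; 6; 5];
t = table(Species, Size);

% An anonymous function works as well as a named one; here it returns
% the spread (max minus min) of Size within each group.
r = rowfun(@(x) max(x) - min(x), t, 'GroupingVariables', 'Species', ...
    'InputVariables', 'Size', 'OutputVariableNames', 'GroupRange')
```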