Best way to seperate a very large array of numbers into different variables based on a value tat is given in the first column?

1 view (last 30 days)
I have a variable that contains experimental data, it is 96x300001. (however the size will change every time I run my script). The first column gives a value which represents a certain group (e.g group 1, group 2 group 3 etc. The variable kinda looks like this =
1 0.021 0.0433 0.656,
2 0.00321 0.335 0.605,
3 0.1 0.035 0.625. etc etc
I want to tell MATLAB "numbers next to the number 1 go into variable "group1" and numbers next to the number 2 go into a variable called "group2" and so on, but I have no idea how to do this!
  1 Comment
Marissa Holden
Marissa Holden on 17 Oct 2017
Edited: Marissa Holden on 17 Oct 2017
For anyone looking to answer this question in the future I ended up using the "find" function. "data" is the variable containing my data -
idx_group3 = find(data(:,1)==3); -- this will output a variable with the number of each row that starts with the number 3.
Then, I tell matlab to put the data from those particular rows (from 2 to the end of the array) into a different variable.
data_group3 = data(idx_group3,2:end);

Sign in to comment.

Accepted Answer

Cedric
Cedric on 17 Oct 2017
Here is an example: we build a fake data set:
>> data = [sort(randi(3,10,1)), rand(10,5)]
data =
1.0000 0.9727 0.6504 0.8675 0.9329 0.2748
1.0000 0.9217 0.1238 0.1572 0.7492 0.7884
1.0000 0.7864 0.9880 0.6371 0.8526 0.8418
1.0000 0.2187 0.6194 0.9931 0.9952 0.9586
2.0000 0.1218 0.9761 0.8390 0.6048 0.2233
2.0000 0.5117 0.0724 0.9759 0.2118 0.7046
3.0000 0.6800 0.3582 0.4349 0.8793 0.8167
3.0000 0.8622 0.4866 0.7729 0.8863 0.8443
3.0000 0.6063 0.8812 0.5100 0.6926 0.9223
3.0000 0.7486 0.0681 0.4883 0.6132 0.4820
and we group it this way:
>> groups = splitapply( @(x){x(:,2:end)}, data, data(:,1) )
groups =
3×1 cell array
{4×5 double}
{2×5 double}
{4×5 double}
Now groups is a cell array that we can index using the group ID:
>> groups{1}
ans =
0.9727 0.6504 0.8675 0.9329 0.2748
0.9217 0.1238 0.1572 0.7492 0.7884
0.7864 0.9880 0.6371 0.8526 0.8418
0.2187 0.6194 0.9931 0.9952 0.9586
>> groups{2}
ans =
0.1218 0.9761 0.8390 0.6048 0.2233
0.5117 0.0724 0.9759 0.2118 0.7046
>> groups{3}
ans =
0.6800 0.3582 0.4349 0.8793 0.8167
0.8622 0.4866 0.7729 0.8863 0.8443
0.6063 0.8812 0.5100 0.6926 0.9223
0.7486 0.0681 0.4883 0.6132 0.4820
  4 Comments
Cedric
Cedric on 17 Oct 2017
Edited: Cedric on 17 Oct 2017
Yes, try the following instead:
groups = mat2cell( data(:,2:end), accumarray( data(:,1), 1 )) ;
If you evaluate
accumarray( data(:,1), 1 )
you will see that it computes the length of each group, and then reading the doc of MAT2CELL, you'll understand what we do here.

Sign in to comment.

More Answers (1)

Image Analyst
Image Analyst on 17 Oct 2017
You may not need to do that. Whether you do or not depends on what you're going to do next. For example if you just wanted the means or other simple stats, you could use grpstats() without creating variables group1, group2, group3, .... groupn. Or if you just wanted to loop over all groups, you could just extract rows one group at a time into a single variable "thisGroup" and then process that within the loop. So, assuming you know what groups each row belongs to (and you do already know this), what do you plan on doing next?
  6 Comments
Image Analyst
Image Analyst on 17 Oct 2017
Marissa: Do you have the Statistics and Machine Learning Toolbox? This is exactly what it does, with no need to create a bunch of separately named variables!

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!