How to traverse table row based on sets
Show older comments
Let's say that I have a table in an excel spreadsheet that looks like this (based off of real GPS data, but for the sake of this question, all of these are just random numbers ):
|Set number| PRN | X-coordinate | y-coordinate | z-coordinate ...
| 1 | 17 | 24715 | -1470517 | 223695 |
| 1 | 17 | 24715 | -1470517 | 223695 |
...
| 2 | 17 | 24715 | -1470517 | 223695 |
| 2 | 17 | 24715 | -1470517 | 223695 |
So I know that I can read in the data using the build in MATLAB command:
gpsdata = readtable('spreadsheet.xlsx')
I also know I can loop through the table by row using the command:
for row = 1:height(gpsdata)
However, more specifically, I want to loop through all the table based off the set numbers in each row. One snag is that the number of set numbers varies. Sometimes there will be 8 rows of set 1, other times there will be 9 rows of set 2, etc.
My specific aim is to do specific operations on each of the sets. Looking at the table, normally there is a different PRN for each satellite. That means each measurement is a different satellite. The set is just a way to acknowledge that it is measuring the same thing. For example, the set 1s will have about 8 different satellites all measuring whatever is at set 1. However, each of the satellites will have a different measurement. My goal is to take all the measurements and use iterative least squares to estimate the exact measurement. However, that involves getting measurements from each of the different satellites within each row of the set.
Is there a way that I could loop through based off each of the set numbers to make sure that the ones that I am using are the same for each set? That is, what is the best way to operate on all the rows with set data 1s, then operate on all the data with set 2s, etc.?
While I could do each set manually, I feel as though there is a better, more programmatic way to approach this.
Thanks for your time and help in advanced! I really appreciate it!
15 Comments
billyjthorton
on 7 Oct 2018
In general, you can group sets of data based on some grouping variable and perform some operation on each group. The function grpstats is specifically for calculating statistics of groups from a table (e.g. mean, std etc etc..), it's quite flexible.
You can read about those function sin the documentation. If you descibe what your specific aim is here, then you could get some concrete advice.
billyjthorton
on 7 Oct 2018
Edited: billyjthorton
on 7 Oct 2018
Walter Roberson
on 7 Oct 2018
Can you attach a sample dataset?
billyjthorton
on 7 Oct 2018
Edited: billyjthorton
on 7 Oct 2018
jonas
on 7 Oct 2018
It seems like you could use either grpstats or findgroups with a custom function for your least square estimate. Actually they are very much alike.
You can try working with arrays instead, e.g.:
grpstats(gpsdata.var,gpsdata.set,'mean')
will output the set-mean of the specific variable. The statistic (third argument) can be a custom function.
billyjthorton
on 7 Oct 2018
grpstats([T.X_coordinate,T.Y_coordinate],T.SetNumber,'mean')
Will give you the mean coordinates of the sets. The columns included in the first argument must be numeric or logical. The function also accept table inputs, which is sometimes cleaner. If you input a table, then the output is also a table.
grpstats(T(:,1:4),'SetNumber','mean')
I suspect you want to use a custom function. Is the equation you'd want to employ different or consistent for each set?
billyjthorton
on 8 Oct 2018
Edited: billyjthorton
on 8 Oct 2018
SetNumber is the name of the variable that you want to use as grouping variable. The syntax is as follows>
grpstats(x,groupingvariable,stats)
how you specify the groupingvariable depends upon the class of x. If x is a table, then groupingvaraible is a variable name. When importing with readtable, "SetNumber" was assigned automatically as variablename to my 1st column, so that's where that came from. You can get a list of the variable names like this (T is your table):
T.Properties.VariableNames
If x is instead a matrix, then groupingvariables is also a matrix (or vector), with the same amount of rows as the matrix.
Either way, x cannot include non-numerical columns, and will return an error if your table includes a column of strings. You can take a subset of your table by using braces, e.g. T(:,1:5), will pass the first 5 columns. This subset must include the groupingvariable and should not include any column that is non-numeric, because how do you calculate the mean of a string?
Going to get some sleep, will check in tomorrow morning. Cheers.
billyjthorton
on 8 Oct 2018
Edited: billyjthorton
on 8 Oct 2018
grpstats(gpsdata(1:110,1:17), gpsdata.Var1)
"Unrecognized variable name 'Set Number'"
There is no string called 'Set Number' in that line of code. Also, Set Number (with a blank) is not a valid variable name, so I'm not surprised it did not work anyway. Check your variable names in MATLAB, as readtable parse them when reading from excel.
"As you can see in the attached test data, that string is present. However, even if I shift it and do:"
grpstats(gpsdata(2:110,1:17), gpsdata.Var1)
The above is the exact same as the previous line. I don't understand what you tried.
"Even if I take only the numeric data, copy it into another spreadsheet and try to process it with something like:"
grpstats(gpsdata(2:108),1:17), gpsdata.x1)
Something wrong with this line of code. It makes no sense, the braces are placed incorrectly.
Regarding the find function, there are indeed many ways to group and work with your data. The find function (or logical indexing) could be useful would you choose to move forward with a for loop. I suggested the above approaches as they are quite effecient and elegant ways to deal with grouped data, without using a loop. If you have experience with for loops, then perhaps that is the way to go. I do not understand why you'd want to find strings, as your grouping variable is a double.
Unless you give a more detailed account of how you want to process the groups, then I do not know what more advice to give you. As of right now, we are stuck on issues related to syntax and not the actual problem in question.
billyjthorton
on 8 Oct 2018
Edited: billyjthorton
on 8 Oct 2018
jonas
on 8 Oct 2018
I'll write a more elaborate response in a bit. For now, just two comments. First:
grpstats(gpsdata(2:108,1:17), gpsdata.x1)
Wrong syntax. If you insert a table, then the grouping variable arg should be a string describing the variable name. If you want the column gpsdata.x1 as your grouping variable, then the correct syntax is:
grpstats(gpsdata(:,1:17), 'x1')
This should work as long as each of the 17 first columns contain numerical data only, and one variable is named x1.
Second, I assumed you had imported the table correctly but realized now that this is not the case. Follow Guillaumes advice. If it still does not work, then use the 'range' argument of readtable to skip the first line of the sheet. As for now, all your variables are imported as strings, as the first row in your table is a string. Note that it doesn't matter that you write:
gpsdata(2:108,1:17)
each column is still interpreted as strings.
Answers (2)
Guillaume
on 8 Oct 2018
It's unclear what problem you are actually facing with Jonas' examples. As Jonas says, you can easily use any of the grouping functions that work with tables. Not having the stats toolbox, I use varfun and rowfun. Here is a rowfun example to works with your attached excel file. I use it to calculate the mean 3D distance (from origin) of the satellites in each group. Adapt as required:
%function to use with rowfun. Defined as anonymous function here. Could be an m file instead
%takes 3 column vector inputs, x,y and z coordinates. Return a scalar
func = @(x, y, z) mean(sqrt(x.^2 + y.^2 + z.^2));
%import file
opts = detectImportOptions('GPS_data.xlsx'); %works better at figuring out that the header is on row 2
satdata = readtable('GPS_data.xlsx', opts);
%apply function to each set of satellites
rowfun(func, satdata, 'GroupingVariables', 'SetNumber', 'InputVariables', {'X_coordinate', 'Y_coordinate', 'Z_coordinate'}, 'OutputVariableNames', 'distance')
8 Comments
billyjthorton
on 8 Oct 2018
Guillaume
on 8 Oct 2018
I cannot help with grpstats. As I said, I don't have the required toolbox and have never used the function.
"what the difference is between GroupingVariables and SetNumber" . 'GoupingVariables' is the name of the parameter as documented, 'SetNumber' is the actual value of the parameter and is the name of the variable you want to use for grouping.
Would you be able to do a practical example
The code I've given with my answer works perfectly well with the example file you've provided. How more practical do you want? have you tried it?
billyjthorton
on 8 Oct 2018
Guillaume
on 8 Oct 2018
You need to pass a function handle to rowfun, so @func, not func (in my example, func was already a function handle to an anonymous function).
You're still going to get an error since your function defines an output that gets nothing assigned to it and since disp doesn't take 3 arguments.
I'm not sure what you're trying to do with your function. It needs to return something. If you want to know what x, y, z are going to contain, it's simple. The function will be called once for each group (defined by unique values in the GroupingVariables SetNumber). Each time x will be the column vector of X_coordinates matching the set number, y the column vector of Y_coordinates, etc.
At this point, it would be simpler if you told us exactly what operation you want to perform on your table.
billyjthorton
on 8 Oct 2018
Just to clarify.
SV1-6 are x,y,z-coordinates from 6 different satellites? If that's so, then you will have to learn to not use indexed variable names (which is never a good idea), because you are about to have a variable number of coordinate triplets.
Other than that it doesn't appear to be too much work. You should clean up the script, make it into a function, and then apply it in your grouping operation. Perhaps you can edit the script and indicate the values that will change when you vary the number of input triplets.
billyjthorton
on 9 Oct 2018
Edited: billyjthorton
on 9 Oct 2018
Guillaume
on 9 Oct 2018
I'm sorry but Ew! One rule that you need to put into practice immediately: numbered variables are always the wrong approach. sv1 should be sv(1), sv{1} or sv(1, :) or something similar. And every time you're repeating more or less the same lines of code, you're doing it wrong. Computers are very good at repeating things, let them do the work.
Your script, written in proper matlab:
% Given measurements
x_true = [1132049, -4903445, 3905453, 85000]; % meters
x_init = [0, 0, 0, 0];
% SV locations
sv = [15764733, -1592675, 21244655
6057534, -17186958, 19396689
4436748, -25771174, 1546041
-9701586, -19687467, 15359118
23617496, -11899369, 1492340
14540070, -12201965, 18352632];
% x_old values
num_it = 5;
x_old = zeros(num_it, 4);
% sigma
sd=5;
% Generate Measurements
rho = sqrt(sum((sv - x_true(1:3)) .^ 2, 2)) + x_true(4) + sd*randn(1);
% Estimation
for i = 1:num_it
rho_hat = sqrt(sum((sv - x_init(1:3)) .^ 2, 2)) + x_init(4);
% partial derivatives
h = [-(sv - x_init(1:3)) ./ (rho_hat - x_init(4)), ones(size(sv, 1), 1)];
% x_init - x_hat
dy = rho - rho_hat;
%x_hat = x_init + inv(H.'*inv(R)*H)*H.'*inv(R)*(y_1 - h)
dx = inv(h'*inv(sd^2)*h)*h'*inv(sd^2)*dy;
x_init = x_init + dx.';
x_old(i, :) = x_init;
end
Also note the warnings in the editor. I'm fairly certain you shouldn't be using inv but that's outside of my domain of expertise.
What I'm missing is what that script has to do with the table. What are the inputs and outputs of that scripts and how do they match the table?
Method 1 - grpstats
For the sake of simplicity, let's use only the first 5 columns in this example.
%%Import data
opts = detectImportOptions('GPS_data.xlsx');
T = readtable('GPS_data.xlsx', opts);
T = T(:,1:5);
T =
109×5 table
SetNumber PRN X_coordinate Y_coordinate Z_coordinate
_________ ___ ____________ ____________ ____________
1 17 2.4716e+06 -1.4705e+07 2.237e+07
1 6 -5.2026e+06 -2.5099e+07 6.9419e+06
...
Continuing...
%%Split and apply custom function
grpstats(T,'SetNumber',@func);
function out=func(x)
out=mean(x);
end
You can write your own function, just remember that the input, x, is always a column vector containing one group and a single variable. This is of course limiting, because you cannot perform complex operations on multiple variables in your function, or at least I do not know how. For this, it is better to use another method.
Method 2 - findgroups/splitapply
Read the data just like before, but transform the table into an array.
A=table2array(T);
Now, split the dataset in groups and perform some operation with all 4 variables (excluding Set Number) at once.
%%First column is grouping set
G = findgroups(A(:,1));
out1 = splitapply(@func,A(:,2:end),G);
function out1=func(vars)
vars
%%perform some calculations here
end
You can see that I have written out the variables passed to the custom function. What is nice here is that you can process all four variables in the same function. You can pass multiple outputs back to the main script if you so desire, however each output should be scalar.
vars =
1.0e+07 *
0.0000 0.2501 -1.4700 2.2370
0.0000 -0.5196 -2.5091 0.6975
...
I've never used rowfun personally but I suppose it's very similar to the second method.
8 Comments
I've never used rowfun personally but I suppose it's very similar to the second method
Yes, it's just simpler as I've shown in my answer! ?
With your particular example, you'd have to use varfun instead:
varfun(@func, T, 'GroupingVariables', 'SetNumber')
It's actually much closer to your first method.
billyjthorton
on 8 Oct 2018
Well, I suppose you want to write an actual function before running the script. The error message says that the function output (called out1) is not assigned any value, which is true because you did not specify any equation. There is even a comment in the function that says
%%perform some calculations here
Write something here, like
function out1=func(vars)
xcoord = vars(:,2)
ycoord = vars(:,3)
zcoord = vars(:,4)
out1 = xcoord+ycoord+zcoord;
end
Whatever the equation, it will be performed groupwise.
jonas
on 8 Oct 2018
@Guillaume: Awesome. I will probably have to switch over to rowfun and varfun instead. If nothing else, it's one line of code instead of two (findgroups/splitapply) :)
billyjthorton
on 8 Oct 2018
I know this is a lot to take in. If you're just starting to learn matlab, this may be a few levels above what you should be doing (such as learning about different classes, basic operations and loops). I hope you don't feel discouraged.
As I wrote above:
" You can pass multiple outputs back to the main script if you so desire, however each output should be scalar."
You cannot pass a matrix as output. Each output has to be scalar (a single value). You can pass multiple outputs though, in which case you just add
function [out1,out2,out3...]=func(vars)
%%some calculations
out1=1;
out2=2;
out3=3;
end
will pass three outputs back to the main script, all of which have to be scalar. Maybe you could solve this by passing cell arrays as output, but that would certainly only complicate things for you.
To summarize, do all your calculations for each group inside of the function, then pass the results to the main script for futher analysis.
billyjthorton
on 9 Oct 2018
Categories
Find more on Data Import from MATLAB in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!