Combine datasets of unequal lengths by matching dates
7 views (last 30 days)
Show older comments
My question has 3 sub-questions: I have 10 matrices (10 different variables: V1 V2, V3…, V10) that each one has 2 or 3 columns and a few hundreds of rows (different size each one). The first column at every matrix is the date (serial date number), the second (when there) is the depth and the third is a measured value of a variable (temperature, chlorophyll, etc).
1. I want to create a new matrix that will have all variables combined based on first column (date), and the rest of the columns will be the values of the variables that match these dates. In fewer words, I need to concatenate all matrices by matching up the dates from their first column.
2. I need to do the same as in (1) but not using all the dates recorded. Instead use just the dates of a certain variable (for example 'V1') as the matching point and line up the variables according to the dates of ‘V1’.
3. Lastly, I need to combine the data from the variables that have 3 columns when the elements of the first two elements in each row are the same between the different variables. So, the desired matrix here will have the date and depth as the first 2 columns and then the rest of the columns will be the measurements of the different variables at that day and that depth. So I have to concatenate the matrices by matching the elements of the first 2 columns.
I have tried a few things for each case but nothing seemed to work as desired.
I would appreciate your input to any or all of my sub-questions.
Warm regards, Niki
0 Comments
Accepted Answer
Kirby Fears
on 17 Dec 2015
Edited: Kirby Fears
on 17 Dec 2015
niki,
You can do all of this quite easily if you first put your data into tables. Check out array2table documentation to convert your data into tables. The examples are pretty good.
Then you can use the various join operations (join, innerjoin, outerjoin) to obtain the merged tables you're looking for. You can find links to these functions from the tables link above. Pay close attention to the Name,Value pairs you can input to these functions. They will determine how joins are performed.
Hope this helps.
3 Comments
Kirby Fears
on 21 Dec 2015
Edited: Kirby Fears
on 21 Dec 2015
Tables have three join methods: join, innerjoin, and outerjoin join. Everything niki is trying to do can be done with one of these methods (along with method settings).
In your example, innerjoin would keep only serial date numbers in both tables (excluding 0 and 2) and join accordingly. On the other hand, outerjoin would include both 0 and 2.
Chris Turnes
on 25 Dec 2015
Ah! Good point. I forgot that outerjoin and innerjoin existed. That is a much cleaner solution.
More Answers (2)
Chris Turnes
on 17 Dec 2015
There's a little bit of ambiguity here, but this shouldn't be too hard. What happens when there is no corresponding date from V1 in V2? Do the variables from V2 just become NaN?
Assuming that's the case, I think you should look into the unique and ismember functions. Here's a small example of how to use them to do the tasks you want to accomplish:
rng('default')
% Some fake data -- first column is the date index
V1 = [(1:100)', randi(10, 100, 2)];
V2 = [(1:50)', randi(10, 50, 2)];
V3 = [(1:73)', randi(10, 73, 1)];
% Task 1: Join the data by all dates
% Find the set of unique serial dates
uniqueDates = unique([V1(:,1); V2(:,1); V3(:,1)]);
Vjoined = nan(length(uniqueDates), 6);
Vjoined(:, 1) = uniqueDates;
% Repeat this process for each array
[~, idx] = ismember(V1(:,1), uniqueDates);
Vjoined(idx, 2:3) = V1(idx, 2:3);
[~, idx] = ismember(V2(:,1), uniqueDates);
Vjoined(idx, 4:5) = V2(idx, 2:3);
[~, idx] = ismember(V3(:,1), uniqueDates);
Vjoined(idx, 6) = V3(idx, 2);
This gives you your method for 1. For 2., it's pretty much the same, except you don't have to compute all the unique values, you just use the first column of whichever variable you want to match:
% Task 2: Join by the date in V1
Vjoined = V1;
% Repeat this process for V2, V3
Vjoined = nan(size(V1, 1), 6);
Vjoined(:, 1:3) = V1;
[~, idx] = ismember(V2(:,1), Vjoined(:, 1));
Vjoined(idx, 4:5) = V2(idx, 2:3);
[~, idx] = ismember(V3(:,1), Vjoined(:, 1));
Vjoined(idx, 6) = V3(idx, 2);
For the last task, you can the 'rows' option of intersect to figure out the sets of first two columns shared by data with 3 columns. Then, you can use the 'rows' option of ismember (together with some careful use of the output) to join them:
% Task 3: Join V1 and V2 if the first two variables are the same
C = intersect(V1(:, 1:2), V2(:, 1:2), 'rows');
Vjoined = zeros(size(C, 1), 4);
Vjoined(:, 1:2) = C;
[~, idx] = ismember(V1(:, 1:2), C, 'rows');
Vjoined(nonzeros(idx), 3) = V1(logical(idx), 3);
[~, idx] = ismember(V2(:, 1:2), C, 'rows');
Vjoined(nonzeros(idx), 4) = V2(logical(idx), 3);
0 Comments
See Also
Categories
Find more on Time Series Objects in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!