concatenating and comparing two datsets

2 views (last 30 days)
Hi,
I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.
Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.
  1. for the first dataset I want to concatenate the first 8 columns.
  2. For the second data sets I want to do the same as step.1
  3. If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.
I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.
Any help greatly appreciated.
Thanks.
SSR

Accepted Answer

Jon
Jon on 16 Aug 2019
Edited: Jon on 16 Aug 2019
Assuming that the 9th column of both data sets is the same, I think you can do this just with an innerjoin
tbl1 = readtable('data1.xlsx')
tbl2 = readtable('data2.xlsx')
% join them to make third table which adds additional column to
% second table where character columns match
tbl3 = innerjoin(tbl1,tbl2)
There are probably some use cases that Guillaume is covering in his answer that may be important, but maybe this simple approach works for what you are doing.
  21 Comments
Jon
Jon on 20 Aug 2019
Sorry Guilaaume. I read your response too quickly, and just saw the assert, and didn't notice that the later code handles multiple matching rows.
Guillaumes approach is much cleaner than my suggestion of putting it in a loop. Definitely do it that way!
I think you should be all set now.
Sheetal Shirsat
Sheetal Shirsat on 21 Aug 2019
This is Perfect !.
Thank you so very much Guillaume and Jon for your time and help.. ! Much appreciated !.
Best regards,
Shital

Sign in to comment.

More Answers (1)

Guillaume
Guillaume on 16 Aug 2019
Edited: Guillaume on 16 Aug 2019
Something like this should work:
dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx'); %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx'); %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')
edit: rereading the description, it sounds like a right outerloin, not a left one
  3 Comments
Guillaume
Guillaume on 16 Aug 2019
Edited: Guillaume on 16 Aug 2019
Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.
As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:
%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);
The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:
%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))]; %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end); %and copy relevant elements
Guillaume
Guillaume on 19 Aug 2019
Edited: Guillaume on 19 Aug 2019
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets
"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?
We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.

Sign in to comment.

Categories

Find more on Matrices and Arrays in Help Center and File Exchange

Tags

Products


Release

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!