How to search for channel name and numerical data in resulting struct after importing multiple data files?

Question

Scooby921 on 4 Apr 2019


https://nl.mathworks.com/matlabcentral/answers/454501-how-to-search-for-channel-name-and-numerical-data-in-resulting-struct-after-importing-multiple-data

Edited: Scooby921 on 13 May 2019

Questions:

1.) In avoiding using eval to dynamically name variables, how do I search a resulting struct of data and labels to link a channel name to a data column and then analyze multiple channels all having the same name?

2.) How do I properly write an if or switch case statement to deal with importing a single or multiple data files when the resulting workspace object is either a character array for a single file or a cell array for multiple files?

Background:

Currently using Matlab R2014b. I'm trying to write a script to select which data files, import / load the data, and place the data into a matrix or array or struct or whatever is most useful and appropriate for signal analysis, processing, and plotting afterward.

My data files are an export from a data acquisition tool (ATI VISION). The generated .mat file creates one cell array and one matrix. The cell array contains the text names of the data channels. This is n x 1 in size, where n is the number of channels exported. The matrix contains the numerical data and is m x n in size, where n is again the number of exported data channels and m is the number of samples.

The cell array of names has nearly zero consistency in the organization of the names (not alphabetical, not any order representing a channel number in the recording tool). The only consistency is that the nth row in the cell array shows no name "[]" but is always the "time" channel, and this always corresponds to column 1 in the data matrix. I've attached two mat files for reference. You can see in one file the first three rows are 'AngleSlipPoint2', 'AngleSlipPoint1', and 'AngleSlip', but in the other file the first three rows are 'PosLon', 'FRSpeed', and 'AccActPos'. I know my script can't be as easy as always accessing column 2 for x data and column 6 for y data. I need to search the cell array to link a name to a column in each individual data file.

In the two weeks that I've now been teaching myself how to write scripts and analyze data with Matlab I've apparently learned to do things the ill-advised way. The first obstacle I tried to address was aligning the data name in the cell array with the appropriate column in the data matrix. Because the cell array appears to be an array of text and not characters or strings I could find no other way to pull out the names, link them to data, and generate a variable than to use the frequently unrecommended eval function.

%% Extract Data
uiload
NumVars = numel(Data_Labels); % Establish number of variables to be created.
Time = Data(:,1); % Time values are always the first column of the Data matrix, so it's easy to define and create.
for k = 1:NumVars-1 % Time already created and exists as final channel, so we only need to generate variables for the remaining n-1 variables.
    eval([Data_Labels{k},'=[Data(:,k+1)]']); % Extract variable names and populate with data in workspace.
end

This worked for a single data file as it gets me a workspace full of variables and correctly populates them with the numerical data. I can integrate, derive, filter, and plot whatever I want. It fails miserably as soon as I attempt to load a second file as the next import will overwrite everything created from the first file. Hard to compare longitudinal acceleration in 2wd and 4wd when the newest import overwrites the old, and it's understandably stupid to write the script to append a 1/2/3 to the end of the name so I can have multiple instances in the workspace.

This is what I've come up with for importing multiple files. Still using the two attached files as my test files for writing the script.

%% Select files for 2WD analysis
[selected2wdFiles,pathName2wd] = uigetfile('*.mat','Select 2WD data files for analysis','MultiSelect','on');
if isequal(selected2wdFiles, 0)
   disp('No Files Selected')
   return;
end
for m = 1:length(selected2wdFiles)
  data2wd(m) = load(fullfile(pathName2wd, selected2wdFiles{m}));
end
%% Select files for 4WD analysis
[selected4wdFiles,pathName4wd] = uigetfile('*.mat','Select 4WD data files for analysis','MultiSelect','on');
if isequal(selected4wdFiles, 0)
   disp('No Files Selected')
   return;
end
for n = 1:length(selected4wdFiles)
  data4wd(n) = load(fullfile(pathName4wd, selected4wdFiles{n}));
end

This generates two structs, data2wd and data4wd, which contain the loaded cell arrays and data matrices. Unfortunately this script only works if I am selecting multiple files. If I only select one file it fails because the resulting item is a character array instead of a cell array. I haven't tried to script around that, but I suppose a switch case or if statement should work. Question #2 above...any suggestions?

The next step / steps is where I am lost. I believe I have avoided dynamically named variables, but I don't know how to go about extracting my longitudinal acceleration data from each data set. The specific channel name in the cell array of text is going to be 'AccelForward'. I know I need to search the cell array in row 1, column 2 of the struct to find the row number containing that name. This will tell me which column to access in the matrix stored in row 1, column 1 of the struct. Because it is a cell array of text the strfind command doesn't work. They aren't strings. Similarly they aren't characters either, so the related char commands don't work. Without using eval to extract things, how do I go about searching an array of text?

Once I can find the name, identify the data column, and then locate the actual data, how do I manipulate it without falling back on dynamically named workspace variables? I feel like I'm going to end up pulling these columns of data back into the workspace as AccelForward_1, AccelForward_2, etc., and then it gets more complicated and dynamic because I will have 2wd and 4wd data being compared and plotted against each other. What's the correct way to identify the data, manipulate the data, store the new data, and then access it later for plotting? Do I just keep generating more structs or arrays or matrices to stuff the data into and avoid a ridiculous workspace full of variables?

Now that I'm done writing a novel I suppose I simply don't know what I don't know and it makes it difficult to search and find answers. If anyone can put some labels on the forks in the road and send me in a useful direction I'd appreciate it. Thank you.

1 Comment

Stephen23 on 5 Apr 2019

"The only consistency is that the nth row in the cell array shows no name "[]" but is always the "time" channel, and this always corresponds to column 1 in the data matrix."

Ouch!


Answer 1

Stephen23 on 5 Apr 2019


https://nl.mathworks.com/matlabcentral/answers/454501-how-to-search-for-channel-name-and-numerical-data-in-resulting-struct-after-importing-multiple-data#answer_369192

Edited: Stephen23 on 5 Apr 2019


You are right to avoid dynamically accessing variable names (e.g. using eval, assignin, evalin, and load without an output variable). Read this to know some of the reasons why:

https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval

Here is one simple solution for your task, using a non-scalar structure and dynamic fieldnames:

https://www.mathworks.com/help/matlab/matlab_prog/access-multiple-elements-of-a-nonscalar-struct-array.html

https://www.mathworks.com/help/matlab/matlab_prog/generate-field-names-from-variables.html

Using structure fields makes the order of the columns in the numeric matrix totally irrelevant.

[F,P] = uigetfile('*.mat','2WD','MultiSelect','on');
if isnumeric(F)
    error('User quit')
elseif ischar(F)
    F = {F};
end
S = struct('filename',F);
for ii = 1:numel(F)
    T = load(fullfile(P,F{ii}));
    L = [{'Time'};T.Data_Labels(1:end-1)]; % fix "Time" column mismatch
    for jj = 1:numel(L)
        S(ii).(L{jj}) = T.Data(:,jj);
    end
end
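One caveat with dynamic fieldnames is that every label must be a valid MATLAB identifier. The attached files appear to satisfy that, but if an export ever contains labels with spaces or a leading digit, a minimal sketch like the following could sanitize them first. The labels below are hypothetical, and `matlab.lang.makeValidName` and `matlab.lang.makeUniqueStrings` need R2014a or later.

```matlab
% Hypothetical labels, two of which are not valid field names as exported.
rawLabels = {'AccelForward'; 'Wheel Speed FL'; '2ndGear'};

% Convert each label into a valid identifier, then make sure none collide.
safeLabels = matlab.lang.makeValidName(rawLabels);
safeLabels = matlab.lang.makeUniqueStrings(safeLabels);

% The sanitized labels can now be used as dynamic fieldnames.
S = struct();
for jj = 1:numel(safeLabels)
    S.(safeLabels{jj}) = jj; % placeholder data, one value per channel
end
```

Labels that are already valid, such as 'AccelForward', pass through unchanged, so this step should be harmless for well-formed files.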

The imported data is very easy to access in the structure: you only need to refer to the indices (corresponding to each file) and the fieldnames (corresponding to each data column), e.g.:

>> S(1).filename
ans =
MKZ_2WD_LevelSnowAccel.mat
>> S(1).AccelForward([1:4,end-4:end])
ans =
        -0.18
        -0.18
        -0.18
        -0.18
        ... lots of lines
        -1.92
        -1.72
         -1.2
        -2.24
         -4.1
>> S(1).Time([1:4,end-4:end])
ans =
      -5.1505
      -5.1405
      -5.1305
      -5.1205
      ... lots of lines
        23.01
        23.02
        23.03
        23.04
        23.05
>> S(2).filename
ans =
MKZ_4WD_LevelSnowAccel.mat
>> S(2).AccelForward([1:4,end-4:end])
ans =
        -0.07
        -0.07
        -0.07
        -0.07
        ... lots of lines
         0.23
         0.19
          0.3
         0.33
         0.27
>> S(2).Time([1:4,end-4:end])
ans =
      -5.3711
      -5.3611
      -5.3511
      -5.3411
      ... lots of lines
       24.069
       24.079
       24.089
       24.099
       24.109
 

You could also do something similar with tables, timetables, or by rearranging the columns of the numeric array to have the same order.
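As a rough illustration of working with the same channel across several files without creating per-file variables, the sketch below uses a small synthetic struct array in place of the imported one. The filenames and values are made up; only the field names `Time` and `AccelForward` come from the thread.

```matlab
% Synthetic stand-in for the struct array S built by the import loop (hypothetical values).
S = struct('filename', {'run_a.mat', 'run_b.mat'});
S(1).Time = [0; 0.01; 0.02];   S(1).AccelForward = [-0.18; 0.50; 1.20];
S(2).Time = [0; 0.01; 0.02];   S(2).AccelForward = [-0.07; 0.30; 0.90];

% One summary value per file: arrayfun visits each element of the struct array.
peakAccel = arrayfun(@(s) max(s.AccelForward), S);

% Overlay the same channel from every file on one set of axes.
figure
hold on
for ii = 1:numel(S)
    plot(S(ii).Time, S(ii).AccelForward)
end
hold off
legend({S.filename}, 'Interpreter', 'none') % filenames may contain underscores
xlabel('Time')
ylabel('AccelForward')
```

The index selects the file and the fieldname selects the channel, so adding more files only lengthens the struct array rather than adding variables to the workspace.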

2 Comments

Scooby921 on 5 Apr 2019

Thank you very much. I'll play around with this today!

Scooby921 on 13 May 2019

Edited: Scooby921 on 13 May 2019


As a follow-up a month later...thank you again! Worked with it a bit and learned a good deal more about working with structs. Wound up extending the script to include calling data from these initially generated structs, deriving acceleration from velocity, appending a lost data point, creating and applying filters, and loading everything back into a new struct of filtered data.

Just in case anyone looks up this question / answer and wants to see my end-result. Added notes at the end for colleagues who might use this script and may not fully understand what I've done.

%% Initialize
close all
clear
clc
%% Select and load 2wd data files into struct
[F2,P2] = uigetfile('*.mat','Select 2WD Data Files','MultiSelect','on');
if isnumeric(F2)
    error('User quit')
elseif ischar(F2)
    F2 = {F2};
end
D2 = struct('filename',F2);
for ii = 1:numel(F2)
    Tmp2 = load(fullfile(P2,F2{ii}));
    L2 = [{'Time'};Tmp2.Data_Labels(1:end-1)]; % fix "Time" column mismatch
    for jj = 1:numel(L2)
        D2(ii).(L2{jj}) = Tmp2.Data(:,jj);
    end
end
clearvars ii jj Tmp2 L2
%% Select and load 4wd data files into struct
[F4,P4] = uigetfile('*.mat','Select 4WD Data Files','MultiSelect','on');
if isnumeric(F4)
    error('User quit')
elseif ischar(F4)
    F4 = {F4};
end
D4 = struct('filename',F4);
for kk = 1:numel(F4)
    Tmp4 = load(fullfile(P4,F4{kk}));
    L4 = [{'Time'};Tmp4.Data_Labels(1:end-1)]; % fix "Time" column mismatch
    for nn = 1:numel(L4)
        D4(kk).(L4{nn}) = Tmp4.Data(:,nn);
    end
end
clearvars kk nn Tmp4 L4
%% Define inertial sensor filter
AccelFilt = designfilt('lowpassiir', 'PassbandFrequency', 5, 'StopbandFrequency', 25, 'PassbandRipple', 1, 'StopbandAttenuation', 40, 'SampleRate', 100, 'MatchExactly', 'passband');
%% Derive wheel accelerations from wheel speeds
WhlAcc2 = struct('filename',F2,'FLAcc',zeros,'FRAcc',zeros,'RLAcc',zeros,'RRAcc',zeros);
for qq = 1:numel(F2)
    WhlAcc2(qq).FLAcc = [diff(D2(qq).FLSpeed);0];
    WhlAcc2(qq).FRAcc = [diff(D2(qq).FRSpeed);0];
    WhlAcc2(qq).RLAcc = [diff(D2(qq).RLSpeed);0];
    WhlAcc2(qq).RRAcc = [diff(D2(qq).RRSpeed);0];
end
WhlAcc4 = struct('filename',F4,'FLAcc',zeros,'FRAcc',zeros,'RLAcc',zeros,'RRAcc',zeros);
for rr = 1:numel(F4)
    WhlAcc4(rr).FLAcc = [diff(D4(rr).FLSpeed);0];
    WhlAcc4(rr).FRAcc = [diff(D4(rr).FRSpeed);0];
    WhlAcc4(rr).RLAcc = [diff(D4(rr).RLSpeed);0];
    WhlAcc4(rr).RRAcc = [diff(D4(rr).RRSpeed);0];
end
clearvars qq rr
%% Filter data and load into new struct
D2f = struct('filename',F2,'AccelxF',zeros,'AccelyF',zeros,'YawRateF',zeros,'FLAccF',zeros,'FRAccF',zeros,'RLAccF',zeros,'RRAccF',zeros);
for nn = 1:numel(F2)
    D2f(nn).AccelxF = filtfilt(AccelFilt,D2(nn).AccelForward);
    D2f(nn).AccelyF = filtfilt(AccelFilt,D2(nn).AccelLateralCorr);
    D2f(nn).YawRateF = filtfilt(AccelFilt,D2(nn).AngRateZCorr);
    D2f(nn).FLAccF = filtfilt(AccelFilt,WhlAcc2(nn).FLAcc);
    D2f(nn).FRAccF = filtfilt(AccelFilt,WhlAcc2(nn).FRAcc);
    D2f(nn).RLAccF = filtfilt(AccelFilt,WhlAcc2(nn).RLAcc);
    D2f(nn).RRAccF = filtfilt(AccelFilt,WhlAcc2(nn).RRAcc);
end
D4f = struct('filename',F4,'AccelxF',zeros,'AccelyF',zeros,'YawRateF',zeros,'FLAccF',zeros,'FRAccF',zeros,'RLAccF',zeros,'RRAccF',zeros);
for pp = 1:numel(F4)
    D4f(pp).AccelxF = filtfilt(AccelFilt,D4(pp).AccelForward);
    D4f(pp).AccelyF = filtfilt(AccelFilt,D4(pp).AccelLateralCorr);
    D4f(pp).YawRateF = filtfilt(AccelFilt,D4(pp).AngRateZCorr);
    D4f(pp).FLAccF = filtfilt(AccelFilt,WhlAcc4(pp).FLAcc);
    D4f(pp).FRAccF = filtfilt(AccelFilt,WhlAcc4(pp).FRAcc);
    D4f(pp).RLAccF = filtfilt(AccelFilt,WhlAcc4(pp).RLAcc);
    D4f(pp).RRAccF = filtfilt(AccelFilt,WhlAcc4(pp).RRAcc);
end
clearvars nn pp
%% Note
% At this point all 2wd data is loaded into a struct named "D2" 
% and all 4wd data is loaded into a struct named "D4".
% Filtered 2wd data for accelerations is loaded into "D2f"
% and filtered 4wd data for accelerations is loaded into "D4f".
% All file names are loaded into the first column of those structs.
% To confirm which data file is in each row use the following syntax:
% D2(r).filename    where 'r' is the row in question
% D4(r).filename    where 'r' is the row in question
% Numerical data can be accessed by calling the struct, row, and name 
% of desired data.
% Example: Call steering wheel angle for first 2wd data file -->
% D2(1).SteeringWheelAngle
% Example: Call front right wheel speed for fifth 4wd data file -->
% D4(5).FRSpeed
% Example: Call filtered long accel for third 2wd data file -->
% D2f(3).AccelxF
% Plotting will use the same syntax for calling a variable -->
% plot(D2(2).Time,D2(2).SteeringWheelAngle)
%
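One point a reader may want to check in the wheel-acceleration step: `diff` on a speed channel returns the change per sample, not per second. If a true time derivative is wanted, a minimal sketch could divide by the sample period. The 100 Hz rate is assumed from the `'SampleRate', 100` filter setting in the script, and the speed values are hypothetical.

```matlab
% Assumed fixed sample rate of 100 Hz, i.e. a 10 ms time step.
fs = 100;
dt = 1/fs;

% Hypothetical wheel speed samples.
speed = [0; 0.5; 1.0; 1.5];

% Forward difference divided by the time step; the trailing zero
% pads the result back to the original length, as in the script above.
accFwd = [diff(speed); 0] / dt;

% Alternative: gradient uses central differences for interior points and
% returns a vector of the same length, so no padding is needed.
accCentral = gradient(speed, dt);
```

The resulting units are the speed units per second, so any conversion (for example from km/h to m/s) would still need to be applied separately.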


Answer 2

Guillaume on 5 Apr 2019


https://nl.mathworks.com/matlabcentral/answers/454501-how-to-search-for-channel-name-and-numerical-data-in-resulting-struct-after-importing-multiple-data#answer_369205


Considering that one of the variables is time, you may be better off storing your data in a timetable rather than a structure.

The principle would be the same, use the cell array of names to name the variables instead of fields.

I'm a bit confused about one thing. If the time is the first column of the matrix, why is it the last element of the cell array? Is the array of names reversed with regard to the data columns, or does Data_Labels(1:end-1) correspond to Data(:, 2:end)?

I'm assuming the time is in seconds:

filepath = 'MKZ_2WD_LevelSnowAccel.mat';  %obtained however you want, e.g. with uigetfile
filecontent = load(filepath);
signals = array2timetable(filecontent.Data(:, 2:end), 'RowTimes', seconds(filecontent.Data(:, 1)), 'VariableNames', filecontent.Data_Labels(1:end-1));

If you want to import multiple files, you can store each timetable in a cell array, or vertically concatenate them into one big timetable. For that, I'd add a column indicating which source file each row came from. The order of the variables in a table does not have to be the same when you vertically concatenate tables, so the mismatched ordering wouldn't be an issue.
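A minimal sketch of the multi-file idea, using synthetic data in place of loaded files so that it runs on its own. The filenames, labels, and values are hypothetical, the `Source` variable name is an arbitrary choice, and timetables need R2016b or later.

```matlab
% Synthetic stand-ins for two loaded files (hypothetical names and values).
% Column 1 of each matrix is time in seconds; the last label is the unnamed time entry.
files  = {'run_a.mat', 'run_b.mat'};
labels = {{'Speed'; 'Slip'; []}, {'Slip'; 'Speed'; []}};
data   = {[0 10 0.1; 0.01 11 0.2], [0 0.3 20; 0.01 0.4 21]};

tt = cell(size(files));
for k = 1:numel(files)
    names = reshape(labels{k}(1:end-1), 1, []); % drop the time entry, force a row
    tt{k} = array2timetable(data{k}(:, 2:end), ...
        'RowTimes', seconds(data{k}(:, 1)), ...
        'VariableNames', names);
    % Record which file each row came from.
    tt{k}.Source = repmat(files(k), height(tt{k}), 1);
end

% Vertical concatenation matches variables by name, not by position.
allRuns = vertcat(tt{:});
```

Keeping the per-file timetables in the cell array `tt` is the other option described above; the concatenated version is mainly convenient when rows from all files are filtered or grouped together by the source column.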

6 Comments

Scooby921 on 5 Apr 2019

The issue with the array of names is that it's random. I did mention that somewhere in my original wall of text. Understandable if you missed it...wall of text :o. It drops time as column 1 in the data matrix and the name as a blank in the last row of the cell array. I have a theory why (see below). For every named signal in the recorder it appears to process in a random order. I will send a request to that developer to see if they can at least update the export feature to do things alphabetically.

I think time is the first column in the data matrix because I have the settings configured to export all channels on the same fixed time step, so I have an equal number of data points for every signal. The tool is creating and populating the time channel / column first to define the total number of rows of data based on the time-length of the data file and the resolution I've chosen (in this case 10ms / 100Hz). This way the tool knows how many blank cells need to be filled with "last value" for any signal recorded slower than 100Hz.

The time channel having no name and ending up last in the cell array is likely because time is not a channel specified in my recorder. It's a default feature of each piece of data and linked to the x-axis of that signal, but there is no specific channel named "time" in my plotter which is getting exported. Thus after exporting all of the other actual named signals the tool accounts for there being a time channel and drops a character into a final row just to make the number of rows in the cell array match the number of columns in the data matrix.

For what it's worth I do have the ability to export all data as Matlab structs, and this gives me name.signals.values and name.time for my x and y axes for plotting. Unfortunately I end up with things having mismatched numbers of data points, or a few signals having one more or one less data point based on when the recorder started and stopped and when that CAN signal last updated. Using the signal processing toolbox to resample data imparts undesired noise from the automatic filter that function applies.

Scooby921 on 5 Apr 2019


Misunderstood your question. Yes the columns and names do match, just offset by 1 due to the time data being column 1 yet row n in the array of names.

So the name in row 1 of the cell array is column k+1 in the data matrix. That is what I was accounting for in my first script above with the eval command. The first line accounts for not needing to process the last, nameless row of the cell array. The "k+1" in the following line accounts for the shift due to time being the first column.

for k = 1:NumVars-1 % Time already created and exists as final channel, so we only need to generate variables for the remaining n-1 variables.
    eval([Data_Labels{k},'=[Data(:,k+1)]']); % Extract variable names and populate with data in workspace.
end

Obviously the end-game is to not use eval, but however else I do this with tables, structs, matrices, or arrays, I should still be able to search the cell array to get a name, and the row number containing that name +1 identifies the column in the data matrix containing the numerical values. Looking for options to search the array inside the struct or to better import the data files for ease of access without the dynamic naming.
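For the narrower question of searching the cell array directly, `strcmp` accepts a cell array of character vectors and returns a logical match per element. The sketch below uses small hypothetical stand-ins for `Data_Labels` and `Data` rather than the attached files.

```matlab
% Hypothetical stand-ins for one loaded file: the last label is the empty time entry.
Data_Labels = {'PosLon'; 'AccelForward'; 'FRSpeed'; []};
Data = [0.00 1 -0.18 10; ...
        0.01 2 -0.10 11];                    % column 1 is time

names = Data_Labels(1:end-1);                % drop the unnamed time entry
row   = find(strcmp(names, 'AccelForward')); % row holding the label
col   = row + 1;                             % shift by one because time is column 1
accel = Data(:, col);                        % the matching data column
```

If the name is missing, `row` comes back empty, so a check such as `isempty(row)` before indexing would avoid silently returning an empty column.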

Scooby921 on 5 Apr 2019

Since there seem to be concerns with releases and available features...I started using R2014b because that's what we use for Simulink modeling and are stuck on that release for the moment to maintain model / s-function compatibility with customers. With my data analysis likely being a stand-alone function that I or other team members are going to run separate from model development I shouldn't have a problem upgrading to R2018b or 19a. If that opens up more options and makes life easier I'll go ahead and do that. Wasn't an initial thought or concern simply because I already had a version of Matlab loaded and working on my computer.

Guillaume on 5 Apr 2019


the original question mentions "Currently using Matlab R2014b..."

That, I did indeed miss in the wall of text (and the fact that the Release was tagged, I should have looked at that).

"Yes the columns and names do match, just offset by 1 due to the time data being column 1 yet row n in the array of names."

Then, both answers account for that. The timetable and the structure both use the names, in whichever order they come, to name the matching columns.

Neither timetables nor structures care about the ordering of the fields/variables when you operate on them (as long as you are using the names and not numeric indices), so it does not matter if they're not in the same order from file to file.

%tables work the same way as timetables
t1 = array2table(rand(10, 3), 'VariableNames', {'Speed', 'Slip', 'Pitch'})
t1.Slip   %will return the 2nd column of the table
t2 = array2table(rand(10, 3), 'VariableNames', {'Pitch', 'Speed', 'Slip'})
t2.Slip  %will return the 3rd column of the table
