How to remove outliers in a matrix, according to two different column entries?

Hello, I'm fairly new to MATLAB coding and would appreciate your assistance.
I have a 3250x3 numeric matrix, as depicted below, and I want to identify and remove the latencies that fall outside ±0.5 from the mean for each subject. Next, I want to average the latencies in column 3 according to the trial code (column 2) for each subject (column 1) and output the result as a matrix. Finally, I want to run a repeated measures ANOVA (2x2) according to the trial code.
I primarily need assistance with the first two steps.
subject trialcode latency
8 4 340
8 4 328
8 3 218
8 4 338
8 3 213
8 4 328
8 3 254
8 4 323
8 4 340
8 3 273
9 3 580
9 4 363
9 4 371
9 3 374
9 3 383
9 3 302
9 4 406
9 3 390
9 3 380
9 3 366
9 4 468
I want to remove outliers for each subject across each trial code.
I tried the following code, which did not work:
[K, ~, G] = unique(Experiment1engS1(:, 1:2), 'rows')
mean= rmoutliers(K(:,3),'center','mean','ThresholdFactor', 2.5)
I also tried a for loop:
Subject=[999];
% trialcode (1=mask_cong, 2=mask_incong, 3=nomask_cong, 4=nomask_incong)
trialcode = [999];
% Latency
latency = [999];
% compute the means
for i = 1:160:3250
    % compute the means for this block of trials
    SUB_temp = mean(Experiment1engS1(i:i+159,1));
    trialcode_temp = mean(Experiment1engS1(i:i+159,2));
    latency_temp = rmoutlier(Experiment1engS1(i:i+159,3));
    % write the results into the matrices
    Subject=[Subject; SUB_temp];
    trialcode = [trialcode; trialcode_temp];
    latency = [latency; latency_temp];
end
This does not work, because some subjects do not have a total of 160 trials: the data was preprocessed to remove error trials.
I tried to use the splitapply, unique and rmoutlier, with no luck!
K= splitapply(@rmoutlier,Experiment1engS1(:,3),unique(Experiment1engS1(:, 1:2), 'rows'))
Kindly suggest what can be done. Thank you.

Answers (1)

Vidhi Agarwal on 28 Nov 2024
The error you're encountering suggests that the "groupsummary" function is expecting a table or a dataset array, but it's receiving a standard numeric matrix instead. To resolve this issue, try converting the matrix into a table before using "groupsummary".
To do this, follow the steps below:
  • Convert the filtered data matrix into a table format.
  • Apply "groupsummary" on the table.
The revised code is given below:
% Sample data
data = [
8 4 340; 8 4 328; 8 3 218; 8 4 338; 8 3 213;
8 4 328; 8 3 254; 8 4 323; 8 4 340; 8 3 273;
9 3 580; 9 4 363; 9 4 371; 9 3 374; 9 3 383;
9 3 302; 9 4 406; 9 3 390; 9 3 380; 9 3 366;
9 4 468
];
% Extract columns
subjects = data(:, 1);
trialcodes = data(:, 2);
latencies = data(:, 3);
% Define a function to remove outliers: keep only the latencies that lie
% within 0.5 standard deviations of the group mean
removeOutliers = @(latencies) latencies(abs(latencies - mean(latencies)) <= 0.5 * std(latencies));
% Group by subject and trialcode
[G, ~] = findgroups(subjects, trialcodes);
% Apply the function to each group
cleanedData = splitapply(@(latencies) {removeOutliers(latencies)}, latencies, G);
% Reconstruct the data matrix without outliers
filteredData = [];
for i = 1:length(cleanedData)
    if ~isempty(cleanedData{i})
        group = unique(data(G == i, 1:2), 'rows');
        filteredData = [filteredData; repmat(group, size(cleanedData{i}, 1), 1), cleanedData{i}];
    end
end
% Convert filtered data to a table
filteredTable = array2table(filteredData, 'VariableNames', {'Subject', 'TrialCode', 'Latency'});
% Calculate the mean latency for each subject and trialcode
averageLatencies = groupsummary(filteredTable, {'Subject', 'TrialCode'}, 'mean', 'Latency');
% Display the result
disp(averageLatencies);
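A possible alternative to rebuilding the matrix in a loop is a logical row mask, which keeps the surviving rows in their original order. This is only a minimal sketch, not the method above: the cutoff k (in standard deviations) and the variable names mu, sigma, keep and filtered are assumptions for illustration, so set k to whatever criterion you actually need.

```matlab
% Sample data from the question: [subject trialcode latency]
data = [
    8 4 340; 8 4 328; 8 3 218; 8 4 338; 8 3 213;
    8 4 328; 8 3 254; 8 4 323; 8 4 340; 8 3 273;
    9 3 580; 9 4 363; 9 4 371; 9 3 374; 9 3 383;
    9 3 302; 9 4 406; 9 3 390; 9 3 380; 9 3 366;
    9 4 468
];

k = 2;  % assumed cutoff in standard deviations (illustrative, adjust as needed)

% One group number per row, defined by the (subject, trialcode) pair
G = findgroups(data(:, 1), data(:, 2));

% Per-group mean and standard deviation of the latency column
mu    = splitapply(@mean, data(:, 3), G);
sigma = splitapply(@std,  data(:, 3), G);

% Keep the rows whose latency lies within k standard deviations
% of the mean of their own group
keep = abs(data(:, 3) - mu(G)) <= k * sigma(G);
filtered = data(keep, :);
```

Indexing mu and sigma with G expands the per-group statistics back to one value per row, so no cell arrays or reconstruction step are needed.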
To learn more about "groupsummary", refer to the following documentation:
  • https://www.mathworks.com/help/matlab/ref/double.groupsummary.html
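Since the question asks for the averages as a matrix, one option is to lay them out with one row per subject and one column per trial code using "accumarray". The sketch below assumes that layout is what is wanted; it uses a few rows taken from the question, and the names subjIdx, codeIdx and meanMatrix are illustrative. In practice you would pass the filtered data instead.

```matlab
% A few rows from the question: [subject trialcode latency]
data = [8 4 340; 8 4 328; 8 3 218; 8 3 213;
        9 3 580; 9 3 374; 9 4 363; 9 4 371];

% Row index per subject and column index per trial code
[subjIdx, subjIDs] = findgroups(data(:, 1));
[codeIdx, codeIDs] = findgroups(data(:, 2));

% Mean latency for every (subject, trialcode) cell;
% combinations with no trials are filled with NaN
meanMatrix = accumarray([subjIdx, codeIdx], data(:, 3), ...
    [numel(subjIDs), numel(codeIDs)], @mean, NaN);

% subjIDs labels the rows and codeIDs labels the columns of meanMatrix
disp(meanMatrix);
```

A wide layout like this (subjects in rows, conditions in columns) is usually a convenient starting point for a repeated measures analysis.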
Hope this helps!
