Run statistical tests on multiple csvs

I have many days' worth of heart rate data stored in separate CSVs, and I'm currently running a t-test on the data from the 3 hours before and after 12am. I was going to run this manually on all 30 different days, but I was wondering whether there is a way to loop through all the days and return all the p-values at once.

1 Comment

That seems very likely. Once you create a file list (e.g. with dir) you should be able to do the processing in a loop.
Do you have a specific question about implementing this?

Answers (1)

Manash Sahoo on 20 Apr 2021
Edited: Manash Sahoo on 20 Apr 2021
Store your data in a folder, use the "dir" command to return the file names, and loop through them.
For example:
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(strcat(filepath,"\*.csv")) % Filepath would be the path where your csvs are located.
pval = {};
for i = 1:length(files)
HRDat = readmatrix(files.name); % You may need to edit this per your filepath.
% Do your analysis here, and return your pvalue to pval{i}.
end
Your p-values in the cell array "pval" will then correspond, in order, to the files in the struct array "files". This is usually the way I do things with heart rate data. Let me know if you have any further questions!
EDIT: Fixed the code.
MS
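To keep track of which p-value belongs to which file, one option is to put the file names and p-values side by side in a table. This is only a sketch: pwd is a stand-in for the folder that holds your CSVs, and the variable names File and PValue are made up for illustration.

```matlab
% Sketch: pair each p-value with the file it came from.
% pwd is only a stand-in for the folder that holds your CSVs.
files = dir(fullfile(pwd, '*.csv'));
pval = NaN(numel(files), 1);      % fill this in your loop, one entry per file

names = {files.name};
names = names(:);                 % force a column, even when no files are found
results = table(names, pval, 'VariableNames', {'File', 'PValue'});
disp(results)
```

Because dir and the loop go through the files in the same order, row n of the table always describes files(n).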

7 Comments

  • Pre-allocating the output tends to be faster, and since a p-value is numeric, a double array is probably fine in place of a cell array.
  • You're using the length function. Consider using numel or size instead; length returns the size of the largest dimension, which is not always the number of elements.
  • I personally prefer avoiding i as a variable name, so I changed that to n.
  • Try to avoid using strcat to build a path. fullfile gives you the same flexibility without having to worry about the correct filesep.
  • As a last point: you forgot to index the struct inside the loop.
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(fullfile(filepath,'*.csv')); % filepath is the folder where your CSVs are located.
pval = NaN(numel(files),1);
for n = 1:numel(files)
    % Build the full path so this also works when filepath is not the current folder.
    HRDat = readmatrix(fullfile(files(n).folder,files(n).name));
    % Do your analysis here, and return your p-value to pval(n).
end
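For anyone wondering why numel and fullfile are suggested above, here is a small illustration. The matrix M and the folder name HR_data are just examples. For the N-by-1 struct array that dir returns, length and numel happen to agree; the difference only shows up with other shapes.

```matlab
% length returns the size of the largest dimension,
% numel returns the total number of elements.
M = zeros(3, 5);
disp(length(M))   % 5
disp(numel(M))    % 15

% fullfile joins path parts with the file separator of the current platform,
% so the same line works on Windows, macOS and Linux.
p = fullfile('HR_data', '*.csv');
disp(p)
```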
Ah. Thanks for the pointers! This is indeed a much better solution.
I've got it to run; however, it's returning the same p-value for all 24 days. Any ideas why this may be?
files = dir(fullfile('/Users/rossthompson/Documents/MATLAB/HR_data','*.csv'));
pval = NaN(numel(files),1);
for n = 1:numel(files);
HRDat = readmatrix(files(n).name);
noonindex = find(data.noonTime==1);
noontime = data.Timestamp(noonindex);
A = data.HeartRate(noonindex-36:noonindex);
B = data.HeartRate(noonindex+1:noonindex+37);
[h, p, ci] = ttest2(A,B);
pval(n) = p;
end
pval
pval =
1.0e-11 *
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
0.4139
That is a very low value: 0.4e-11. You could conclude that all your analyses have a p-value of effectively 0.
Otherwise you will have to look at the data you're using in each iteration. When you do that, you will notice that you aren't actually using HRDat in the rest of your loop: everything after readmatrix reads from a variable called data, which never changes inside the loop, so each iteration is using the exact same data.
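A minimal sketch of how the loop could look once each file's data is actually used. It assumes every CSV has columns named noonTime and HeartRate, as in the code above. It swaps readmatrix for readtable so the columns can be accessed by name, takes only the first flagged row, and adds a guard for files where the window does not fit; those are my additions, not part of the original code.

```matlab
% Sketch only: the folder path and the column names noonTime and HeartRate
% are taken from the code in the question; adjust them to your data.
dataDir = '/Users/rossthompson/Documents/MATLAB/HR_data';
files = dir(fullfile(dataDir, '*.csv'));
pval = NaN(numel(files), 1);          % NaN marks files that were skipped

for n = 1:numel(files)
    % readtable keeps the CSV headers, so columns can be accessed by name.
    data = readtable(fullfile(files(n).folder, files(n).name));

    noonindex = find(data.noonTime == 1, 1);   % first row where noonTime is 1

    % Skip the file if the flag is missing or too close to either end.
    if isempty(noonindex) || noonindex <= 36 || noonindex + 37 > height(data)
        continue
    end

    A = data.HeartRate(noonindex-36:noonindex);      % 37 samples up to the flag
    B = data.HeartRate(noonindex+1:noonindex+37);    % 37 samples after the flag

    [~, pval(n)] = ttest2(A, B);   % needs Statistics and Machine Learning Toolbox
end
```

Since data is now reloaded inside the loop, each pval(n) is computed from its own file, and any file that could not be analysed stays NaN so it is easy to spot.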
Silly me! Thanks

Asked on 20 Apr 2021
Commented on 20 Apr 2021
