How to extract only numerical values from cell into new column array?

3 views (last 30 days)
For this data:
"chb12_06.edf"
"1665 seconds"
"1726 seconds"
"3415 seconds"
"3447 seconds"
"chb12_08.edf"
"1426 seconds"
"1439 seconds"
"1591 seconds"
"1614 seconds"
"1957 seconds"
"1977 seconds"
"2798 seconds"
"2824 seconds"
"chb12_09.edf"
"3082 seconds"
"3114 seconds"
"3503 seconds"
"3535 seconds"
"chb12_10.edf"
"593 seconds"
"625 seconds"
"811 seconds"
"856 seconds"
Whis is a column in a larger cell array. I want to extract just the seconds value after each edf file, and possibly put them in a new array or matrix, eg:
{"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" ; "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"...}
I hope this makes sense what I'm trying to achieve, if not please feel free to clarify, thanks!

Accepted Answer

Bruno Luong
Bruno Luong on 15 Jul 2019
s={"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"};
sc = cellfun(@char,s,'Unif',0); %cast for regexp, which does not work on string input
sm = regexp(sc,'^\d* seconds','match');
[sm{:}]
  3 Comments
Bruno Luong
Bruno Luong on 15 Jul 2019
s={"chb12_06.edf" "1665 seconds" "1726 seconds" "3415 seconds" "3447 seconds" "chb12_08.edf" "1426 seconds" "1439 seconds" "1591 seconds" "1614 seconds" "1957 seconds" "1977 seconds" "2798 seconds" "2824 seconds"};
sc = cellfun(@char,s,'Unif',0); %cast for regexp, which does not work on string input
sm = regexp(sc,'^\d* seconds','match','once');
isfile = cellfun('isempty',sm);
g = cumsum(isfile);
num = str2double(regexp(sm(~isfile),'^\d*','match','once'));
numc = splitapply(@(x) {x(:)}, num, g(~isfile));
% Pad so the cell subarrays have same row-length
nrows = max(cellfun('size',numc,1));
numc = cellfun(@(x) [x; nan(nrows-size(x,1),1)], numc, 'unif', 0);
% construct the table
colnames = strrep(cellfun(@string,sc(isfile)),".","_"); % colnames cannot have "."
T = table(numc{:});
T.Properties.VariableNames = colnames
Z Liang
Z Liang on 16 Jul 2019
thank you, I was able to test each line and everything worked for me!
it's very clever the use of isfile followed by cumsum, and I haven't seen splitapply used before, so in all I learned quite a few tools to help me with tasks in the future
cheers!

Sign in to comment.

More Answers (1)

Stephane Dauvillier
Stephane Dauvillier on 15 Jul 2019
Try this:
in=["chb12_06.edf"
"1665 seconds"
"1726 seconds"
"3415 seconds"
"3447 seconds"
"chb12_08.edf"
"1426 seconds"
"1439 seconds"
"1591 seconds"
"1614 seconds"
"1957 seconds"
"1977 seconds"
"2798 seconds"
"2824 seconds"
"chb12_09.edf"
"3082 seconds"
"3114 seconds"
"3503 seconds"
"3535 seconds"
"chb12_10.edf"
"593 seconds"
"625 seconds"
"811 seconds"
"856 seconds"]
notQuiteThat = regexp(in,"(\d+) second","once","tokens") % extarct only digits befor second
notQuiteThat = [notQuiteThat{:}] % delete empty candidate
theSeconds = str2double(notQuiteThat) % transform it into actual double value
  3 Comments
Z Liang
Z Liang on 15 Jul 2019
Thank you, this works! When I tried I noticed a difference between
"(\d+) second"
versus
"\d+ second"
in that the first works and gave me numbers, but the second just gave me empty cells; what does the parenthesis do in this instance?
Also, can you show me how to assign the seconds values to different coloumns if they are after a different .edf file?
For example:
Column 1: chb12_06.edf 1665 1726 3415 3447
Column 2: chb12_08.edf 1426 1439 1591 ... ...

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!