Processing array where the elements are sometimes min/sec and sometimes hour/min/sec

I have some race data of the form
raceTime = {'28:44','54:08','1:02:34','1:58:33'};
Because some times are less than an hour, and some are more, the inputs are in both hh:mm:ss and mm:ss format. It will never be the case that ##:## represents hh:mm.
I'm trying to get the duration (in, say, minutes) of these race times. Thoughts on the most elegant way to process this? (I can think of a few inelegant ways.)

Answers (6)

dpb on 16 Oct 2018
Edited: dpb on 16 Oct 2018
I won't claim it's elegant and certainly not "most" so, but...
function et=raceDuration(tstring)
% return durations for [hh:]mm:ss input cell string array
hms=cellfun(@(s) split(s,":"),tstring,'uni',0); % get pieces (not all complete)
et=duration(zeros(numel(hms),3));               % preallocate N-by-1 durations
for i=1:numel(hms)
  try
    et(i)=duration(str2double(hms{i}).');       % h:mm:ss -> [h m s]
  catch
    et(i)=duration([0 str2double(hms{i}).']);   % mm:ss -> [0 m s]
  end
end
That works for your example data.
Every other way to fix up the missing hours that has occurred to me so far seemed more painful than the loop; unfortunately, there's no way to put a try...catch...end construct inside an anonymous function for cellfun to deal with the missing field.
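A try...catch isn't strictly needed to cope with the missing hours field: left-padding each parsed vector with zeros up to three elements gives a uniform [h m s] row for every entry. A minimal sketch, assuming every entry is either mm:ss or h:mm:ss; it returns numeric minutes rather than a duration, and the names `padHMS`, `hms` and `raceMinutes` are just for illustration:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};

% Split each string on ':' and convert the pieces to a numeric column vector
parts = cellfun(@(s) str2double(strsplit(s, ':')).', raceTime, 'UniformOutput', false);

% Left-pad with zeros so every entry becomes a 1x3 [h m s] row
padHMS = @(v) [zeros(1, 3 - numel(v)), v(:).'];
hms = cell2mat(cellfun(padHMS, parts(:), 'UniformOutput', false));  % N-by-3

% Convert to minutes with a single matrix multiply
raceMinutes = hms * [60; 1; 1/60];
```

In MATLAB, `duration(hms)` would turn the same N-by-3 matrix into a duration array instead.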
ALTERNATE
(With attribution to Stephen for (again) reminding me that sscanf handles unusual cases more gracefully than I always think it will...)
hms=cellfun(@(s) sscanf(s,"%d:"),raceTime,'uni',0)';  % N-by-1 cell of [h;m;s] or [m;s] columns
te=duration(cell2mat(cellfun(@(x) [zeros(1,3-length(x)) x.'],hms,'uni',0)));  % left-pad each to [h m s] -> N-by-3
>> te
te =
4×1 duration array
00:28:44
00:54:08
01:02:34
01:58:33
>>
And, the above "trick" cleans up the original function quite a bit, too...
function et=raceDuration(tstring)
% return durations for [hh:]mm:ss input cell string array
hms=cellfun(@(s) sscanf(s,"%d:"),tstring,'uni',0)'; % get pieces (not all complete)
N=numel(hms);
et=duration(zeros(N,3));            % preallocate N-by-1 durations
for i=1:N
  try
    et(i)=duration(hms{i}.');       % [h m s]
  catch
    et(i)=duration([0 hms{i}.']);   % [m s] -> add zero hours
  end
end
ADDENDUM
And, of course, you can change the Format property...
>> te.Format='m'
te =
4×1 duration array
28.733 min
54.133 min
62.567 min
118.55 min
>> te.Format='s'
te =
4×1 duration array
1724 sec
3248 sec
3754 sec
7113 sec
>>
depending on how you want the result to look.

Stephen23 on 16 Oct 2018
Edited: Stephen23 on 16 Oct 2018
As long as the last unit is always the same (seconds), you could use this:
>> C = {'28:44','54:08','1:02:34','1:58:33'};
>> V = [60,1,1/60]; % [H,M,S]
>> F = @(s)V(end-nnz(s==':'):end)*sscanf(s,'%d:');
>> M = cellfun(F,C)
M =
28.733 54.133 62.567 118.550
It should be reasonably efficient, as it does not change or duplicate the input data, and it uses sscanf and a matrix multiplication, both of which are fast. For maximum speed, replace cellfun with a preallocated loop.
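Replacing cellfun with a preallocated loop, as suggested above, might look like the following sketch. It uses the same `V` weighting trick; whether it is actually faster depends on the MATLAB release, so it is worth timing on your own data:

```matlab
C = {'28:44','54:08','1:02:34','1:58:33'};
V = [60, 1, 1/60];              % minutes per [hour, minute, second]

M = zeros(size(C));             % preallocate the output
for k = 1:numel(C)
    s = C{k};
    nColons = nnz(s == ':');    % 1 for mm:ss, 2 for h:mm:ss
    % Use only the trailing weights that match the fields actually present
    M(k) = V(end-nColons:end) * sscanf(s, '%d:');
end
```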

the cyclist on 16 Oct 2018
Here is the idea I had after posting this. The core idea is to create a "template" of zeros for the largest format needed (e.g. 00:00:00), and then superimpose the available digits onto the end of it.
% Original data
raceTime = {'28:44','58:39','1:02:34','1:58:33'};
% Create a template of all zeros, for which the times will be superimposed
templateFormat = '00:00:00';
template = cell(size(raceTime));
template(:) = {templateFormat};
% Need to know the number of digits to superimpose
digitsInRaceTime = cellfun(@(x) size(x,2),raceTime,'UniformOutput',false);
% For each element, superimpose the right digits
durationCell = cellfun(@(x,y,z)([x(1:(numel(templateFormat)-z)) y]),template,raceTime,digitsInRaceTime,'UniformOutput',false);
% Get duration
durationInMinutes = minutes(duration(durationCell))
dpb on 17 Oct 2018
I hadn't seen this until after I added the alternate solution triggered by Peter's answer, but I made exactly the same comment: that would seem a relatively easy option to have included in the function design, and to me it seems a reasonable, if not highly important, enhancement.
Stephen23 on 17 Oct 2018
Edited: Stephen23 on 17 Oct 2018
I also tried this kind of thing (I used regexprep), but the fact that it requires making copies of the data is not particularly "elegant" in my view: why make extra variables when it can be done efficiently using the existing data?
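For comparison, a regexprep version of the template idea might look like this. It is only a guess at the approach, not Stephen's actual code, and the pattern assumes every entry is either mm:ss or h:mm:ss. Note that it still builds a second copy of the data, which is exactly the objection raised here:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};

% Prepend '0:' only to entries made of exactly two colon-separated fields;
% entries that already have an hours field do not match and are left unchanged
padded = regexprep(raceTime, '^(\d+:\d+)$', '0:$1');
```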



Peter Perkins on 17 Oct 2018
cyclist, if you know that the text is a mixture of those two formats, can't you convert using one format, and then go back and convert the things that failed, using the second format? Maybe someone else already suggested that.
>> raceTime = {'28:44','54:08','1:02:34','1:58:33'};
>> t = duration(raceTime,'InputFormat','mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 NaN NaN
>> i = isnan(t)
i =
1×4 logical array
0 0 1 1
>> t(i) = duration(raceTime(i),'Format','hh:mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
Or just tack on a leading hours field where needed?
>> raceTime(i) = strcat('0:',raceTime(i))
raceTime =
1×4 cell array
{'0:28:44'} {'0:54:08'} {'1:02:34'} {'1:58:33'}
>> t = duration(raceTime,'Format','hh:mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
dpb on 17 Oct 2018
" can't you convert using one format, and then go back and convert the things that failed,"
That was my first approach, although I used a try...catch block. The logical addressing is good... if it could be folded into an anonymous function somehow. I'll have to mull that over.
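One way to fold the "add a leading hours field" step into an anonymous function, with neither logical indexing nor try...catch, is to compute a per-element prefix instead. A minimal sketch, assuming entries are only mm:ss or h:mm:ss; the names `hoursPrefix`, `addHours` and `fixedTimes` are illustrative only:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};

% '0:' for entries with a single colon (mm:ss), '' otherwise;
% double() keeps repmat's replication count numeric
hoursPrefix = @(s) repmat('0:', 1, double(nnz(s == ':') == 1));

% Element-wise concatenation of the prefixes with the original strings
addHours = @(C) strcat(cellfun(hoursPrefix, C, 'UniformOutput', false), C);

fixedTimes = addHours(raceTime);
```

The result has the same form as Peter's `strcat('0:',raceTime(i))` step, so it could be passed to the same `duration(...,'Format','hh:mm:ss')` call he shows.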



dpb on 17 Oct 2018
Edited: dpb on 17 Oct 2018
OK, thanks to Peter for triggering the idea on how to add the missing hour substring dynamically! :)
>> et=cellfun(@(s) duration(sscanf([repmat('00:',sum(s==':')==1) s],'%d:').','Format','hh:mm:ss'),raceTime)
et =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
>>
I am using R2017b, so duration is still limited to numeric (H,M,S) input; it doesn't accept the time-string form. I'm not sure which release added that enhancement, but if you have it, you can remove the call to sscanf and parse the augmented string directly:
cellfun(@(s) duration([repmat('00:',sum(s==':')==1) s],'Format','hh:mm:ss'),raceTime,'uni',0)
It seems it wouldn't be much of a stretch to let missing leading field(s) be implied zeros automagically...
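Until duration does that, the "missing leading fields are implied zeros" rule is easy to apply by hand: weight the parsed fields from the right by powers of 60. A sketch, assuming integer fields separated by ':' with hours as the largest unit; the `toSeconds` name is hypothetical:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};

% For k colons the weights are 60^k, ..., 60, 1, so any absent leading
% fields (hours, or hours and minutes) simply contribute zero
toSeconds = @(s) 60.^(nnz(s == ':'):-1:0) * sscanf(s, '%d:');

raceSeconds = cellfun(toSeconds, raceTime);
```

In MATLAB, `seconds(raceSeconds)` converts the result to a duration array, whose Format can then be set as in the ADDENDUM above.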
the cyclist on 17 Oct 2018
Edited: the cyclist on 17 Oct 2018
I'm not going to do exhaustive timing tests. My real data is only about 1,000 elements, so time is not that big a deal. But I ran each of these one thousand times, with roughly these results:
  • Peter's two-step duration code: 1 second
  • my own "template" code: 6 seconds
  • this one-liner: 4 seconds
  • Stephen's sscanf code: 7 seconds
dpb on 17 Oct 2018
That's a good catch to pull duration out of the cellfun, cyclist.
Interesting, the relatively poor showing of sscanf; as is so often the case, what we think is a bottleneck may turn out not to be, or vice versa. Given how Peter's ranked, I'd guess the try...catch loop would probably fare pretty well too, although I hadn't tried any timings; I was mostly just playing "golf" to see if I could get it down to a one-liner, for entertainment! :)



the cyclist on 17 Oct 2018
Edited: the cyclist on 17 Oct 2018
Oh, dear. I just found a terrible, wonderful obfuscated solution:
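digitArray = char(raceTime)-'0';   % char() pads short rows with spaces; ' '-'0' is -16
idx = digitArray(:,end)==-16;      % rows that were mm:ss (space-padded on the right)
digitArray(idx,:) = [zeros(sum(idx),2), digitArray(idx,1:end-2)];  % shift those rows right by two
durationInMinutes = digitArray(:,1)*60 + digitArray(:,3)*10 + digitArray(:,4) + digitArray(:,6)/6 + digitArray(:,7)/60;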
The same thousand iterations of this take only 0.07 seconds.
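The same fixed-width trick can be written a little less obscurely with strjust, which right-justifies the space-padded character rows so the missing hours end up on the left. A sketch under the same assumptions as the original (single-digit hours, two-digit minutes and seconds); the intermediate names `D` and `d` are illustrative:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};

D = char(raceTime);                               % short rows are padded with trailing spaces
D = [repmat(' ', size(D,1), 7 - size(D,2)), D];   % guarantee width 7 even with no h:mm:ss entries
D = strjust(D, 'right');                          % move the padding to the front
D(D == ' ') = '0';                                % padding becomes implied zeros, e.g. '0028:44'
d = D - '0';                                      % digits -> numbers (':' becomes 10 and is unused)

durationInMinutes = d(:,1)*60 + d(:,3)*10 + d(:,4) + d(:,6)/6 + d(:,7)/60;
```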
