Problem with dates and intersect

MC3105 on 3 Nov 2014
Edited: dpb on 4 Nov 2014
Hello everyone,
I came across the following problem this morning and haven't been able to fix it over the last few hours, so I am hoping someone here might be able to help out.
I have two date vectors. One contains all dates for the year 2011:
dat_2011=(datenum(2011,01,01,0,0,0):1/24:datenum(2011,12,31,23,0,0))';
The other one contains only some of the dates of the year 2011, 8461 in total. This vector is called dat_xx.
Now I want to use intersect to find out which indexes in vector dat_2011 correspond to the dates in vector dat_xx. So it is very important to me to find out ia and ib:
[dat_xx,ia,ib]=intersect(dat_2011,dat_xx,'rows');
When I run my code, MATLAB tells me that only 7722 dates are the same in the two vectors; ia and ib both have 7722 entries. The problem is that I know there should be 8461 dates that are part of both vectors.
Did I maybe do something wrong when I created the vector dat_2011?
I can use several MATLAB functions like hour, month, year, minute, second on all the elements in both of my date vectors, so I have no idea what the problem could be. MATLAB seems to recognize the elements in both vectors as dates.
Thanks a lot!

Accepted Answer

dpb on 3 Nov 2014
Edited: dpb on 4 Nov 2014
dat_2011=(datenum(2011,01,01,0,0,0):1/24:datenum(2011,12,31,23,0,0))';
... Did I maybe do something wrong when I created the vector dat_2011?
Ayup...you generated the date vector from the two end date values and the floating point delta using colon instead of using internally (to datenum) generated values. Use
dat_2011=datenum(2011,1,1,(0:24*365-1).',0,0);
instead. Internally datenum will generate self-consistent values that will work for comparisons while the colon operation uses different algorithms to minimize error between initial and final values.
The general rule is to use integer-valued increments of the proper size to span the desired time at the desired granularity, rather than fractional days in floating point, which introduce external rounding error. The values will be very similar, but in later floating-point comparisons even a single-bit difference in the least significant position will cause a failure.
Try looking at the difference between the two series as generated above; I would expect you'll find that the difference is on the order of E-15 and is symmetric around the midpoint of the series, owing to how : works internally.
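If dat_xx came from somewhere else and can't be regenerated this way, one possible workaround is to compare integer hour counts instead of the raw doubles. This is only a sketch and assumes every time stamp lies on a whole hour; the names hr_2011 and hr_xx and the stand-in dat_xx are made up for illustration.

```matlab
% Sketch: match hourly datenums via integer hour counts instead of raw doubles.
% Assumption: every time stamp in both vectors lies on a whole hour.
dat_2011 = datenum(2011,1,1,(0:24*365-1).',0,0);   % self-consistent hourly grid

% Hypothetical stand-in for the real dat_xx: March 2011 built with colon and 1/24
dat_xx = (datenum(2011,3,1):1/24:datenum(2011,3,31,23,0,0)).';

hr_2011 = round(dat_2011*24);   % whole hours since the datenum origin
hr_xx   = round(dat_xx*24);     % rounding absorbs last-bit differences

% Integers of this size are exact in double precision, so equality is safe
[~,ia,ib] = intersect(hr_2011,hr_xx);
```

Here ia indexes into dat_2011 and ib into dat_xx, as in the original call. For data at finer granularity, scale by the matching factor (e.g. 24*60 for minutes) before rounding.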
ADDENDUM
I did the comparison..the actual difference in the two is
>> dat_2011=(datenum(2011,01,01,0,0,0):1/24:datenum(2011,12,31,23,0,0))';
>> dn=datenum(2011,1,1,[0:24*365-1].',0,0);
>> max(dat_2011-dn)
ans =
1.1642e-10
>>
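In the previous error estimate I forgot to factor in the magnitude of datenums, which is on the order of 1E5 to 1E6 (about 7.3E5 for 2011), so the spacing between adjacent doubles is correspondingly larger than eps(1):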
>> eps(dn(1))
ans =
1.1642e-10
>> eps(1)
ans =
2.2204e-16
>>
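To see where that number comes from: a double has a 52-bit fraction, so the spacing near a value x is 2^(floor(log2(x))-52). A small sketch of that arithmetic (the variable names x and e are just for illustration):

```matlab
% Sketch: why eps for a 2011 datenum is about 1.16e-10 days
x = datenum(2011,1,1);           % 734504, which lies between 2^19 and 2^20
e = 2^(floor(log2(x)) - 52);     % spacing of doubles near x, i.e. 2^-33
same = isequal(e, eps(x));       % true: the formula reproduces eps(x)
e_sec = e*86400;                 % the same spacing expressed in seconds
```

So a one-bit discrepancy in an hourly datenum is roughly 10 microseconds: invisible when displayed as a date, but enough to defeat an exact comparison in intersect.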
ADDENDUM 2
...The general rule is to use integer-valued increments of the proper size to span the desired time at the desired granularity, rather than fractional days in floating point, which introduce external rounding error. The values will be very similar, but in later floating-point comparisons even a single-bit difference in the least significant position will cause a failure.
ATTN: TMW The above caution or similar should be in the documentation on date numbers. This issue arises repeatedly because what seems to be a reasonable way to generate the date vector is, in fact, guaranteed to fail as demonstrated above. The question comes up over and over and is, afaik, never mentioned in the docs. Although someone with some experience can infer it from floating-point behavior, it will catch virtually everybody at some time or other until they've seen or experienced it.
