Matrix multiplication not returning on certain matrix content (Intel)

Hi,
I have a matrix multiplication that does not return on my Intel machine (Win11 Pro, Intel i7-8700). When I replace the contents of the matrices with random numbers, I obtain a result within seconds. I could reproduce this problem on several Intel machines (and different MATLAB versions), so I suspect it is related to Intel MKL. One matrix contains a lot of entries close to zero (around 10^-322), which probably causes the problem.
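For anyone wondering whether values around 10^-322 are in fact subnormal: any nonzero double smaller in magnitude than the smallest normal double (about 2.2251e-308, MATLAB's realmin) is subnormal. A small illustrative check in Python (the equivalent MATLAB test would be x ~= 0 && abs(x) < realmin):

```python
import sys

def is_subnormal(x: float) -> bool:
    """True if x is a nonzero double below the smallest normal double."""
    return x != 0.0 and abs(x) < sys.float_info.min  # min is ~2.2251e-308

print(is_subnormal(1e-322))   # entries like those described above -> True
print(is_subnormal(1.0))      # -> False
```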
To reproduce it, download the two matrices:
and run the command:
scc=At*scc;
Is this operation running through on your machine?

2 Comments

It is generally a bad idea to force someone to download a file from off-site; we don't know whether you are planting a trojan on our systems. Honestly, I won't download it to test your question. Maybe others will be more willing to take a chance on an unknown person posting a link to an unknown file. I hope they don't take the risk, since you could have made this much simpler.
Just attach a .mat file that contains the matrices. Click on the paper clip icon on a comment.
It's about 4 GB, and I don't know whether the problem reproduces at any other size; I have no intention of investing more time in this. The provided link points to a German university (over https) with a download service for larger files; the university scans the files internally for viruses. It's the best I can do.


Answers (1)

The matrix contains many denormalized (subnormal) floating-point numbers, and operations on these numbers are very slow:
>> load('C:\Users\bruno\Downloads\test.mat')
>> size(At)
ans =
17534 17532
>> size(scc)
ans =
17532 8281
>> tic; B=At*scc(:,1); toc
Elapsed time is 1.105924 seconds.
>> tic; B=At*scc(:,1:10); toc
Elapsed time is 10.237435 seconds.
>> tic; B=At*scc(:,1:100); toc
Elapsed time is 108.484422 seconds.
On my machine it takes about 1 second per column of scc, so I guess it will eventually finish in less than 3 hours (scc has 8281 columns).
Note that for normalized numbers the operation is roughly 200 times faster.
So for now I think there is no bug. Just let the thing run overnight (sorry, I won't do that).

13 Comments

Thanks for your help Bruno, I have now found that out as well. When I replace the denormalized values with zero via:
At(abs(At)<realmin())=0;
I get the usual performance. The question is whether MATLAB is supposed to produce denormalized values in the first place. 'At' is created through the inversion of sparse matrices:
At=Ac\full(Ac'\A');
where 'A' is a banded matrix (small bandwidth) and 'Ac' is a triangular banded matrix (also small bandwidth). 'A' and 'Ac' do not contain denormalized numbers.
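The same flush-to-zero cleanup can be sketched in NumPy for illustration; np.finfo(np.float64).tiny is the same smallest-normal-double threshold as MATLAB's realmin:

```python
import numpy as np

tiny = np.finfo(np.float64).tiny        # smallest normal double, ~2.2251e-308

At = np.array([1.0, 1e-322, -3e-310, 0.5])
At[np.abs(At) < tiny] = 0.0             # flush all subnormal entries to zero
print(At)                               # [1.  0.  0.  0.5]
```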
AFAIK TMW doesn't state anything about denormalized numbers, so I suppose MATLAB supports them and lets the processor do whatever it can without extra intervention.
But in this twilight zone I wouldn't trust any meaningful outcome. Better to treat those numbers as 0 and swallow the subsequent implications.
Thanks for this advice. In view of the just-revealed performance problem, and possibly other related problems not yet discovered, I would strongly suggest that MATLAB check its built-in routines more rigorously for the creation of denormal numbers. I have been using MATLAB for fifteen years now and never stumbled over such problems (i.e., denormal numbers) before (because I never used sparse matrices intensively).
What would you recommend we do when you pass a denormal/subnormal number to our functions?
  1. Check for their presence and issue a warning? That means that everyone who calls those functions, both those whose data includes denormal numbers and those whose data does not, will pay the cost of checking for denormal numbers every time those functions are called. That's not going to happen unless it were free, and it wouldn't be free.
  2. Check for their presence and throw an error? Again, you're forcing everyone to pay the cost of checking.
  3. Check for their presence and replace them with a normal number (0 or realmin, perhaps)? In addition to the cost of checking, this would silently (or perhaps not silently, if we were to warn that we were making that replacement) modify the data you passed into the function.
  4. Encourage hardware manufacturers to improve the performance of their instructions on denormal numbers? This could be a long term solution, but it could be a tough sell due to bang-for-buck considerations. And it's not one that would be limited to MATLAB and not one that we could implement ourselves.
An approach you could use, if you wanted to guard yourself against denormal numbers, would be to implement item 3 yourself at the time the data is loaded into MATLAB, when it is first created, and/or when you perform an operation that you believe will introduce denormal numbers.
format longg
A = [1; 0; realmin; eps(0)]
A = 4x1
                        1
                        0
   2.2250738585072e-308
  4.94065645841247e-324
A(abs(A)<realmin) = 0
A = 4x1
                        1
                        0
   2.2250738585072e-308
                        0
One more point: I haven't looked at your matrices to determine how many denormals they contain. But I have to wonder if you're misinterpreting the data, or imported the data into MATLAB incorrectly. Often I see numbers with varying orders of magnitude crop up when interpreting the hex pattern of double or single precision data as single or double precision respectively.
A = pi
A =
3.14159265358979
B = typecast(A, 'single')
B = 1x2
   3.370281e+12   2.142699
The second element of B is on the same order of magnitude as pi, but the first is much, much larger. But if we look at their hex patterns:
format hex
A
A =
400921fb54442d18
B
B = 1x2
   54442d18   400921fb
Same bits, different interpretation.
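The same reinterpretation can be demonstrated outside MATLAB. Here is an illustrative Python sketch using the struct module, showing the identical bit pattern 400921fb54442d18 read back as two singles:

```python
import math
import struct

a = math.pi
raw = struct.pack('<d', a)                     # the 8 bytes of the double
b = struct.unpack('<2f', raw)                  # same bytes read as two singles

print(f'{struct.unpack("<Q", raw)[0]:016x}')   # 400921fb54442d18
print(b)                                       # (~3.37e+12, ~2.142699)
```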
@Steven Lord "What would you recommend we do when you pass a denormal/subnormal number to our functions?"
Personally my preference is to do nothing, as is the case right now, for all the reasons you mentioned. But if I had to choose, I would like:
  • a (HW) exception to be caught, and either an error or a warning issued, if any intermediate/final result of an arithmetic operation falls into the denormalized range. Not sure whether that HW exception is possible with current Intel chips.
  • an option when loading a MAT-file to detect the presence of denormalized numbers. This option would be off by default.
  • a function to check for denormalized numbers, something like isnormalized(x) (ideally a more efficient version of abs(x) < realmin(class(x))), probably with a recursive call of such a function on structures, cells, tables, ... This could be a tool used by the previous bullet item.
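A hypothetical sketch of such a checker, in Python for illustration (the name has_subnormal and the recursive container walk are my assumptions, mirroring the proposed isnormalized-style utility):

```python
import sys

def has_subnormal(x) -> bool:
    """Recursively check a float, or nested lists/tuples/dicts of floats,
    for the presence of subnormal values."""
    if isinstance(x, float):
        return x != 0.0 and abs(x) < sys.float_info.min
    if isinstance(x, dict):
        return any(has_subnormal(v) for v in x.values())
    if isinstance(x, (list, tuple)):
        return any(has_subnormal(v) for v in x)
    return False  # non-numeric leaves contribute nothing

print(has_subnormal([1.0, [2.0, 5e-324]]))  # True: 5e-324 is subnormal
```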
Personally I can't remember ever encountering denormalized numbers in my work. IMO one would fall into them when doing some sort of cumulative product of numbers barely smaller than 1. A cumulative product, such as a determinant, is a bad thing to manipulate numerically; rather, work with a sum of logs.
So actually I don't care how MATLAB would handle it. All I want is that it doesn't waste time handling it.
But I guess in some certification situations, one must prove that the program is robust and correctly handles arithmetic with such numbers.
I'm a scientist. I expect that the HW/SW can handle the results it itself produces. When an FP ALU can create denormal numbers, why does it then struggle to process them further? The HW/SW can obviously also handle NaNs and Infs correctly, so why does it struggle with denormals? Maybe the call here is on Intel, but maybe it is also just a flag to set in MKL to treat denormals as regular zeros (which is maybe already the default behavior of the ALU).
Denormals are handled by a completely different algorithm than normalized numbers. At least 3 different algorithms are needed:
  • when both numbers are normalized
  • when both numbers are denormalized
  • when one is normalized and the other is denormalized. It is possible that different algorithms might be needed depending on which of the operands is denormalized (for example for division)
It happens that on existing Intel chips, these algorithms are implemented in microcode rather than being fully hardware-assisted.
Could Intel have hardwired those variant algorithms? Yes -- but doing so would require dedicating a fair amount of die area to rare operations. Costs would increase noticeably.
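The microcode-assist penalty can be observed directly. A rough NumPy illustration (the ratio varies a lot by CPU and operation; some recent chips show almost no penalty for multiplication):

```python
import timeit

import numpy as np

n = 1_000_000
normal = np.full(n, 1.0)
subnormal = np.full(n, 1e-320)  # far below the ~2.2251e-308 normal threshold

# Multiply each array by a scalar; subnormal operands (and subnormal
# results) may trigger slow microcode paths on x86 without FTZ/DAZ.
t_norm = timeit.timeit(lambda: normal * 1.5, number=50)
t_sub = timeit.timeit(lambda: subnormal * 1.5, number=50)
print(f'subnormal / normal time ratio: {t_sub / t_norm:.1f}x')
```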
It would be interesting to run this code on a non-Intel processor, e.g. AMD, and report the results:
load('test.mat')
tic; B=At*scc(:,1); toc
tic; B=At*scc(:,1:10); toc
tic; B=At*scc(:,1:100); toc
For the record, here are the results on my laptop (Intel(R) Core(TM) i9-12900H):
  1. Elapsed time is 0.433215 seconds.
  2. Elapsed time is 3.571832 seconds.
  3. Elapsed time is 37.346198 seconds.
"when one is normalized and the other is denormalized."
That is the case here: 9.4% of the entries in the left matrix At are denormalized, and none in the second.
The intermediate partial sum can have both though.
Hmmm... I wonder if the hardware could convert the numbers to 80 bit floats, do the operation, and convert back?
Certainly a very nice idea. In fact, the exponent range would suffice: the 80-bit extended format's 15-bit exponent (4 more bits than double's 11) gives a minimum normal exponent of -16382, far below the 2^-1074 of the smallest denormalized double, so every denormalized 64-bit number can be represented as a normalized 80-bit number. That is essentially how the old x87 unit handled double denormals internally; the cost today would lie in the conversion itself.
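For what it's worth, a quick check of the published exponent ranges (IEEE 754 binary64 vs. the x87 80-bit extended format), sketched in Python:

```python
# IEEE 754 binary64: 11-bit exponent, 52-bit fraction.
DOUBLE_MIN_NORMAL_EXP = -1022      # smallest normal double is 2^-1022
DOUBLE_MIN_SUBNORMAL_EXP = -1074   # smallest subnormal is 2^-1074 (-1022 - 52)

# x87 80-bit extended: 15-bit exponent field.
EXTENDED_MIN_NORMAL_EXP = -16382

# Every double subnormal fits as a *normalized* extended value:
print(EXTENDED_MIN_NORMAL_EXP <= DOUBLE_MIN_SUBNORMAL_EXP)  # True
```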


Release: R2024a
Asked: 10 Apr 2024
Edited: 11 Apr 2024
