Multiple linear regression warning: X is rank deficient to within machine precision.

I have a multiple linear regression problem:
A =
1 0 0 0 0 1 0 0
0 1 0 0 0 0 1 0
0 0 1 0 0 0 0 1
0 0 0 1 1 0 0 0
1 0 0 0 0 0 0 1
0 1 0 0 1 0 0 0
0 0 1 0 0 1 0 0
0 0 0 1 0 0 1 0
y =
0.7062
0.7419
-0.5401
1.7201
1.4434
2.2419
-2.1379
0.8751
I got a warning when running
>> b=regress(y,A)
Warning: X is rank deficient to within machine precision.
> In regress (line 84)
b =
1.6585
0.9056
-0.7552
0.7113
1.1725
-1.1675
0
0
I get the same b from the least-squares normal equations, (A'*A)\(A'*y).
As A is badly conditioned, I don't know if I can trust the result. Is there another solution? Please advise!

Accepted Answer

John D'Errico on 31 Oct 2021
Edited: John D'Errico on 31 Oct 2021
As I pointed out in my comment to Star Strider, your problem is rank deficient.
A = [1 0 0 0 0 1 0 0
0 1 0 0 0 0 1 0
0 0 1 0 0 0 0 1
0 0 0 1 1 0 0 0
1 0 0 0 0 0 0 1
0 1 0 0 1 0 0 0
0 0 1 0 0 1 0 0
0 0 0 1 0 0 1 0];
y = [0.7062
0.7419
-0.5401
1.7201
1.4434
2.2419
-2.1379
0.8751];
No exact solution is possible, but there are infinitely many least-squares solutions, none of them any better than the rest at fitting y. Can you "trust" the result from regress? Yes, to the extent that every one of those infinitely many solutions is equally good. How, then, would you choose to define "trust"? The solution is non-unique.
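The rank deficiency can be checked directly. The following is a minimal sketch using only base MATLAB (rank and null); the variable names r and Z are chosen here for illustration and are not from the original question.

```matlab
% Design matrix from the question
A = [1 0 0 0 0 1 0 0
     0 1 0 0 0 0 1 0
     0 0 1 0 0 0 0 1
     0 0 0 1 1 0 0 0
     1 0 0 0 0 0 0 1
     0 1 0 0 1 0 0 0
     0 0 1 0 0 1 0 0
     0 0 0 1 0 0 1 0];

r = rank(A);   % numerical rank, estimated from the singular values
Z = null(A);   % orthonormal basis for the null space of A

fprintf('rank = %d, null space dimension = %d\n', r, size(Z,2));
```

Columns 1 to 4 of A sum to a vector of ones, and so do columns 5 to 8, so the eight columns cannot be independent. Here rank should report 6, leaving a two-dimensional null space, which matches the two unknowns that regress sets to zero.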
xreg = regress(y,A)
Warning: X is rank deficient to within machine precision.
xreg = 8×1
1.6585 0.9056 -0.7552 0.7113 1.1725 -1.1675 0 0
regress uses a method that arbitrarily sets two of the unknowns to zero. Is that a better solution than the one pinv will offer?
xpinv = pinv(A)*y
xpinv = 8×1
1.1408 0.7945 -1.2729 0.6002 1.2836 -0.6498 0.1111 0.5177
Each of these solutions is equally good or bad in terms of fitting the vector y.
norm(A*xreg - y)
ans = 0.5408
norm(A*xpinv - y)
ans = 0.5408
As you can see, each solution is equally correct, since, as I said, no solution exists that is exact. However, the pinv solution is the one that has minimum norm. Thus...
norm(xreg)
ans = 2.7176
norm(xpinv)
ans = 2.5028
The pinv solution typically will have no zero elements in it, but it has a smaller Euclidean norm. It is the solution with the smallest possible norm among the infinitely many solutions. Note that lsqr does have the property that it will yield the same solution as pinv.
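Both claims (equal residuals, minimum norm for pinv) can be illustrated without regress. The sketch below builds a second least-squares solution by adding a null-space vector to the pinv solution. The coefficient vector c is arbitrary and chosen only for illustration, and names such as xother are hypothetical.

```matlab
A = [1 0 0 0 0 1 0 0
     0 1 0 0 0 0 1 0
     0 0 1 0 0 0 0 1
     0 0 0 1 1 0 0 0
     1 0 0 0 0 0 0 1
     0 1 0 0 1 0 0 0
     0 0 1 0 0 1 0 0
     0 0 0 1 0 0 1 0];
y = [0.7062; 0.7419; -0.5401; 1.7201; 1.4434; 2.2419; -2.1379; 0.8751];

xpinv = pinv(A)*y;       % minimum-norm least-squares solution
Z = null(A);             % orthonormal basis for the null space, A*Z is ~0

c = [1; -2];             % arbitrary coefficients (illustration only)
xother = xpinv + Z*c;    % a different solution with the same fit

% Residuals agree, because A*(Z*c) is zero up to rounding
resPinv  = norm(A*xpinv  - y);
resOther = norm(A*xother - y);

% The pinv solution has no component in the null space ...
orthCheck = norm(Z'*xpinv);

% ... so any null-space component can only increase the norm
normPinv  = norm(xpinv);
normOther = norm(xother);
```

Because Z has orthonormal columns and xpinv is orthogonal to them, norm(xother)^2 equals norm(xpinv)^2 + norm(c)^2. Any nonzero c therefore gives a longer solution vector with exactly the same residual.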
So when you ask if you can "trust" the solution from regress, well yes, you can trust it. At least, you can do so, if you understand what it means and what are the limits to that solution. Finally, you did mention use of the normal equations solution:
(A'*A)\(A'*y)
The funny thing is, that is the one solution you should NOT trust, even though people seem to want to teach others to use it. They are wrong. Oh well, this could become a completely different lesson in linear algebra, and a lengthy one.
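The answer does not spell out why, so what follows is only a sketch of one standard argument, not necessarily the lesson the author had in mind. Forming A'*A squares the singular values of A, so whatever conditioning problem A has is made worse before the solve even starts. The sketch also assumes lsqminnorm is available (R2017b or later, as far as I know) as an alternative that works on A directly.

```matlab
A = [1 0 0 0 0 1 0 0
     0 1 0 0 0 0 1 0
     0 0 1 0 0 0 0 1
     0 0 0 1 1 0 0 0
     1 0 0 0 0 0 0 1
     0 1 0 0 1 0 0 0
     0 0 1 0 0 1 0 0
     0 0 0 1 0 0 1 0];
y = [0.7062; 0.7419; -0.5401; 1.7201; 1.4434; 2.2419; -2.1379; 0.8751];

s  = svd(A);        % singular values of A, largest first
s2 = svd(A'*A);     % singular values of the normal-equations matrix

% The singular values of A'*A are the squares of those of A
relDiff = norm(s2 - s.^2) / norm(s2);

% A'*A is just as rank deficient as A, so (A'*A)\(A'*y) is a solve
% with a singular matrix
rankNormal = rank(A'*A);

% Alternatives that never form A'*A
xmin  = lsqminnorm(A, y);   % assumes R2017b or later
xpinv = pinv(A)*y;
agree = norm(xmin - xpinv);
```

For a full-rank but ill-conditioned A the same squaring applies: cond(A'*A) is cond(A)^2 in the 2-norm, so roughly twice as many digits are at risk as with a method based on QR or the SVD of A itself.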
