L2 norm minimization

Hello, I'd like to find a transformation of a set of vectors X that maps to known vectors Y. This can be expressed as a standard least-squares optimization problem, i.e. min |WX - Y|^2, where W is the transformation matrix I'm looking for.
Additionally, I would like to minimize the L2 norm of the matrix W, so the full minimization problem becomes: min |W|^2 + |WX - Y|^2
This problem can be solved with gradient descent, and I was just wondering which function from the MATLAB Optimization Toolbox I should use?
Thanks!
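For reference, the penalized objective above can be minimized by plain gradient descent, since its gradient is 2W + 2(WX - Y)X'. A minimal sketch (the data sizes, step size, and iteration count are arbitrary illustrative choices):

```matlab
% Gradient-descent sketch for min ||W||^2 + ||W*X - Y||^2.
% Sizes, step size, and iteration count are arbitrary choices.
X = rand(10,20);          % 10 unknowns, 20 data vectors
Y = rand(1,20);
W = zeros(1,10);          % starting point
eta = 1e-3;               % fixed step size
for k = 1:5000
    G = 2*W + 2*(W*X - Y)*X';   % gradient of the penalized objective
    W = W - eta*G;
end
```

Since the objective is a convex quadratic, though, a direct linear solve (as in the accepted answer below) is faster and exact.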

4 Comments

I have a similar question: I want argmin_X { |WX-Y|^2 + |X|_1 }. In case my notation is confusing, I want to find the X that minimizes the square of the 2-norm of (WX-Y) with a regularization of the 1-norm of X. W is m x n, Y is m x 1. How to do this in MATLAB? Thank you!
John D'Errico on 26 Feb 2018
Edited: John D'Errico on 26 Feb 2018
I think your notation seems spot on. But your goal? Sorry, but this seems to be a bit silly as a goal. I do accept that you can choose any goal that you want, but why this?
By posing a mixed problem, minimizing the 2-norm of (W*X - Y) combined with the 1-norm of X, you end up in a world where nothing is simple to write or solve, whereas the very similar problem with the regularization on the 2-norm of X is trivial to solve.
You would need to formulate this as a general nonlinear optimization, with the caveat that, due to the 1-norm, the problem is non-differentiable in the parameters. So with optimization tools like fminunc (or whatever), the optimization may run poorly and may fail to converge, and it will probably depend on the starting values. Admittedly, you could use the 2-norm solution to provide what may be decent starting values, but if those are that good, why bother with this at all?
I don't see much benefit in such a problem formulation. Seemingly little to gain, for increased complexity, convergence issues, starting values, etc. And when you can write the fully L2 norm problem in essentially one line of code that has no issues, why?
You can view regularized linear least squares as a biased solution, where you have an a priori belief that the solution has a small norm. (I am taking a somewhat reluctant Bayesian stance here.) Typically this only matters for rank-deficient or somewhat ill-posed systems, where the ill-posed nature tends to inflate any noise in the data, giving you garbage for results. The regularizer biases the results towards what you want to see. Such regularization is hugely valuable; in fact I use it to great benefit on many problems. But that regularization is a very ad hoc choice, as is exactly how much bias to use. So in this context, insisting that the regularizer be based on the 1-norm just seems strange.
I appreciate your comments and suggestions. Unfortunately I did not choose this minimization problem. The one norm is intended to force sparsity on the vector components. I am attempting to replicate what the authors of a certain article did and then make a comparison to my novel method.
Thanks for your input. I will explore your ideas.
You can obviously do as you wish, even if it seems absurd to me.
You will have little choice but to use an optimization, because of the mixed norms. Were it entirely an L1-norm problem, you could solve it using linprog. Were it entirely an L2 norm, you could just use backslash. The hybrid? Just a brute-force optimization, and as I said, it may exhibit some issues at that.
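As an aside on the mixed-norm problem from the comment above: min |WX - Y|^2 + lambda*|X|_1 is the lasso, and the Statistics and Machine Learning Toolbox provides a `lasso` function for it. A minimal sketch (the sizes, sparsity pattern, and lambda value are arbitrary illustrative choices):

```matlab
% Lasso sketch for min ||W*X - Y||^2 + lambda*||X||_1.
% Requires the Statistics and Machine Learning Toolbox.
W = rand(30,50);                  % m x n design matrix
Xtrue = zeros(50,1);
Xtrue([3 17 41]) = [1 -2 0.5];    % sparse ground truth
Y = W*Xtrue + 0.01*randn(30,1);
X = lasso(W, Y, 'Lambda', 0.05);  % n x 1 solution, many entries driven to 0
```

Note that `lasso` fits an intercept by default, so for an exact match to the stated objective you may want to inspect its name-value options.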


 Accepted Answer

John D'Errico on 2 Jun 2011
Edited: John D'Errico on 10 Mar 2017
Using a nonlinear optimizer to solve this is overkill and completely silly when a linear solver does it for you. Besides, with an optimizer you need to worry about convergence issues, starting values, etc.
Append an identity matrix to X, in this case as extra columns of X. Also append zeros to Y. Then use slash (i.e. /) to solve the problem.
W = [Y,zeros(1,N)]/[X,eye(N)];
This solves the regularized problem, where N is the number of rows in X, with no optimizer needed at all. See that when [X,eye(N)] has more columns than rows, slash solves exactly the least-squares problem you are asking to solve.
doc slash
help slash
As an example, make up some data:
X = rand(10,20);
Y = rand(1,20);
We wish to solve the problem
W*X = Y
This is easily accomplished using slash, if we did no regularization at all.
u = Y/X
u =
-0.10061 -0.039857 -0.087301 0.14731 0.40622 0.36113 -0.31992 0.32343 0.071708 0.16989
With regularization, thus solving the penalized sum of squares, we do it as:
u = [Y,zeros(1,10)]/[X,eye(10)]
u =
-0.013147 0.042605 0.0085151 0.089504 0.24349 0.1602 -0.11616 0.20876 0.14126 0.13649
The solution has been biased towards zero, as we should expect from the penalty on the norm of W.
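A quick sanity check (a sketch, not part of the original answer): setting the gradient of |W|^2 + |WX - Y|^2 to zero gives the normal equations W(XX' + I) = YX', so the augmented slash solution should match that closed form to machine precision.

```matlab
% Verify the augmented slash solution against the normal-equations form.
X = rand(10,20);
Y = rand(1,20);
u1 = [Y, zeros(1,10)] / [X, eye(10)];   % augmented slash, as in the answer
u2 = (Y*X') / (X*X' + eye(10));         % closed-form ridge solution
norm(u1 - u2)                           % near machine precision
```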

2 Comments

This gives a W with dimension 1*(M+N). But we should only get 1*M, right?
John D'Errico on 10 Mar 2017
Edited: John D'Errico on 10 Mar 2017
It was a long time ago that I wrote that. :)
As it turns out, I was sloppy and wrote the wrong expression: I forgot to append zeros to Y. I've fixed that, as well as adding an example. As you can see, the solution now has the correct number of values.
You are mistaking the solution for one where you solve the problem
X*W = Y
Here, you would use BACKSLASH. Thus
W = X\Y
or with the simple ridge regression regularization employed here, that becomes
W = [X;eye(N)]\[Y;zeros(N,1)]
Note the differences. In this case, I append an identity matrix as rows to X, and zero rows to Y.
It is important whether you are using / or \ to solve the problem. In turn, that depends on which form your original problem was in.
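The column-vector form can be sketched the same way (the sizes here are arbitrary illustrative choices):

```matlab
% Ridge-regularized backslash sketch: min ||X*W - Y||^2 + ||W||^2.
X = rand(20,10);        % 20 equations, 10 unknowns
Y = rand(20,1);
N = size(X,2);          % number of columns of X
W = [X; eye(N)] \ [Y; zeros(N,1)];   % N x 1 solution, biased towards zero
```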


More Answers (1)

The slash operator is exactly what you need to solve your system of equations.
For example, you have indicated that you have the equation WX = Y, where X and Y are given. To arrive at the least-squares fit for an overdetermined system, MATLAB builds the algorithm into the forward-slash operator, so you simply need to type the following line:
W = Y/X
Often the system of equations is written as Ax = b where one must solve for x --- in this case use the backslash operator:
x = A\b
If you are interested in the optimization functions available, check out this page which shows all of the functions available in base MATLAB:
Optimization Toolbox and Global Optimization Toolbox also have many other functions available.

1 Comment

Note that slash by itself fails to solve the problem given by the OP. You must augment the system with an identity matrix to introduce a term corresponding to the squared norm of W.


Asked: al on 2 Jun 2011
Commented: on 27 Feb 2018
