optimal selection of rows and columns

2 views (last 30 days)
Benjamin Bechtel
Benjamin Bechtel on 11 Jul 2011
I have a optimatimisation problem which sounds easy but yet no trivial solution has come into my mind. I have a logical array (with data acquisition points) – lets call it X. I want to select rows and cols in such a way, that X(sel_rows, sel_cols) are all 1. Thereby I’d like to have a optimal choice in respect to a weighting function of the number of choosen rows and cols. In the easies case, this would just be the number of chosen elements: sum(sel_rows == 1) * sum(sel_cols == 1).
Do you now a MATLAB function to solve the problem or an easy algorithm that is converging to the optimum? Brute force methods take to long for the given Matrix-size.
Thanks in advance, Benjamin
  2 Comments
Walter Roberson
Walter Roberson on 15 Jul 2011
0110110000
0110010000
0010001111
If you select all rows, then the only column that is complete is #3, for a total of 3 selected elements
If you select row #1 by itself, you can select columns 2, 3, 5, 6, for a total of 4 elements.
If you select row #2 by itself, you get columns #2, 3, 6, for a total of 3 elements.
If you select row #3 by itself, you get columns #3, 6, 7, 8, and 9, for a total of 5 elements. This is the most complete row.
If you select rows #1 and 2, then columns #2, 3, and 6 can be selected, for a total of 6 elements (2 x 3)
If you select rows #1 and 3, then only column #3 would be complete, for a total of 2 elements (2 x 1)
If you select rows #2 and 3, then only column #3 would be complete, for a total of 2 elements (2 x 1)
If one were to try to proceed by selecting the most complete row, that would be #3, but in fact row #3 does not happen to get selected at all in the maximum solution for this configuration.
The solution here is X([1 2], [2 3 6]) . Second best is X([3], [3 6 7 8 9]).
See how it goes? Whatever list I and J you select, every member of X([I],[J]) must be a 1, and you want (in the simplest version of this problem) to maximize numel(X([I],[J])).
The question is whether there is an algorithm that is better than trying each member of the cross product of the powersets of 1:size(X,1) by 1:size(X,2)
Benjamin Bechtel
Benjamin Bechtel on 15 Jul 2011
sorry, i haven't seen this comment before posting below. yes, that is exactly the problem!

Sign in to comment.

Answers (6)

bym
bym on 12 Jul 2011
I am sure I don't understand your question, but how about:
[r,c] = find(X==1); %...?

Image Analyst
Image Analyst on 12 Jul 2011
I don't understand either. I guess you have somehow selected certain rows, which are specified in sel_rows, and certain columns, which are specified in sel_cols. Then you want to make sure all the elements in those rows or columns are equal to 1. So why don't you just do
X(sel_rows, :) = 1;
X(:, sel_cols) = 1;
I have no idea what the weighting thing is all about. And how huge is your matrix? Are we talking tens or hundreds of millions of elements? Don't say it's a few tens of thousand because that is in no way huge.

Walter Roberson
Walter Roberson on 12 Jul 2011
To cross-check: you would be looking for the maximum for the number of selected elements, right?
If so then I believe I understand what you are asking, but I will need to think more to see if I can come up with an algorithm.

Benjamin Bechtel
Benjamin Bechtel on 12 Jul 2011
Obviously my problem was ambiguous – sorry therefore. So first, the matrix X is given and may not be changed. It contains measurements (cols correspond to pixels/location, rows to times/scenes). I need a subset ob pixels and scenes that is complete – means it may not contain gaps (I use it as a input for PCA). The question ist, how to select the optimal subset. Optimal means the maximum of ones or as an advancement – a function of the number of pixels and the number of scenes (more scenes are better than more pixels). But lets just concentrate on the maximum.
@proecsm: Of course finding the one elements is simple, but I need a set of rows and a set of cols that only contains ones (still straightforward) and is the best possible subset (rather complicated).
@image analyst: The size is only about 100 * 200 elements, but this corresponds to 2^200 possible of colum subsets.
Thanks!
  2 Comments
bym
bym on 14 Jul 2011
maybe @Image Analyst can chime in but what about bwarea()?
Image Analyst
Image Analyst on 15 Jul 2011
Don't look at me - I'm still confused. I though he should just sum over columns a map of matrix elements identifying whether X is complete there (sumOfCols = sum(XisCompleteMap, 2)) and then find the row with the max number of sum to find the "most complete" row, but I'm not sure if that's right. I think a small example of the X matrix showing "more complete" rows and "less complete" rows would help illustrate this. Explain why the data shows that some rows are more complete than the other rows.

Sign in to comment.


Walter Roberson
Walter Roberson on 15 Jul 2011
At worst you could loop over K in the powerset of the row indices, at each point selecting the columns find(all(X(K,:))) .
This algorithm can potentially be made a bit more efficient in that if you have found a solution involving N selected elements, you can skip the test for any set of rows if the number of rows in the set times the total number of columns in X, is less than N: even if every column was selected in each of those rows, you would not be beating your score of N.

Benjamin Bechtel
Benjamin Bechtel on 15 Jul 2011
Hello Mister Robertson,
this is closer to the solution but still doesn’t cover the full problem yet. The best set will definitely not contain all columns! You can see, if you generate a small test set by:
rand(10) > 0.3
Even the maximum of 1 in one row
find(x(k,:) = max(sum(x(k,:),2))
is not necessarily the best solution, as you can see in this simple example:
i ii iii iv v
A 1 1 0 1 0
B 1 1 0 1 0
C 1 1 1 1 0
Your proposal would select nothing. The maximum would select x([C],[i:iv]) {= 4 elements}, but the optimum was x([A,B,C], [i, ii, iv]) {=9 elements};
Also this approach is very time consuming – 100 rows make 2^100 possible row selections ….
  1 Comment
Walter Roberson
Walter Roberson on 15 Jul 2011
I suspect you missed the fact that I said "powerset"; see http://en.wikipedia.org/wiki/Power_set . Looking over the powerset would indeed find the combination you indicated.
But yes, the powerset does involve 2^N possibilities where N is size(X,1) . That is still much cheaper than 2^(N+M) where N is the number of rows and M is the number of columns, the pure brute-force method.

Sign in to comment.

Categories

Find more on Programming in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!