# Random sampling of elements from an array based on a target condition in MATLAB

13 views (last 30 days)
Edited: Richard Brown on 7 Nov 2019
I have an array (let's call it ElmInfo) of size Nx2 representing a geometry. In that array the element number and element volume are on the column 1 and column 2 respectively. The volume of elements largely vary. The sum of the volume of all elements leads to a value V which can be obtained in MATLAB as:
V=sum(ElmInfo(:,2));
I want to randomly sample elements from the array ElmInfo in such a way that the volume of sampled elements (with no repetition) will lead to a target volume V1. Note: V1 is less than V. So I don't know the number of elements to be sampled. I am giving an example. For a sampling case number of sampled element can be '10' whereas in other sampling number of sampled element can be '15'.
There is no straightforward MATLAB in-built function to meet the target condition. How can I implement the code in MATLAB?

Richard Brown on 5 Nov 2019
If your process is just to sample until you are within a tolerance of V_1 and you want things to be truly random, how about the following algorithm:
• Randomly shuffle the entries of V
• Take a cumulative sum
• If there is an entry within a certain tolerance of V_1, keep the corresponding entries and exit
• Otherwise, repeat
This is a completely naive brute-force approach, which won't work well at all if your tolerance is small. But if your tolerance is generous, the following might do the job well enough, and will actually sample things at random.
n = 300;
v = rand(n, 1);
V1 = 17;
tol = 1e-2;
sample = [];
maxits = 10000;
for count = 1:maxits
p = randperm(n);
s = cumsum(v(p));
k = find(abs(s - V1) < tol);
if ~isempty(k)
sample_indices = p(1:k(1));
sample = v(sample_indices);
fprintf('sample found after %d iterations\n', count);
break
end
end
@Richard, Thanks a lot.
This works for me. I have tried with a tolerance of 1e-5.

John D'Errico on 4 Nov 2019
Edited: John D'Errico on 4 Nov 2019
Are you looking for an exact match to V1? An approximate sum equal to V1? If so, then how close of a tolerance do you need?
Is there anything already in MATLAB to do this exactly? Absolutely not that I know of. Certainly not in MATLAB proper. You might find something on the file exchange, depending on whether you need an exact match to V1 or one with a tolerance.
Essentially, I would call this a random boolean (or 0-1) knapsack problem. An element from your set either appears in the knapsack, or it does not, but not more than one copy of any element. The problem is there may well be many possible subsets that fit your goal,
I could offer a solution that uses intlinprog, since that is a classical way to solve such a problem, although much depends on whether you need an exact solution or not.
John D'Errico on 6 Nov 2019
Edited: John D'Errico on 6 Nov 2019
Yes, but that is a really bad example in my opinion. No solution method can distinguish between the four ones in that vector. If the goal is a binary solve, so any element can appear no more than once, then if your vector has replicates in it, you confuse the issue because you are not properly treating the goal that any element cannot appear more than once. This confuses the question of what uniform sampling means. I'm not even sure what one might mean about uniform sampling in the case of replicate elements. There are certainly multiple ways you could argue it.
Your example (with no tolerance) is a variant of an integer partitions problem, solved by my partitions code (found on the FEX.)
For example, given the set 1:8, and a target sum of 20.
sols = partitions(20,1:8,1)*diag(1:8)
sols =
0 2 3 4 5 6 0 0
1 0 3 4 5 0 7 0
1 2 0 4 0 6 7 0
0 0 3 4 0 6 7 0
0 2 0 0 5 6 7 0
1 2 0 4 5 0 0 8
0 0 3 4 5 0 0 8
1 2 3 0 0 6 0 8
0 2 0 4 0 6 0 8
1 0 0 0 5 6 0 8
0 2 3 0 0 0 7 8
1 0 0 4 0 0 7 8
0 0 0 0 5 0 7 8
sum(sols,2)'
ans =
20 20 20 20 20 20 20 20 20 20 20 20 20
size(sols)
ans =
13 8
There are 13 distinct ways to generate that target sum. The rows of that matrix represent the entire set of possible solutions.
Are each of them equally likely, if I use the random projection scheme I described? Your scheme should work as a Monte Carlo simulation to see how well it does, but you NEED to be more careful with hist!!!!!!!!!!!!!!!
[~,ind] = min(sols*randn(8,10000000),[],1);
hist(ind,100)
It does not look that bad, but it is admittedly non-uniform, in the sense that each possible event was not equally likely. Not too far off though. I'd probably accept it as a sampling scheme, given the alternatives. For example, the random scheme you proposed has no reason to be any better in terms of "uniformity".
However, if I use hist like this:
hist(ind)
See that hist has edge problems. So your histogram is likely invalid, as you generated it. Never use hist as you did, allowing it to choose the bins. Note that you got the same kind of behavior when you used hist, thinking it showed a highly non-uniform sampling. Hist is confusing the issue when you use it like that.
Richard Brown on 7 Nov 2019
Edited: Richard Brown on 7 Nov 2019
My histogram is fine :)
If you look a little closer, you'll see I specified the bin centres (1:nsols) -- you can see that the 11 bars correspond to the 11 solutions. And the bars are exactly as expected - all solutions of the same class (e.g. a two and two ones) have around the same height, which is why there are three different heights evident, corresponding to the different ways of choosing (1,1,1,1), (2, 1, 1), and (3, 1)
While my toy example involving integers is admittedly contrived, it makes the point I was trying to make that each possible solution (where replicates are considered distinct) is not equally likely to be chosen! No more, no less!
And yes, the random scheme I proposed (before you suggested your elegant random objective orientation) is no better ... no arguments there

### Categories

Find more on Direct Search in Help Center and File Exchange

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!