Generate random integers that sums to a specific number within a specific range
85 views (last 30 days)
Show older comments
Dear members
I am attempting to generate random integers that sums to a specific number within a specific range. Infact, Roger Stafford provided a similar function randfixedsumint, but lacking the ability for specifying the min and max value that each integers could go.
In other words, I am looking for a function that could randomly generate 3 integers that sums to 1000 within a gange of 100 <= x <= 700.
function R = randfixedsumint(m,n,S);
% This generates an m by n array R. Each row will sum to S, and
% all elements are all non-negative integers. The probabilities
% of each possible set of row elements are all equal.
% RAS - Mar. 4, 2017
if ceil(m)~=m|ceil(n)~=n|ceil(S)~=S|m<1|n<1|S<0
error('Improper arguments')
else
P = ones(S+1,n);
for in = n-1:-1:1
P(:,in) = cumsum(P(:,in+1));
end
R = zeros(m,n);
for im = 1:m
s = S;
for in = 1:n
R(im,in) = sum(P(s+1,in)*rand<=P(1:s,in));
s = s-R(im,in);
end
end
end
return
Ref_1
Quite similar to what randfixsum(Random Vectors with Fixed Sum) could achieve, but I am having trouble limiting the generated value as just integers
Ref_2
Conditional Random number generation
2 Comments
KALYAN ACHARJYA
on 10 Feb 2021
Edited: KALYAN ACHARJYA
on 10 Feb 2021
randi function is not allow?
Is this way out?
data=0;
while sum(data)~=1000
data=randi([100,700],[1,3]);
end
Accepted Answer
Matt J
on 10 Feb 2021
Edited: Matt J
on 11 Feb 2021
Quite similar to what randfixsum(Random Vectors with Fixed Sum) could achieve, but I am having trouble limiting the generated value as just integers
I think a good approximate solution would be to use randfixedsum to get an initial continuous-space randomization x0. Then, you can use minL1intlin (Download) to project x0 to the nearest (in L1-norm) integer solution x. I don't know if this will be perfectly uniformly distributed, but I think it will be close. Example:
n=7; %number of variables
s=3000; %constrained sum
lb=100; %lower bound
ub=700; %upper bound
x0=randfixedsum(n,1,s,lb,ub);
e=ones(1,n);
x=round( minL1intlin( speye(n), x0, 1:n, [],[],e,s,lb*e,ub*e) );
>> [x0,x]
ans =
337.2273 337.0000
238.0432 238.0000
623.6036 624.0000
416.6231 417.0000
581.9920 582.0000
649.9868 650.0000
152.5241 152.0000
>> sum(x)
ans =
3000
0 Comments
More Answers (6)
Matt J
on 10 Feb 2021
Edited: Matt J
on 10 Feb 2021
For such a small problem size, you can pre-generate a table of allowable combinations:
[x,y]=ndgrid(uint16(100:700));
z=1000-x(:)-y(:);
idx=100<=z & z<=700;
comboTable = [x(idx),y(idx),z(idx)];
N=size(comboTable,1);
Then, you can just select rows randomly from the table:
xyz=comboTable(randi(N,1),:)
DGM
on 17 Jun 2021
Edited: DGM
on 17 Jun 2021
I've been wasting time on trying to do something similar lately. I can't believe I missed this question. For my own purposes, I couldn't figure out how to get randfixedsum() to do the various things I wanted (within memory constraints), so I did it my own naive way. If you only need a handful of values, I doubt that my answer is of any use to you, but maybe someone else who finds this post will find it helpful.
Attached is the current version of said file. It will output an array of integers with specified size and sum, with lower/upper boundaries depending on mode. Values can be generated to fit uniform, skew, exponential, or gaussian distributions. For example, all of these arrays have the same size and sum (i.e. they have the same mean of 200)
% generate uniform random integers with a sum of 2000 and maximum of 50
a = randisum([NaN 50],2000,[5 10],'uniform')
[sum(a(:)) min(a(:)) max(a(:)) round(mean(a(:)))]
Note that this file is not thoroughly tested yet. Don't be surprised if there are bugs. The lazy sum adjustment method potentially imparts some distortion to the distributions, so don't use it if you can't bear that, or come up with something better. I make no claims that these are particularly efficient, robust, or statistically meaningful ways to solve the problem. My own tolerance for these flaws is potentially higher than yours.
I may update the file as I work on it. EDIT: updated to improve sum adjustment and 'exponential' mode speed.
3 Comments
DGM
on 11 Aug 2023
When I wrote that, my needs were very undemanding, and I was unsatisfied with examples from other threads that were either slower or were limited by memory use. I took the opportunity to play a little outside my comfort zone.
John D'Errico
on 11 Aug 2023
I think this is a difficult problem to find a truly uniform integer solution. (Even the continuous case is quite a test of code.) Clearly the very best solution is to generate the set of all possible solutions, and then sample from them uniformly. Anything less than that needs to trade off aherance to a uniform sampling with speed. Your solution seems a reasonable tradeoff.
Bruno Luong
on 11 Aug 2023
Edited: Bruno Luong
on 11 Aug 2023
Revisit this old thread here is the randfixedsum version but for integers.
It's a quick implementation and it is slow, but mathematically it should generate conditional uniform distribution.
lo = 100;
hi = 700;
starget = 1000;
k = 3;
m = 20;
R = ricb(m, k, starget, lo, hi)
function R = ricb(m, k, s, lo, hi)
% R = ricb(m, k, s, lo, hi)
% pseudo random integer integer array with sum and bound constraints.
% INPUTS:
% m: scalar number of solutions
% k: scalar length of the solutions
% s: scalar target sum
% lo: scalar kower bound
% hi: scalar upper bound
% OUTPUT
% Return a pseudo random integer array R (m x k)
% such that
% sum(R,2) == N
% lo <= R <= hi
lo = ceil(max(lo,0));
s = round(s);
N = s-k*lo;
assert(N >= 0, 'lo too large')
hi = floor(min(hi,N));
r = hi-lo;
assert(r >= 0, 'lo too large or hi too small')
R = zeros(m, k);
for i=1:m
R(i,:) = ricb_helper(k, N, r);
end
R = R + lo;
end % ricb
function v = ricb_helper(k, N, r)
% function v = ricb(k, N, r)
% INPUT
% k: length of the vector
% N: desired sum
% r: upper bounds
% head: optional parameter to by concatenate in the first column
% of the output
% OUTPUT:
% v: (1 x k) array such as sum(v,2)==N
% Algorithm:
% Recursive
if k==1
if r >= N
v = N;
else
v = zeros(0,1,class(N));
end
else
J = (0:min(r,N))';
c = arrayfun(@(j) icbcount(k-1, N-j, r), J, 'unif', 1);
keep = c > 0;
J = J(keep);
c = c(keep);
if ~isempty(J)
cc = cumsum(c);
cc = [0; cc/cc(end)];
ir = discretize(rand(), cc);
j = J(ir);
% recursive call
v = [j, ricb_helper(k-1, N-j, r)];
else
v = zeros(0,k,class(N));
end
end
end % ricb
%%
function c = icbcount(k, N, r)
% count the number of solutions v1+...+vk = N
% v1,...vk are integers such that 0<=vi<=r
% (integer composition with common upper bounds)
q = ceil((N+k)/(r+1)):k;
sg = (-1).^(k-q);
C1 = arrayfun(@(q)nchoosek(k, q), q);
C2 = arrayfun(@(q)nchoosek(q.*(r + 1)-1-N, k-1), q);
c = sum(sg .* C1 .* C2);
end % M icbcount
2 Comments
John D'Errico
on 11 Aug 2023
@Bruno Luong - well done. I gave each of the solutions posted here a decent test in my answer. And yours seems to be as close to uniform as I would expect.
Bruno Luong
on 12 Aug 2023
Edited: Bruno Luong
on 14 Aug 2023
Here is a speed up version.
EDIT: a smaller change has been made makes it even faster than my fisrt attempt
lo = 100;
hi = 700;
starget = 1000;
k = 3;
m = 100;
tic
R = ricb(m, k, starget, lo, hi);
t = toc;
R
fprintf('Generate time = %1.3f second\n', t)
function R = ricb(m, k, s, lo, hi)
% R = ricb(m, k, s, lo, hi)
% pseudo random integer array with sum and bound constraints.
%
% INPUTS:
% m: scalar number of random vectors
% k: scalar length of the solutions
% s: scalar target sum
% lo: scalar lower bound
% hi: scalar upper bound
% OUTPUT:
% Return a pseudo random integer array R of size (m x k)
% such that
% sum(R,2) == s
% lo <= R <= hi
% Example:
%
% lo = 1;
% hi = 4;
% starget = 10;
% k = 5;
% m = 20;
% R = ricb(m, k, starget, lo, hi)
%
% See also: randfixedsum
lo = ceil(max(lo,0));
s = round(s);
k = round(k);
N = s-k*lo;
assert(N >= 0, 'LO too large')
hi = floor(min(hi,N));
r = hi-lo;
assert(r >= 0, 'LO too large or HI too small')
assert(k*r >= N, 'HI too small')
if k == 1
R = repelem(s, m, 1);
return
end
k = double(k);
N = double(N);
r = double(r);
R = zeros(m, k);
UR = rand(k,m);
for i=1:m % possible parfor but not necessary faster
% generate one vector at the time
R(i,:) = ricb_helper(k, N, r, i, UR);
end
if lo
R = R + lo;
end
end % ricb
%%
function v = ricb_helper(k, N, r, snum, UR)
% function v = ricb(k, N, r, UR)
% INPUT
% k: length of the vector
% N: desired sum
% r: upper bounds (lo bound is assumed to be 0)
% head: optional parameter to by concatenate in the first column
% of the output
% snum: solution number, in 1:m where m is size(UR,2)
% UR: array of size (k x m), contains uniform distribution in (0,1)
% interval. We use the pre-random array to avoid calling over and
% over rand() function and save calling overhead
% OUTPUT:
% v: (1 x k) array such as sum(v,2)==N
% Algorithm:
% Recursive
if k==1
v = N;
else
jmax = min(r,N);
q1 = ceil((N-jmax+k-1)/(r+1));
nq = k-q1;
% Compute C1 = arrayfun(@(q)nchoosek(k, q), q); with alternate sign
% q = q1:k;
C1 = zeros(nq, 1);
x = 1;
for i = 1:nq
C1(nq+1-i) = x; % store in reverse order
x = (x*(i-k)) / i;
end
% Compute C2 = nchoosek(q.*(r + 1)-1-N-J, k-2), q)
C2 = zeros(r+1, nq);
kappa = k-2;
eta = q1.*(r + 1)-1-N;
di = max(kappa-eta, 0);
eta = eta + di;
%y = nchoosek(eta, kappa); % q = k-1, j = 0;
y = 1; % no need to set y = nchoosek(eta, kappa) to have correct absolute count
% since it scales the count c which we don't need.
% Note C2 is still integer with this change
for i = 1+di:numel(C2)
C2(i) = y;
eta = eta + 1;
y = (y * eta) / (eta - kappa);
end
% Number of solutions (up to a factor of 1/nchoosek(q1.*(r + 1)-1+di, k-2))
% counted for each j
c = C2*C1;
if jmax < r
c = c(1:jmax+1,:);
end
% Pick the first element according to the respective
% number of solutions count
cc = cumsum(c);
cc = [0; cc/cc(end)];
j = discretize(UR(k, snum), cc)-1;
% recursive call
v = [j, ricb_helper(k-1, N-j, r, snum, UR)];
end
end % ricb
John D'Errico
on 11 Aug 2023
I had to put in my 2 cents here. I saw @Matt J and @Bruno Luong both claim uniformity, or at least close to approximate uniformity. And that made me wonder, what does a uniform sampling mean in this context? And how would I test for uniformity? Finally, since this is effectively a comment on several answers in this post, I've made my post an answer in itself, rathrer than a comment on any one of these eminently interesting solutions.
First, what does uniform mean? Since there are only a finite number of possible solutions, uniform MUST mean that every solution is as likely to arise as any other. So if we sample a large number of times, we should get replicates. But all of them should be at least close to uniformity.
Looking at Matt's solution. It seems entirely plausible. It does work. But, is it uniform? A good test seemed to be to find a sample of 5 integers, each of which comes from the set [0,1,2], where the sum is 5. This is a good stress test. There are not too many possible solutions (51 such possible solutions it would appear.) I downloaded Matt's very nice code minL1intlin. (Well worth having in your toolbox anyway.)
Here is my test function:
function X = randfixedsumint(nvars,nsets,setsum,lb,ub)
opts =optimoptions('intlinprog');
opts.Display = 'none';
for n = 1:nsets
x0=randfixedsum(nvars,1,setsum,lb,ub);
e=ones(1,nvars);
X(:,n) = round( minL1intlin( speye(nvars), x0, 1:nvars, [],[],e,setsum,lb*e,ub*e,[],opts) );
end
Now for the test.
X = randfixedsumint(5,10000,5,0,2)';
Not terribly fast. But then what can you expect? It is doing a call to intlinprog for each set.
Before I go any further, what does uniform mean?
[Xu,I,J] = unique(X,'rows');
size(Xu)
ans =
51 5
So we saw 51 unique events, in a sample of size 10000. We would expect that roughly each possible event will arise almost 200 times in that sample. 196.08 times, but 200 is a good round number to shoot for.
10000/51
ans =
196.08
How well did this idea work?
[Xu,accumarray(J,1,[51,1],@sum)]
ans =
0 0 1 2 2 94
0 0 2 1 2 118
0 0 2 2 1 91
0 1 0 2 2 76
0 1 1 1 2 335
0 1 1 2 1 313
0 1 2 0 2 91
0 1 2 1 1 291
0 1 2 2 0 103
0 2 0 1 2 98
0 2 0 2 1 88
0 2 1 0 2 80
0 2 1 1 1 299
0 2 1 2 0 85
0 2 2 0 1 83
0 2 2 1 0 108
1 0 0 2 2 88
1 0 1 1 2 315
1 0 1 2 1 323
1 0 2 0 2 90
1 0 2 1 1 300
1 0 2 2 0 93
1 1 0 1 2 288
1 1 0 2 1 274
1 1 1 0 2 331
1 1 1 1 1 1041
1 1 1 2 0 327
1 1 2 0 1 288
1 1 2 1 0 334
1 2 0 0 2 89
1 2 0 1 1 291
1 2 0 2 0 111
1 2 1 0 1 277
1 2 1 1 0 315
1 2 2 0 0 86
2 0 0 1 2 105
2 0 0 2 1 88
2 0 1 0 2 105
2 0 1 1 1 305
2 0 1 2 0 92
2 0 2 0 1 93
2 0 2 1 0 102
2 1 0 0 2 82
2 1 0 1 1 323
2 1 0 2 0 100
2 1 1 0 1 299
2 1 1 1 0 305
2 1 2 0 0 91
2 2 0 0 1 94
2 2 0 1 0 100
2 2 1 0 0 102
Drat. The [1 1 1 1 1] event is roughly 10 times as likely as many of the other events. So not really that uniform. Again, this was a stress test. It is designed to be a highly difficult case.
How about randisum? This is the entry from @DGM. I think the corresponding call to randisum would be:
randisum([NaN,2],5,[1,5],'uniform')
ans =
2 0 1 1 1
Now for a test of uniformity.
nsets = 10000;
X = zeros(nsets,5);
for i = 1:nsets
X(i,:) = randisum([NaN,2],5,[1,5],'uniform');
end
[Xu2,I,J] = unique(X,'rows');
size(Xu2)
ans =
51 5
[Xu2,accumarray(J,1,[51,1],@sum)]
ans =
0 0 1 2 2 160
0 0 2 1 2 141
0 0 2 2 1 163
0 1 0 2 2 156
0 1 1 1 2 242
0 1 1 2 1 250
0 1 2 0 2 129
0 1 2 1 1 245
0 1 2 2 0 171
0 2 0 1 2 172
0 2 0 2 1 170
0 2 1 0 2 138
0 2 1 1 1 256
0 2 1 2 0 146
0 2 2 0 1 155
0 2 2 1 0 151
1 0 0 2 2 130
1 0 1 1 2 263
1 0 1 2 1 210
1 0 2 0 2 156
1 0 2 1 1 237
1 0 2 2 0 153
1 1 0 1 2 230
1 1 0 2 1 286
1 1 1 0 2 250
1 1 1 1 1 441
1 1 1 2 0 253
1 1 2 0 1 240
1 1 2 1 0 256
1 2 0 0 2 151
1 2 0 1 1 266
1 2 0 2 0 157
1 2 1 0 1 262
1 2 1 1 0 251
1 2 2 0 0 157
2 0 0 1 2 133
2 0 0 2 1 178
2 0 1 0 2 152
2 0 1 1 1 268
2 0 1 2 0 157
2 0 2 0 1 160
2 0 2 1 0 144
2 1 0 0 2 153
2 1 0 1 1 259
2 1 0 2 0 138
2 1 1 0 1 212
2 1 1 1 0 231
2 1 2 0 0 131
2 2 0 0 1 163
2 2 0 1 0 179
2 2 1 0 0 148
That is a little better, but still not as good as I might have hoped to see. In a sample of size 10000, we would expect to see a distributino that lies a little more tightly around the mean.
int51 = randi(51,[10000,1]);
histogram(int51)
Do you see the problem in what we see from randisum? It is not too bad looking, BUT the [1 1 1 1 1] event happened 441 times, more than twice as often as I would expect. Even so, not bad, and far better than Matt's solution.
Finally, takin a look at the solution from Bruno, though I had to fix a few lines, and repair the end statements, it did work in the end. This should be the comparable call:
R = ricb(10000,5,5,0,2);
[Ru,I,J] = unique(R,'rows');
[Ru,accumarray(J,1,[51,1],@sum)]
ans =
0 0 1 2 2 198
0 0 2 1 2 224
0 0 2 2 1 202
0 1 0 2 2 201
0 1 1 1 2 171
0 1 1 2 1 212
0 1 2 0 2 186
0 1 2 1 1 195
0 1 2 2 0 201
0 2 0 1 2 212
0 2 0 2 1 202
0 2 1 0 2 191
0 2 1 1 1 191
0 2 1 2 0 190
0 2 2 0 1 212
0 2 2 1 0 214
1 0 0 2 2 192
1 0 1 1 2 183
1 0 1 2 1 199
1 0 2 0 2 198
1 0 2 1 1 214
1 0 2 2 0 181
1 1 0 1 2 186
1 1 0 2 1 172
1 1 1 0 2 199
1 1 1 1 1 191
1 1 1 2 0 199
1 1 2 0 1 195
1 1 2 1 0 200
1 2 0 0 2 174
1 2 0 1 1 193
1 2 0 2 0 206
1 2 1 0 1 193
1 2 1 1 0 211
1 2 2 0 0 206
2 0 0 1 2 199
2 0 0 2 1 208
2 0 1 0 2 216
2 0 1 1 1 213
2 0 1 2 0 206
2 0 2 0 1 174
2 0 2 1 0 190
2 1 0 0 2 190
2 1 0 1 1 179
2 1 0 2 0 187
2 1 1 0 1 190
2 1 1 1 0 199
2 1 2 0 0 165
2 2 0 0 1 204
2 2 0 1 0 189
2 2 1 0 0 197
As you can see, these counts are much closer to what I would expect from a true uniform sampling. Is ricb truly fully uniform? That is a good question. I'd need to look very carefully at the code, and then test the hell out of it. But I think @Bruno Luong came pretty close to nailing it. Note that the [1 1 1 1 1] case does not look any different from the rest.
11 Comments
Bruno Luong
on 22 Aug 2023
Edited: Bruno Luong
on 22 Aug 2023
@Paul Beside the eye, classical testing is chi-square test
% Test uniformity
nsets = 100000;
X = ricb(nsets,5,5,0,2);
[Xu2,~,J] = unique(X,'rows');
c_BLU = accumarray(J,1,[51,1]);
subplot(1,1,1)
bar(c_BLU)
xlabel('solution [1-51]')
ylabel('count (freq)')
title('ricb')
% Chi-2 test
n = size(Xu2,1); % 51
meancount = nsets/n;
chi2 = sum((c_BLU-meancount).^2./meancount)
dof = n-1;
alpha = 0.05;
p = 1 - alpha;
chicv = chi2inv(p, dof)
if chi2 < chicv
fprintf('Uniformity test passed\n');
else
fprintf('Uniformity test failed\n');
end
Bruno Luong
on 10 Feb 2021
Edited: Bruno Luong
on 10 Feb 2021
Unfortunately the uniform distribution with bounds for integer is much more challenging. One way is to approximate by rounding the results
lo = 100;
hi = 700;
s = 3000;
n = 7;
if n*ceil(lo) > s || s*floor(hi) < s
error('Not possible')
end
while true
x = randfixedsum(n,1,s,lo,hi);
c = round([0; cumsum(x)]); c(end)=s;
xi = diff(c);
if all(xi >= lo & xi <= hi)
break
end
end
xi
0 Comments
Paul
on 19 Sep 2024
Edited: Paul
on 20 Sep 2024
How about a simple loop that finds all combinations of the random variables that satisfy the constraint and then sample from that set uniformly as suggested above.
rng(100);
X = sumintuni2(1e5,0:2,5,5);
figure
bar(groupcounts(X))
function X = sumintuni2(N,uval,usum,ny)
nelements = numel(uval);
nvectors = ny;
ucombinations = zeros(1000,ny);
ucount = 0;
term1 = nelements.^(nvectors:-1:1);
term2 = nelements.^((nvectors-1):-1:0);
for ii = 0:(nelements^nvectors - 1)
index = floor(mod(ii,term1)./term2) + 1;
if sum(uval(index)) == usum
ucount = ucount + 1;
ucombinations(ucount,:) = uval(index);
if ucount == height(ucombinations)
ucombinations = [ucombinations;zeros(1000,ny)];
end
end
end
ucombinations(ucount+1:end,:) = [];
X = ucombinations(randi(height(ucombinations),N,1),:);
end
1 Comment
Bruno Luong
on 20 Sep 2024
Edited: Bruno Luong
on 20 Sep 2024
@Paul, try to run this example with lower bound = 100, upper bound = 700 and target sum = 3000, ny = 7. I'm affraid enumerate all combonation is not a good strategy. This method would not scale up well for long uval and large ny.
See Also
Categories
Find more on Descriptive Statistics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!