# Generate random integers that sums to a specific number within a specific range

139 views (last 30 days)

Show older comments

Yu Takahashi
on 10 Feb 2021

Edited: Bruno Luong
on 20 Sep 2024 at 12:48

Dear members

I am attempting to generate random integers that sums to a specific number within a specific range. Infact, Roger Stafford provided a similar function randfixedsumint, but lacking the ability for specifying the min and max value that each integers could go.

In other words, I am looking for a function that could randomly generate 3 integers that sums to 1000 within a gange of 100 <= x <= 700.

function R = randfixedsumint(m,n,S);

% This generates an m by n array R. Each row will sum to S, and

% all elements are all non-negative integers. The probabilities

% of each possible set of row elements are all equal.

% RAS - Mar. 4, 2017

if ceil(m)~=m|ceil(n)~=n|ceil(S)~=S|m<1|n<1|S<0

error('Improper arguments')

else

P = ones(S+1,n);

for in = n-1:-1:1

P(:,in) = cumsum(P(:,in+1));

end

R = zeros(m,n);

for im = 1:m

s = S;

for in = 1:n

R(im,in) = sum(P(s+1,in)*rand<=P(1:s,in));

s = s-R(im,in);

end

end

end

return

Ref_1

Quite similar to what randfixsum(Random Vectors with Fixed Sum) could achieve, but I am having trouble limiting the generated value as just integers

Ref_2

Conditional Random number generation

##### 2 Comments

KALYAN ACHARJYA
on 10 Feb 2021

Edited: KALYAN ACHARJYA
on 10 Feb 2021

randi function is not allow?

Is this way out?

data=0;

while sum(data)~=1000

data=randi([100,700],[1,3]);

end

### Accepted Answer

Matt J
on 10 Feb 2021

Edited: Matt J
on 11 Feb 2021

Quite similar to what randfixsum(Random Vectors with Fixed Sum) could achieve, but I am having trouble limiting the generated value as just integers

I think a good approximate solution would be to use randfixedsum to get an initial continuous-space randomization x0. Then, you can use minL1intlin (Download) to project x0 to the nearest (in L1-norm) integer solution x. I don't know if this will be perfectly uniformly distributed, but I think it will be close. Example:

n=7; %number of variables

s=3000; %constrained sum

lb=100; %lower bound

ub=700; %upper bound

x0=randfixedsum(n,1,s,lb,ub);

e=ones(1,n);

x=round( minL1intlin( speye(n), x0, 1:n, [],[],e,s,lb*e,ub*e) );

>> [x0,x]

ans =

337.2273 337.0000

238.0432 238.0000

623.6036 624.0000

416.6231 417.0000

581.9920 582.0000

649.9868 650.0000

152.5241 152.0000

>> sum(x)

ans =

3000

##### 0 Comments

### More Answers (6)

Matt J
on 10 Feb 2021

Edited: Matt J
on 10 Feb 2021

For such a small problem size, you can pre-generate a table of allowable combinations:

[x,y]=ndgrid(uint16(100:700));

z=1000-x(:)-y(:);

idx=100<=z & z<=700;

comboTable = [x(idx),y(idx),z(idx)];

N=size(comboTable,1);

Then, you can just select rows randomly from the table:

xyz=comboTable(randi(N,1),:)

DGM
on 17 Jun 2021

Edited: DGM
on 17 Jun 2021

I've been wasting time on trying to do something similar lately. I can't believe I missed this question. For my own purposes, I couldn't figure out how to get randfixedsum() to do the various things I wanted (within memory constraints), so I did it my own naive way. If you only need a handful of values, I doubt that my answer is of any use to you, but maybe someone else who finds this post will find it helpful.

Attached is the current version of said file. It will output an array of integers with specified size and sum, with lower/upper boundaries depending on mode. Values can be generated to fit uniform, skew, exponential, or gaussian distributions. For example, all of these arrays have the same size and sum (i.e. they have the same mean of 200)

% generate uniform random integers with a sum of 2000 and maximum of 50

a = randisum([NaN 50],2000,[5 10],'uniform')

[sum(a(:)) min(a(:)) max(a(:)) round(mean(a(:)))]

Note that this file is not thoroughly tested yet. Don't be surprised if there are bugs. The lazy sum adjustment method potentially imparts some distortion to the distributions, so don't use it if you can't bear that, or come up with something better. I make no claims that these are particularly efficient, robust, or statistically meaningful ways to solve the problem. My own tolerance for these flaws is potentially higher than yours.

I may update the file as I work on it. EDIT: updated to improve sum adjustment and 'exponential' mode speed.

##### 3 Comments

DGM
on 11 Aug 2023

John D'Errico
on 11 Aug 2023

Bruno Luong
on 11 Aug 2023

Edited: Bruno Luong
on 11 Aug 2023

Revisit this old thread here is the randfixedsum version but for integers.

It's a quick implementation and it is slow, but mathematically it should generate conditional uniform distribution.

lo = 100;

hi = 700;

starget = 1000;

k = 3;

m = 20;

R = ricb(m, k, starget, lo, hi)

function R = ricb(m, k, s, lo, hi)

% R = ricb(m, k, s, lo, hi)

% pseudo random integer integer array with sum and bound constraints.

% INPUTS:

% m: scalar number of solutions

% k: scalar length of the solutions

% s: scalar target sum

% lo: scalar kower bound

% hi: scalar upper bound

% OUTPUT

% Return a pseudo random integer array R (m x k)

% such that

% sum(R,2) == N

% lo <= R <= hi

lo = ceil(max(lo,0));

s = round(s);

N = s-k*lo;

assert(N >= 0, 'lo too large')

hi = floor(min(hi,N));

r = hi-lo;

assert(r >= 0, 'lo too large or hi too small')

R = zeros(m, k);

for i=1:m

R(i,:) = ricb_helper(k, N, r);

end

R = R + lo;

end % ricb

function v = ricb_helper(k, N, r)

% function v = ricb(k, N, r)

% INPUT

% k: length of the vector

% N: desired sum

% r: upper bounds

% head: optional parameter to by concatenate in the first column

% of the output

% OUTPUT:

% v: (1 x k) array such as sum(v,2)==N

% Algorithm:

% Recursive

if k==1

if r >= N

v = N;

else

v = zeros(0,1,class(N));

end

else

J = (0:min(r,N))';

c = arrayfun(@(j) icbcount(k-1, N-j, r), J, 'unif', 1);

keep = c > 0;

J = J(keep);

c = c(keep);

if ~isempty(J)

cc = cumsum(c);

cc = [0; cc/cc(end)];

ir = discretize(rand(), cc);

j = J(ir);

% recursive call

v = [j, ricb_helper(k-1, N-j, r)];

else

v = zeros(0,k,class(N));

end

end

end % ricb

%%

function c = icbcount(k, N, r)

% count the number of solutions v1+...+vk = N

% v1,...vk are integers such that 0<=vi<=r

% (integer composition with common upper bounds)

q = ceil((N+k)/(r+1)):k;

sg = (-1).^(k-q);

C1 = arrayfun(@(q)nchoosek(k, q), q);

C2 = arrayfun(@(q)nchoosek(q.*(r + 1)-1-N, k-1), q);

c = sum(sg .* C1 .* C2);

end % M icbcount

##### 2 Comments

John D'Errico
on 11 Aug 2023

Bruno Luong
on 12 Aug 2023

Edited: Bruno Luong
on 14 Aug 2023

Here is a speed up version.

EDIT: a smaller change has been made makes it even faster than my fisrt attempt

lo = 100;

hi = 700;

starget = 1000;

k = 3;

m = 100;

tic

R = ricb(m, k, starget, lo, hi);

t = toc;

R

fprintf('Generate time = %1.3f second\n', t)

function R = ricb(m, k, s, lo, hi)

% R = ricb(m, k, s, lo, hi)

% pseudo random integer array with sum and bound constraints.

%

% INPUTS:

% m: scalar number of random vectors

% k: scalar length of the solutions

% s: scalar target sum

% lo: scalar lower bound

% hi: scalar upper bound

% OUTPUT:

% Return a pseudo random integer array R of size (m x k)

% such that

% sum(R,2) == s

% lo <= R <= hi

% Example:

%

% lo = 1;

% hi = 4;

% starget = 10;

% k = 5;

% m = 20;

% R = ricb(m, k, starget, lo, hi)

%

% See also: randfixedsum

lo = ceil(max(lo,0));

s = round(s);

k = round(k);

N = s-k*lo;

assert(N >= 0, 'LO too large')

hi = floor(min(hi,N));

r = hi-lo;

assert(r >= 0, 'LO too large or HI too small')

assert(k*r >= N, 'HI too small')

if k == 1

R = repelem(s, m, 1);

return

end

k = double(k);

N = double(N);

r = double(r);

R = zeros(m, k);

UR = rand(k,m);

for i=1:m % possible parfor but not necessary faster

% generate one vector at the time

R(i,:) = ricb_helper(k, N, r, i, UR);

end

if lo

R = R + lo;

end

end % ricb

%%

function v = ricb_helper(k, N, r, snum, UR)

% function v = ricb(k, N, r, UR)

% INPUT

% k: length of the vector

% N: desired sum

% r: upper bounds (lo bound is assumed to be 0)

% head: optional parameter to by concatenate in the first column

% of the output

% snum: solution number, in 1:m where m is size(UR,2)

% UR: array of size (k x m), contains uniform distribution in (0,1)

% interval. We use the pre-random array to avoid calling over and

% over rand() function and save calling overhead

% OUTPUT:

% v: (1 x k) array such as sum(v,2)==N

% Algorithm:

% Recursive

if k==1

v = N;

else

jmax = min(r,N);

q1 = ceil((N-jmax+k-1)/(r+1));

nq = k-q1;

% Compute C1 = arrayfun(@(q)nchoosek(k, q), q); with alternate sign

% q = q1:k;

C1 = zeros(nq, 1);

x = 1;

for i = 1:nq

C1(nq+1-i) = x; % store in reverse order

x = (x*(i-k)) / i;

end

% Compute C2 = nchoosek(q.*(r + 1)-1-N-J, k-2), q)

C2 = zeros(r+1, nq);

kappa = k-2;

eta = q1.*(r + 1)-1-N;

di = max(kappa-eta, 0);

eta = eta + di;

%y = nchoosek(eta, kappa); % q = k-1, j = 0;

y = 1; % no need to set y = nchoosek(eta, kappa) to have correct absolute count

% since it scales the count c which we don't need.

% Note C2 is still integer with this change

for i = 1+di:numel(C2)

C2(i) = y;

eta = eta + 1;

y = (y * eta) / (eta - kappa);

end

% Number of solutions (up to a factor of 1/nchoosek(q1.*(r + 1)-1+di, k-2))

% counted for each j

c = C2*C1;

if jmax < r

c = c(1:jmax+1,:);

end

% Pick the first element according to the respective

% number of solutions count

cc = cumsum(c);

cc = [0; cc/cc(end)];

j = discretize(UR(k, snum), cc)-1;

% recursive call

v = [j, ricb_helper(k-1, N-j, r, snum, UR)];

end

end % ricb

John D'Errico
on 11 Aug 2023

I had to put in my 2 cents here. I saw @Matt J and @Bruno Luong both claim uniformity, or at least close to approximate uniformity. And that made me wonder, what does a uniform sampling mean in this context? And how would I test for uniformity? Finally, since this is effectively a comment on several answers in this post, I've made my post an answer in itself, rathrer than a comment on any one of these eminently interesting solutions.

First, what does uniform mean? Since there are only a finite number of possible solutions, uniform MUST mean that every solution is as likely to arise as any other. So if we sample a large number of times, we should get replicates. But all of them should be at least close to uniformity.

Looking at Matt's solution. It seems entirely plausible. It does work. But, is it uniform? A good test seemed to be to find a sample of 5 integers, each of which comes from the set [0,1,2], where the sum is 5. This is a good stress test. There are not too many possible solutions (51 such possible solutions it would appear.) I downloaded Matt's very nice code minL1intlin. (Well worth having in your toolbox anyway.)

Here is my test function:

function X = randfixedsumint(nvars,nsets,setsum,lb,ub)

opts =optimoptions('intlinprog');

opts.Display = 'none';

for n = 1:nsets

x0=randfixedsum(nvars,1,setsum,lb,ub);

e=ones(1,nvars);

X(:,n) = round( minL1intlin( speye(nvars), x0, 1:nvars, [],[],e,setsum,lb*e,ub*e,[],opts) );

end

Now for the test.

X = randfixedsumint(5,10000,5,0,2)';

Not terribly fast. But then what can you expect? It is doing a call to intlinprog for each set.

Before I go any further, what does uniform mean?

[Xu,I,J] = unique(X,'rows');

size(Xu)

ans =

51 5

So we saw 51 unique events, in a sample of size 10000. We would expect that roughly each possible event will arise almost 200 times in that sample. 196.08 times, but 200 is a good round number to shoot for.

10000/51

ans =

196.08

How well did this idea work?

[Xu,accumarray(J,1,[51,1],@sum)]

ans =

0 0 1 2 2 94

0 0 2 1 2 118

0 0 2 2 1 91

0 1 0 2 2 76

0 1 1 1 2 335

0 1 1 2 1 313

0 1 2 0 2 91

0 1 2 1 1 291

0 1 2 2 0 103

0 2 0 1 2 98

0 2 0 2 1 88

0 2 1 0 2 80

0 2 1 1 1 299

0 2 1 2 0 85

0 2 2 0 1 83

0 2 2 1 0 108

1 0 0 2 2 88

1 0 1 1 2 315

1 0 1 2 1 323

1 0 2 0 2 90

1 0 2 1 1 300

1 0 2 2 0 93

1 1 0 1 2 288

1 1 0 2 1 274

1 1 1 0 2 331

1 1 1 1 1 1041

1 1 1 2 0 327

1 1 2 0 1 288

1 1 2 1 0 334

1 2 0 0 2 89

1 2 0 1 1 291

1 2 0 2 0 111

1 2 1 0 1 277

1 2 1 1 0 315

1 2 2 0 0 86

2 0 0 1 2 105

2 0 0 2 1 88

2 0 1 0 2 105

2 0 1 1 1 305

2 0 1 2 0 92

2 0 2 0 1 93

2 0 2 1 0 102

2 1 0 0 2 82

2 1 0 1 1 323

2 1 0 2 0 100

2 1 1 0 1 299

2 1 1 1 0 305

2 1 2 0 0 91

2 2 0 0 1 94

2 2 0 1 0 100

2 2 1 0 0 102

Drat. The [1 1 1 1 1] event is roughly 10 times as likely as many of the other events. So not really that uniform. Again, this was a stress test. It is designed to be a highly difficult case.

How about randisum? This is the entry from @DGM. I think the corresponding call to randisum would be:

randisum([NaN,2],5,[1,5],'uniform')

ans =

2 0 1 1 1

Now for a test of uniformity.

nsets = 10000;

X = zeros(nsets,5);

for i = 1:nsets

X(i,:) = randisum([NaN,2],5,[1,5],'uniform');

end

[Xu2,I,J] = unique(X,'rows');

size(Xu2)

ans =

51 5

[Xu2,accumarray(J,1,[51,1],@sum)]

ans =

0 0 1 2 2 160

0 0 2 1 2 141

0 0 2 2 1 163

0 1 0 2 2 156

0 1 1 1 2 242

0 1 1 2 1 250

0 1 2 0 2 129

0 1 2 1 1 245

0 1 2 2 0 171

0 2 0 1 2 172

0 2 0 2 1 170

0 2 1 0 2 138

0 2 1 1 1 256

0 2 1 2 0 146

0 2 2 0 1 155

0 2 2 1 0 151

1 0 0 2 2 130

1 0 1 1 2 263

1 0 1 2 1 210

1 0 2 0 2 156

1 0 2 1 1 237

1 0 2 2 0 153

1 1 0 1 2 230

1 1 0 2 1 286

1 1 1 0 2 250

1 1 1 1 1 441

1 1 1 2 0 253

1 1 2 0 1 240

1 1 2 1 0 256

1 2 0 0 2 151

1 2 0 1 1 266

1 2 0 2 0 157

1 2 1 0 1 262

1 2 1 1 0 251

1 2 2 0 0 157

2 0 0 1 2 133

2 0 0 2 1 178

2 0 1 0 2 152

2 0 1 1 1 268

2 0 1 2 0 157

2 0 2 0 1 160

2 0 2 1 0 144

2 1 0 0 2 153

2 1 0 1 1 259

2 1 0 2 0 138

2 1 1 0 1 212

2 1 1 1 0 231

2 1 2 0 0 131

2 2 0 0 1 163

2 2 0 1 0 179

2 2 1 0 0 148

That is a little better, but still not as good as I might have hoped to see. In a sample of size 10000, we would expect to see a distributino that lies a little more tightly around the mean.

int51 = randi(51,[10000,1]);

histogram(int51)

Do you see the problem in what we see from randisum? It is not too bad looking, BUT the [1 1 1 1 1] event happened 441 times, more than twice as often as I would expect. Even so, not bad, and far better than Matt's solution.

Finally, takin a look at the solution from Bruno, though I had to fix a few lines, and repair the end statements, it did work in the end. This should be the comparable call:

R = ricb(10000,5,5,0,2);

[Ru,I,J] = unique(R,'rows');

[Ru,accumarray(J,1,[51,1],@sum)]

ans =

0 0 1 2 2 198

0 0 2 1 2 224

0 0 2 2 1 202

0 1 0 2 2 201

0 1 1 1 2 171

0 1 1 2 1 212

0 1 2 0 2 186

0 1 2 1 1 195

0 1 2 2 0 201

0 2 0 1 2 212

0 2 0 2 1 202

0 2 1 0 2 191

0 2 1 1 1 191

0 2 1 2 0 190

0 2 2 0 1 212

0 2 2 1 0 214

1 0 0 2 2 192

1 0 1 1 2 183

1 0 1 2 1 199

1 0 2 0 2 198

1 0 2 1 1 214

1 0 2 2 0 181

1 1 0 1 2 186

1 1 0 2 1 172

1 1 1 0 2 199

1 1 1 1 1 191

1 1 1 2 0 199

1 1 2 0 1 195

1 1 2 1 0 200

1 2 0 0 2 174

1 2 0 1 1 193

1 2 0 2 0 206

1 2 1 0 1 193

1 2 1 1 0 211

1 2 2 0 0 206

2 0 0 1 2 199

2 0 0 2 1 208

2 0 1 0 2 216

2 0 1 1 1 213

2 0 1 2 0 206

2 0 2 0 1 174

2 0 2 1 0 190

2 1 0 0 2 190

2 1 0 1 1 179

2 1 0 2 0 187

2 1 1 0 1 190

2 1 1 1 0 199

2 1 2 0 0 165

2 2 0 0 1 204

2 2 0 1 0 189

2 2 1 0 0 197

As you can see, these counts are much closer to what I would expect from a true uniform sampling. Is ricb truly fully uniform? That is a good question. I'd need to look very carefully at the code, and then test the hell out of it. But I think @Bruno Luong came pretty close to nailing it. Note that the [1 1 1 1 1] case does not look any different from the rest.

##### 11 Comments

Bruno Luong
on 22 Aug 2023

Edited: Bruno Luong
on 22 Aug 2023

@Paul Beside the eye, classical testing is chi-square test

% Test uniformity

nsets = 100000;

X = ricb(nsets,5,5,0,2);

[Xu2,~,J] = unique(X,'rows');

c_BLU = accumarray(J,1,[51,1]);

subplot(1,1,1)

bar(c_BLU)

xlabel('solution [1-51]')

ylabel('count (freq)')

title('ricb')

% Chi-2 test

n = size(Xu2,1); % 51

meancount = nsets/n;

chi2 = sum((c_BLU-meancount).^2./meancount)

dof = n-1;

alpha = 0.05;

p = 1 - alpha;

chicv = chi2inv(p, dof)

if chi2 < chicv

fprintf('Uniformity test passed\n');

else

fprintf('Uniformity test failed\n');

end

Bruno Luong
on 10 Feb 2021

Edited: Bruno Luong
on 10 Feb 2021

Unfortunately the uniform distribution with bounds for integer is much more challenging. One way is to approximate by rounding the results

lo = 100;

hi = 700;

s = 3000;

n = 7;

if n*ceil(lo) > s || s*floor(hi) < s

error('Not possible')

end

while true

x = randfixedsum(n,1,s,lo,hi);

c = round([0; cumsum(x)]); c(end)=s;

xi = diff(c);

if all(xi >= lo & xi <= hi)

break

end

end

xi

##### 0 Comments

Paul
on 19 Sep 2024 at 23:40

Edited: Paul
on 20 Sep 2024 at 2:37

How about a simple loop that finds all combinations of the random variables that satisfy the constraint and then sample from that set uniformly as suggested above.

rng(100);

X = sumintuni2(1e5,0:2,5,5);

figure

bar(groupcounts(X))

function X = sumintuni2(N,uval,usum,ny)

nelements = numel(uval);

nvectors = ny;

ucombinations = zeros(1000,ny);

ucount = 0;

term1 = nelements.^(nvectors:-1:1);

term2 = nelements.^((nvectors-1):-1:0);

for ii = 0:(nelements^nvectors - 1)

index = floor(mod(ii,term1)./term2) + 1;

if sum(uval(index)) == usum

ucount = ucount + 1;

ucombinations(ucount,:) = uval(index);

if ucount == height(ucombinations)

ucombinations = [ucombinations;zeros(1000,ny)];

end

end

end

ucombinations(ucount+1:end,:) = [];

X = ucombinations(randi(height(ucombinations),N,1),:);

end

##### 1 Comment

Bruno Luong
on 20 Sep 2024 at 8:32

Edited: Bruno Luong
on 20 Sep 2024 at 12:48

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!