Help creating an array with uniformly distributed random numbers (row-wise) comprised between 0 and 1, with each column having a sum of 1

Question

Simone A. on 14 Jul 2023

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and

Commented: Paul on 15 Jul 2023

Accepted Answer: Torsten

Open in MATLAB Online

Hi All,

I am trying to solve this problem, but I am not even sure this is possible.

I need to run a model 1e8+ times. To do this - making sure the model is unbiased - I need to obtain 4 random values comprised between 0 and 1 at each run, and their sum must equal 1 (these values will be area fractions).

So I have tried the following:

Random_Fraction_a = rand(1, 4); % Random Numbers
Random_Fraction_b = Random_Fraction_a ./ sum(Random_Fraction_a); % Random Numbers wich their sum = 1

However, as depicted in the figure below, 'rand' returns uniformly distributed random numbers, which is exactly what I am after, but when I normalise them between [0 1], the distribution changes completely. Now, it is my understanding that this is "statistically normal", but it is not what I need. Do you know, or can you think of any alternative way to obtain 4 random decimal values comprised between 0 and 1, wich their sum gives 1, and that are uniformly distributed row-wise?

Any help is grately appreciated!

See figure (and code) below:

% Example code
for i=1:1e5
    Random_Fraction_a = rand(1, 4);
    Random_Fraction_b = Random_Fraction_a ./ sum(Random_Fraction_a);
    Rand_Fa(:,i)=Random_Fraction_a;
    Rand_Fb(:,i)=Random_Fraction_b;
end
figure
for ii=1:4
    subplot(2,2,ii)
    histogram(Rand_Fa(ii,:))
    hold on
    histogram(Rand_Fb(ii,:),"FaceAlpha",0.5)
    legend('Rand Output','After Normalisation')
end

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Torsten on 14 Jul 2023

1
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and#answer_1273233

Edited: Torsten on 14 Jul 2023

Open in MATLAB Online

https://uk.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum

You cannot expect the usual uniform distribution for the 4 rows of the matrix. This is impossible since each vector component poses conditions on the others in the corresponding column.

What you can expect is that columns of the matrix are uniformly distributed over the part of the hyperplane x1+x2+x3+x4 = 1 with xi >= 0.

I don't know if this serves your needs.

x = randfixedsum(4,1e5,1,0,1);

for ii=1:4

subplot(2,2,ii)

histogram(x(ii,:),"FaceAlpha",0.5)

legend('After Normalisation')

end

function [x,v] = randfixedsum(n,m,s,a,b)

% [x,v] = randfixedsum(n,m,s,a,b)

%

% This generates an n by m array x, each of whose m columns

% contains n random values lying in the interval [a,b], but

% subject to the condition that their sum be equal to s. The

% scalar value s must accordingly satisfy n*a <= s <= n*b. The

% distribution of values is uniform in the sense that it has the

% conditional probability distribution of a uniform distribution

% over the whole n-cube, given that the sum of the x's is s.

%

% The scalar v, if requested, returns with the total

% n-1 dimensional volume (content) of the subset satisfying

% this condition. Consequently if v, considered as a function

% of s and divided by sqrt(n), is integrated with respect to s

% from s = a to s = b, the result would necessarily be the

% n-dimensional volume of the whole cube, namely (b-a)^n.

%

% This algorithm does no "rejecting" on the sets of x's it

% obtains. It is designed to generate only those that satisfy all

% the above conditions and to do so with a uniform distribution.

% It accomplishes this by decomposing the space of all possible x

% sets (columns) into n-1 dimensional simplexes. (Line segments,

% triangles, and tetrahedra, are one-, two-, and three-dimensional

% examples of simplexes, respectively.) It makes use of three

% different sets of 'rand' variables, one to locate values

% uniformly within each type of simplex, another to randomly

% select representatives of each different type of simplex in

% proportion to their volume, and a third to perform random

% permutations to provide an even distribution of simplex choices

% among like types. For example, with n equal to 3 and s set at,

% say, 40% of the way from a towards b, there will be 2 different

% types of simplex, in this case triangles, each with its own

% area, and 6 different versions of each from permutations, for

% a total of 12 triangles, and these all fit together to form a

% particular planar non-regular hexagon in 3 dimensions, with v

% returned set equal to the hexagon's area.

%

% Roger Stafford - Jan. 19, 2006

% Check the arguments.

if (m~=round(m))|(n~=round(n))|(m<0)|(n<1)

error('n must be a whole number and m a non-negative integer.')

elseif (s<n*a)|(s>n*b)|(a>=b)

error('Inequalities n*a <= s <= n*b and a < b must hold.')

end

% Rescale to a unit cube: 0 <= x(i) <= 1

s = (s-n*a)/(b-a);

% Construct the transition probability table, t.

% t(i,j) will be utilized only in the region where j <= i + 1.

k = max(min(floor(s),n-1),0); % Must have 0 <= k <= n-1

s = max(min(s,k+1),k); % Must have k <= s <= k+1

s1 = s - [k:-1:k-n+1]; % s1 & s2 will never be negative

s2 = [k+n:-1:k+1] - s;

w = zeros(n,n+1); w(1,2) = realmax; % Scale for full 'double' range

t = zeros(n-1,n);

tiny = 2^(-1074); % The smallest positive matlab 'double' no.

for i = 2:n

tmp1 = w(i-1,2:i+1).*s1(1:i)/i;

tmp2 = w(i-1,1:i).*s2(n-i+1:n)/i;

w(i,2:i+1) = tmp1 + tmp2;

tmp3 = w(i,2:i+1) + tiny; % In case tmp1 & tmp2 are both 0,

tmp4 = (s2(n-i+1:n) > s1(1:i)); % then t is 0 on left & 1 on right

t(i-1,1:i) = (tmp2./tmp3).*tmp4 + (1-tmp1./tmp3).*(~tmp4);

end

% Derive the polytope volume v from the appropriate

% element in the bottom row of w.

v = n^(3/2)*(w(n,k+2)/realmax)*(b-a)^(n-1);

% Now compute the matrix x.

x = zeros(n,m);

if m == 0, return, end % If m is zero, quit with x = []

rt = rand(n-1,m); % For random selection of simplex type

rs = rand(n-1,m); % For random location within a simplex

s = repmat(s,1,m);

j = repmat(k+1,1,m); % For indexing in the t table

sm = zeros(1,m); pr = ones(1,m); % Start with sum zero & product 1

for i = n-1:-1:1 % Work backwards in the t table

e = (rt(n-i,:)<=t(i,j)); % Use rt to choose a transition

sx = rs(n-i,:).^(1/i); % Use rs to compute next simplex coord.

sm = sm + (1-sx).*pr.*s/(i+1); % Update sum

pr = sx.*pr; % Update product

x(n-i,:) = sm + pr.*e; % Calculate x using simplex coords.

s = s - e; j = j - e; % Transition adjustment

end

x(n,:) = sm + pr.*s; % Compute the last x

% Randomly permute the order in the columns of x and rescale.

rp = rand(n,m); % Use rp to carry out a matrix 'randperm'

[ig,p] = sort(rp); % The values placed in ig are ignored

x = (b-a)*x(p+repmat([0:n:n*(m-1)],n,1))+a; % Permute & rescale x

end

6 Comments
Show 4 older commentsHide 4 older comments

Simone A. on 15 Jul 2023

Thanks for your time and comprehensive explanation @Torsten. Really appreciated!

Paul on 15 Jul 2023

Open in MATLAB Online

This implemenation looks a little better if allowing some tolerance on the comparison to one. Or maybe use sym and then convert to double after elimination? And of course the V variables are in a nice ndgrid pattern, so an additional random step would be needed to select a column of x, or the columns would have to be scrambled before selecting sequentially. In the plots below I added the theoretical pdf on the histograms.

v1 = 0:0.01:1;

v2 = 0:0.01:1;

v3 = 0:0.01:1;

v4 = 0:0.01:1;

n = numel(v1);

[V1,V2,V3,V4] = ndgrid(v1,v2,v3,v4);

idx = find(abs(V1+V2+V3+V4-1) <= 1e-12);

V1 = V1(idx);

V2 = V2(idx);

V3 = V3(idx);

V4 = V4(idx);

x = [V1,V2,V3,V4].';

fz = @(z) 3*(z-1).^2; % theoretical pdf

for ii=1:4

subplot(2,2,ii)

histogram(x(ii,:),"FaceAlpha",0.5,'Normalization','pdf')

hold on

plot(0:.01:1,fz(0:.01:1),'LineWidth',2)

legend('After Normalisation')

end

Sign in to comment.

Answer 2

John D'Errico on 14 Jul 2023

1
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and#answer_1273483

Open in MATLAB Online

randfixedsum.m

This is a mistake I see made so frequently, that is, a misunderstanding of what it means for a sample to be uniformly distributed, but with a sum constraint. The two ideas are sort of at odds, since a sample cannot be fully uniformly distributed the way we expect, once that constraint enters into the problem.

One thought is to start in 2-dimensions. So we have x1 and x2, uniformly distributed. but we want the sum to be 1. An answer is simple, we just choose x1 to be uniform on the interval [0,1], and then compute x2=1-x1. So this is easy. We can think of the result as choosing a sample uniformly along the straight line: x1+x2=1.

x1 = rand(1,500);

x2 = 1 - x1;

plot(x1,x2,'o')

You can create the desired array now, as

X = [x1;x2];

So the columns of X are uniformly distributed, under a sum constraint. Sadly, things get messier if we want to live in 3-dimensions. We cannot simply choose x1 and x2 independently now, because some of the time, x1+x2 will be greater than 1.

The randfixedsum code solves the problem elegantly, by sampling correctly so the set ois uniform, but it will lie in a TRIANGLE. Try it.

X = randfixedsum(3,1000,1,0,1);

plot3(X(1,:),X(2,:),X(3,:),'o')

box on

grid on

axis equal

view(48,28)

Now the points are seen to be uniformly distributed inside a TRIANGLE. However, you cannot now look at the marginal distrbution of any single variable, and hope them to also look uniform. This is your mistake. In fact, the histogram will now look like a trianglular distribution, with most of the samples near zero.

histogram(X(1,:),50)

Now, try the same thing in for 10 rows. I'll use a larger sample this time to make the histogram look pretty.

X = randfixedsum(10,100000,1,0,1);

size(X)

ans = 1×2

10 100000

histogram(X(1,:),50,'normalization','pdf')

Again, just because the sample is uniform in one respect, does NOT mean the marginal distributiuon should also be uniform. If we could plot the sample with a 10 dimensionsal monitor (sorry, I don't have one) then the plot would agsin be seen to be uniformly sampled inside a 10-dimensional simplex (an analogue of a triangle, but in 10-dimensions.)

You CANNOT have everything your intuition demands. Sorry, but too often a simple intuition runs counter to mathematical fact.

1 Comment
Show -1 older commentsHide -1 older comments

Simone A. on 15 Jul 2023

Hi @John D'Errico, thanks a lot for taking the time to explain the reationale behind it, that made everything much easier and clearer. Once again, thanks a lot!

Sign in to comment.

Help creating an array with uniformly distributed random numbers (row-wise) comprised between 0 and 1, with each column having a sum of 1

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments
Show 4 older commentsHide 4 older comments

More Answers (1)

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Help creating an array with uniformly distributed random numbers (row-wise) comprised between 0 and 1, with each column having a sum of 1

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments Show 4 older commentsHide 4 older comments

More Answers (1)

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

6 Comments
Show 4 older commentsHide 4 older comments

1 Comment
Show -1 older commentsHide -1 older comments