If it makes any difference, I think this is a special case of a Markov chain where the transition probability matrix is:

[p(1) p(2) p(np);

1 0 0;

1 0 0]

1 view (last 30 days)

Show older comments

I'm looking for a better way to compute the possible sequences of a random variable whos value at time k is given by.

x(k) = 1 with probability p(1), 2 with probability p(2), ... np with probability p(np).

However, since the number of possible sequences increases exponentially with n and np, I only want to compute the most probable sequences which have a combined probability less than a threshold p_cut.

I've devised a function to do this below but it has two major drawbacks.

- It doesn't generalize to sequences longer than 3.
- It is quite memory inefficient since it always computes all possible permutations when only a subset may be needed.

function S = most_probable_sequences(p, n, p_cut)

assert(sum(p) == 1);

np = numel(p);

switch n

case 1

all_perms = (1:np)';

case 2

[X,Y] = ndgrid(1:np,1:np);

all_perms = [X(:) Y(:)];

case 3

[X,Y,Z] = ndgrid(1:np,1:np,1:np);

all_perms = [X(:) Y(:) Z(:)];

otherwise

error("n > 3 not implemented")

end

probs = prod(reshape(p(all_perms), [], n), 2);

[probs_sorted, order] = sort(probs, 'descend');

most_prob = cumsum(probs_sorted) <= p_cut;

S = all_perms(order(most_prob), :);

end

Examples of correct output of this function:

>> S = most_probable_sequences([0.95 0.04 0.01], 1, 0.99)

S =

1

2

>> S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99)

S =

3 3

3 2

2 3

3 1

Matt J
on 23 Jun 2021

Edited: Matt J
on 23 Jun 2021

It is still a question though, whether you need the sequences that differ only by ordering listed explicitly. You could save memory and computation time if instead of listing the solution as,

S =[ 3 3

3 2

2 3

3 1];

you instead listed the result in tabular form like,

C=cell2table({[3 3], 1; [3 2], 2; [3 1], 1},'VariableNames',{'Sequence','Permutations'})

Matt J
on 23 Jun 2021

Edited: Matt J
on 24 Jun 2021

p=[0.01 0.04 0.95];

S = most_probable_sequences(p, 2, 0.99)

function S=most_probable_sequences(p, n, p_cut)

assert( abs(sum(p)-1) <= 1e-12);

S = recursor(p, n, p_cut);

end

function S = recursor(p, n, p_cut,~)

if n==1

[p,S]=sort(p(:),'descend');

cut=cumsum(p)<=p_cut;

S=S(cut);

return

end

s=recursor(p, n-1, p_cut,[]);

s=unique(sort(s,2),'rows');

np=numel(p);

ns=size(s,1);

[I,J]=ndgrid(1:ns,1:np);

s=unique(sort([s(I(:),:),J(:)],2),'rows');

s=num2cell(s,2);

for i=1:numel(s), s{i}=uniqueperms( s{i} ); end

s=cell2mat(s);

sp=prod(p(s),2);

[Probs,is]=sort(sp(:),'descend');

cut=cumsum(Probs)<=p_cut;

S=s(is(cut),:);

end

Matt J
on 23 Jun 2021

I used this one,

but I suppose it doesn't matter, if they all do the same thing and are equally fast.

Matt J
on 24 Jun 2021

Edited: Matt J
on 24 Jun 2021

but it has two major drawbacks....It doesn't generalize to sequences longer than 3.

The first drawback at least is easy to fix.

function S = most_probable_sequences(p, n, p_cut)

assert( abs(sum(p)-1) <= 1e-12);

np = numel(p);

[all_perms{1:n}] = ndgrid(1:np);

all_perms=reshape( cat(n+1,all_perms{:}) , [],n);

probs = prod(reshape(p(all_perms), [], n), 2);

[probs_sorted, order] = sort(probs, 'descend');

most_prob = cumsum(probs_sorted) <= p_cut;

S = all_perms(order(most_prob), :);

end

This will be faster, but less memory efficient than my other answer.

Matt J
on 24 Jun 2021

Edited: Matt J
on 24 Jun 2021

Here is a far more efficient version of my original answer based on my recommendation above to avoid explicitly listing permutations of the same sequence. It requires no File Exchange downloads and, as demonstrated below, easily handles up to n=12 and np=10.

%Small example

S = most_probable_sequences([0.01 0.04 0.95], 2, 0.99),

%Large example

p=rand(1,10);p=p/sum(p);

tic

S = most_probable_sequences(p, 12, 0.99);

toc

function S=most_probable_sequences(p, n, p_cut)

assert( abs(sum(p)-1) <= 1e-12)

S = recursor(p(:), n, p_cut);

end

function S = recursor(p, n, p_cut)

if n==1

[p,S]=sort(p(:),'descend');

ncut=find(cumsum(p)<=p_cut,1,'last');

S=S(1:ncut);

S=table(S,ones(size(S)),'VariableNames',{'Sequences','NumPerms'});

return

end

s=recursor(p, n-1, p_cut);

s=s.Sequences;

np=numel(p);

ns=size(s,1);

[I,J]=ndgrid(1:ns,1:np);

s=unique( sort( [s(I(:),:), J(:)] ,2) ,'rows','stable'); clear I J

ns=size(s,1);

X=repmat((1:ns).',1,n);

K=histcounts2(X,s,1:ns+1,1:np+1);

F = cumprod([1 1 2:n]) ; % trick to get all needed factorials (note that all K < N)

w = F(n+1) ./ prod(F(K+1),2) ; % number of unique permutations

clear F K X

sProbs=prod(p(s),2); %single sequence probabilities

wProbs=w.*sProbs; %weighted probabilities

[wProbs,order]=sort(wProbs(:),'descend');

sProbs=sProbs(order);

c=cumsum(wProbs);

mcut=find( c<=p_cut,1,'last');

kcut=floor( (p_cut-c(mcut))/sProbs(mcut+1));

ncut=mcut+(kcut>0);

order=order(1:ncut);

S=s(order,:);

w=w(order);

if kcut>0, w(end)=kcut; end

S=table(S,w,'VariableNames',{'Sequences','NumPerms'});

end

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!