## Shuffle binary vector (n0 = n1) with constraints: number of consecutive repetitions

### Frederik Bergmann (view profile)

on 8 Jul 2019
Latest activity Commented on by Frederik Bergmann

on 10 Jul 2019

### Jan (view profile)

Hello fellow Matlab users!
I have been working on an experiment, in which two conditions have to appear an equal number of times while being (pseudo)randomized. This proved to be quite challenging, so if you can help I would be be very glad:
I have two possible conditions (0 or 1) in an X long vector that I want to permute. However, each condition can only appear Z times in a row and both conditions have to appear an equal number of times.
I tried to use some lines of code that I found on another post:
seq = str2double(seq);
seq = seq(randperm(numel(seq))); %initial shuffle
old_idx = unique(find(diff(seq)==0)); %find repeats
while ~isempty(old_idx) %continue until no repeats
new_idx = unique(setdiff([1:length(seq)],old_idx)); %find new spots
new_idx = new_idx((randi(length(new_idx),length(old_idx),1)))';
seq([new_idx; old_idx],:) = seq([old_idx;new_idx],:); %swap
old_idx = unique(find(diff(seq)==0)); %find repeats
end
Now... in theory it works!
But the shuffling process can take very long and in some cases it appears to be stuck.
The code was written for vectors of number sequences [1:9] which might not be ideal in the case of a dichotomous variable such as condition 1 vs 2, as there might be many more repetitions. Also it draws from the same index variable which might create more repetitions.
Often it is stuck because there are no counterparts to draw from: e.g. there are say four "zeros" in a row but no "ones" to swap with.
I tried some other strategies, where I just reshuffle the whole vector if some conditions are met. But this is not just awkward to write and read, it is als computationally slow and has the same problem of being stuck at one point.
The other mothod is doing it once by hand and then stick with that pseudorandomization, which in psychological experiments is more or less common practice. But I don't know the number of trials and items yet, so it would be nice to automatize this; and also because I will have to do this again in a different part of the experiment.
What you would do in this case?

Jan

### Jan (view profile)

on 8 Jul 2019
"That means I have to shuffle a [x, 1] vector with two possible conditions (1 and 2)."
Did you mention, what the conditions are? The constraint "maximum number of repetitions" is vague only. Please post some inputs, explain the constraints and show us some valid (or invalid) outputs.
Jan

### Jan (view profile)

on 9 Jul 2019
Thanks for the clarifications.
"each condition can only appear Z times in a row" - This is not equivalent to:
unique(find(diff(seq)==0))
is it?

on 9 Jul 2019
Edited by Jan

### Jan (view profile)

on 9 Jul 2019

A brute force approach:
function V = ShuffledVector
x = 100; % Number of elements, assumed to be even
Z = 5;
V = [true(1, x/2), false(1, x/2)]; % Half of elements is 1, rest is 0
found = false;
iter = 0;
while ~found
V = V(randperm(x, x));
[~, n] = RunLength(V);
found = all(n < Z);
iter = iter + 1;
end
end
function [b,n,idx] = RunLength(x)
% Or use the faster C-Mex function:
% https://www.mathworks.com/matlabcentral/fileexchange/41813-runlength
x = x(:);
d = [true; diff(x) ~= 0]; % TRUE if values change
b = x(d); % Elements without repetitions
k = find([d', true]); % Indices of changes
n = diff(k); % Number of repetitions
idx = k(1:length(k) - 1); % Reply indices of changes
end
This permutes the complete vector if more than Z neighboring elements are equal. There larger Z is, the more efficient is this method. For x = 100 and Z = 10 one or two iterations are very likely, but for Z=3 you might wait for hours.
.An improvement is shuffle only elements from the blocks of equal values:
... % same as above
found = false;
iter = 0;
V = V(randperm(x, x));
while ~found
% Find longest block of equal values:
[b, n, idx] = RunLength(V);
[maxN, maxInd] = max(n);
if maxN > Z
% One element of the largest block:
oldIdx = idx(maxInd) + randi(maxN) - 1;
% One element with a different value:
other = find(V ~= b(maxInd));
newIdx = other(randi(numel(other)));
% Swap values:
swap = V(oldIdx);
V(oldIdx) = V(newIdx);
V(newIdx) = swap;
else
found = true;
end
iter = iter + 1;
end
This "bombing technique" chooses one element of the largest block of equal data and swaps its values randomly with other elements of V. Per iteration of the while loop, only one of these blocks is processed. It is very likely, that maxN occurs multiple times in n, so it might be better to include another loop:
... % same as above
found = false;
maxIter = 1e9;
iter = 0;
V = V(randperm(x, x));
while ~found && iter < maxIter
% Find longest block of equal values:
[b, n, idx] = RunLength(V);
maxN = max(n);
if maxN > Z
for maxInd = find(n == maxN)
% One element of the largest block:
oldIdx = idx(maxInd) + randi(maxN) - 1;
% One element with a different value:
other = find(V ~= b(maxInd));
newIdx = other(randi(numel(other)));
% Swap values:
swap = V(oldIdx);
V(oldIdx) = V(newIdx);
V(newIdx) = swap;
end
else
found = true;
end
iter = iter + 1;
end
if ~found
error('Cannot find solution in %d iterations!', iter)
end
Here I've added a security limit for the number of iterations.
For x = 100, Z = 5, this needs an average of 6 iterations.

#### 1 Comment

Frederik Bergmann

### Frederik Bergmann (view profile)

on 10 Jul 2019
Thank you Jan!
The first permutation approach is similar to what I have tried and indeed for Z = 3 with only two conditions it takes a long while!
The second approach, your "bombing technique", is the key. I tried it and it works perfectly!