Randomly choose between one of two options

antonio on 6 Jul 2012
Dear community,
I have a dataset consisting of 2 samples (1, 2) per case (A, B, ..., J). Think of it as a vector: A1, A2, B1, B2, C1, C2, ..., J1, J2.
I want to randomly choose either sample 1 or sample 2 for each case. I can do that in a very inelegant way, but the problem is the following: in some cases, I have already discarded one of the two samples. So, for example, the vector might actually look like this: A2, B1, B2, C1, C2, D1, E1, E2, ... (here, A1 and D2 are missing).
As of now, I randomly assign the flags 1 and 2 to the samples within each case, and then randomly decide to keep only the 1's or only the 2's. However, this doesn't work for the cases that have only one sample, since there is essentially a 50% chance of excluding the case altogether.
I am sure there is a simple way of doing it that I'm completely overlooking. Thank you for your help!

Answers (3)

AC on 6 Jul 2012
Hi,
I am not sure exactly what the structure of your data is. You may want to first transform your data into a 2-column matrix like this:
M=[A1 A2; B1 B2 ; C1 C2];
If some samples are missing, you can put NaNs in their place:
M= [nan A2 ; B1 B2 ; C1 nan];
Say you have n cases (one per row of M); then generate a vector of n random integers, each drawn uniformly from {1, 2}:
u=unidrnd(2,n,1);
Then choose your samples, forcing the choice to 1 or 2 wherever a sample is missing:
selected_samples=zeros(size(M,1),1);
for i=1:size(M,1)
    if sum(isnan(M(i,:)))>0
        selected_samples(i)=M(~isnan(M(i,:)));
    else
        selected_samples(i)=M(i,u(i));
    end
end
That should work, although there is probably room for improvement (e.g. getting rid of the loop). Let me know if that's what you need!
Cheers,
AC

antonio on 9 Jul 2012
Hi AC,
Thanks for your help. Everything seems to work fine, except when NaNs are involved. As soon as there is a NaN in matrix M, the resulting selected_samples vector is not correct. For example, if I program it exactly as you proposed using M = [nan 2; 3 4 ; 5 nan], the resulting selected_samples vector is either [3 4 NaN] or [3 3 NaN]. I would expect [2 3 5] or [2 4 5]. Any ideas?
Thanks!

antonio on 9 Jul 2012
Hi AC,
I fixed the code by adding an extra line, perhaps making it less efficient, but it works. It now looks like this:
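selected_samples=zeros(size(M,1),1);
for i=1:size(M,1)
    if sum(isnan(M(i,:)))>0
        % Row i has a missing sample: find the column that is present
        x = find(~isnan(M(i,:)));
        % Index with both the row i and the column x
        selected_samples(i)=M(i,x);
    else
        % Both samples present: use the random draw for this row
        selected_samples(i)=M(i,u(i));
    end
end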
thanks!
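
For reference, the same selection can probably be done without the loop. The following is only a minimal sketch, not the method used in the thread: it assumes M has exactly two columns with NaN marking a missing sample, it swaps unidrnd for randi (which does not need the Statistics Toolbox), and the names missing and oneMissing are made up for illustration.

```matlab
% Example data taken from the comment above: one row per case, NaN = missing sample.
M = [NaN 2; 3 4; 5 NaN];
n = size(M, 1);

% Draw a random column (1 or 2) for every case.
u = randi(2, n, 1);

% Rows with exactly one missing sample: override the random draw
% with the column that is actually present.
missing = isnan(M);                            % n-by-2 logical mask
oneMissing = xor(missing(:,1), missing(:,2));  % true where exactly one NaN
u(oneMissing) = 1 + missing(oneMissing, 1);    % column 2 if column 1 is NaN, else column 1

% Pick M(i, u(i)) for every row i in a single indexing step.
selected_samples = M(sub2ind(size(M), (1:n)', u));
```

With this M, selected_samples is either [2; 3; 5] or [2; 4; 5], which matches the expected result stated above. A row where both samples are missing would still return NaN, so such cases would need to be dropped separately.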
