PARFOR loop is too slow
1 view (last 30 days)
Show older comments
Hi all, this is the code I wrote to simulate 8PSK in Rayleigh Channel. It is working and gives the correct results but it is too slow, compared to the case when I use an ordinary for-loop and run the code on only one core. Right now I run the "parfor-loop-code" on 4 cores. Is it because, I use too much memory for the all the {k}-indexed variables? How can I avoid this problem? TYVM
% 1 Transmit Antenna
% Nr Receive Antenna
% 3 bits per Transmission
clc;
clear all;
Nt = 1; % Tx antennas
Nr = 1; % Rx antennas
Ebit_N0_dB =0;
Es_N0_dec =1/Nr*3*10.^(Ebit_N0_dB/10); %Multiply by 3 because Symbol Energy is trice as much as Bit Energy for 3-Bit-Transmission
%divide by 1/Nr to normalize the received signal energy
%in other words every receiver should receive 1/Nr of the total energy received at the receiver side
BER = zeros(1,length(Ebit_N0_dB));
symbol_errors = zeros(1,length(Ebit_N0_dB));
bit_errors = zeros(1,length(Ebit_N0_dB));
% Defining the Gray Coded Constellation
demap =[0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
% 8PSK Symbols
S000=1;
S001=exp(1j*1*pi/4);
S010=exp(1j*3*pi/4);
S011=exp(1j*2*pi/4);
S100=exp(1j*7*pi/4);
S101=exp(1j*6*pi/4);
S110=exp(1j*4*pi/4);
S111=exp(1j*5*pi/4);
N=10^6;
symbols=0;
xjq=cell(1,N);
argmin=cell(1,N);
tic
parfor K=1:N
symbols=symbols+1
bin_ip{K} = round(rand(1,3)); % Generate Binary Input
%-------------------------------
if bin_ip{K}==[0 0 0]
xjq{K}=S000
elseif bin_ip{K}==[0 0 1]
xjq{K}=S001
elseif bin_ip{K}==[0 1 0]
xjq{K}=S010
elseif bin_ip{K}==[0 1 1]
xjq{K}=S011
elseif bin_ip{K}==[1 0 0]
xjq{K}=S100
elseif bin_ip{K}==[1 0 1]
xjq{K}=S101
elseif bin_ip{K}==[1 1 0]
xjq{K}=S110
elseif bin_ip{K}==[1 1 1]
xjq{K}=S111
end
n{K} = (1/sqrt(2))*(randn(Nr,1) + 1i*randn(Nr,1)); % white Gaussian Noise with variance of 1
%==================================================================
%Generate Channel Matrix
%==================================================================
H{K}=(1/sqrt(2)*( randn(Nr,Nt)+1i*randn(Nr,Nt)));
y{K} = sqrt(Es_N0_dec)*H{K}*xjq{K} + n{K}; %=================================================================
%Optimal Detection Scheme
=================================================================
p = sqrt(Es_N0_dec); %8PSK
y000{K}=H{K}*p*S000; argmin{K}(1)=norm(y{K}-y000{K});
y001{K}=H{K}*p*S001; argmin{K}(2)=norm(y{K}-y001{K});
y010{K}=H{K}*p*S010; argmin{K}(3)=norm(y{K}-y010{K});
y011{K}=H{K}*p*S011; argmin{K}(4)=norm(y{K}-y011{K});
y100{K}=H{K}*p*S100; argmin{K}(5)=norm(y{K}-y100{K});
y101{K}=H{K}*p*S101; argmin{K}(6)=norm(y{K}-y101{K});
y110{K}=H{K}*p*S110; argmin{K}(7)=norm(y{K}-y110{K});
y111{K}=H{K}*p*S111; argmin{K}(8)=norm(y{K}-y111{K});
[~,est(K)] = min(argmin{K});
est_bin{K} = demap(est(K),:);
%Count bit errors
bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
end
toc
BER = bit_errors/(symbols*3)
0 Comments
Answers (2)
Matt J
on 25 Jun 2013
Edited: Matt J
on 25 Jun 2013
You have a lot of on-the-fly memory allocation, which probably isn't helping things. H{k},y{k}, y000{k},etc... are not pre-allocated and get generated element-by-element in the loop. The contents of each argmin{K}, apparently a length 8 vector, are also not pre-allocated. Finally, it doesn't seem to make much sense holding these things in cell arrays. It looks like they all contain same-sized quantities and could be held instead in numeric arrays.
As another miscellaneous note, you can make things faster by skipping calls to find(),
bit_errors = bit_errors + sum( est_bin{K}~=bin_ip{K} );
3 Comments
Matt J
on 25 Jun 2013
sum(.) is definitely slower!
No way.
x=rand(1e8,1)>.5;
tic;
sum(x);
toc
%Elapsed time is 0.053452 seconds.
tic;
length(find(x));
toc
%Elapsed time is 0.904927 seconds.
Muthu Annamalai
on 25 Jun 2013
Edited: Muthu Annamalai
on 25 Jun 2013
Key to using parfor must be the independence of each iteration; i.e. you cannot write variable updates like,
a = a + 1
So change the lines,
bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
bit_errors(k) = length(find( est_bin{K}~=bin_ip{K}));
and rewrite
BER = bit_errors/(symbols*3)
as
BER = sum(bit_errors)/(symbols*3)
and remove the line,
symbols=symbols+1
because this variable is going to be N at end of all iterations.
This should give you some speedup.
2 Comments
Matt J
on 25 Jun 2013
you cannot write variable updates like a=a+1
Why not? Don't they qualify as legitimate reduction assignments as described here?
Muthu Annamalai
on 25 Jun 2013
@Matt I probably stand corrected in that case, while still my approach is explicitly parallelizable.
See Also
Categories
Find more on PSK in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!