how to vectorise or speed up the specific code

Question

0 votes

I tried to make the variables single to decrease the calculation time.

It's a gravitational interaction calculation between particles.

How can I vectorise this code to avoid so many loops?

Can I do it since the particles could be more than 1 million and there should be built matrixes 1million x1million cells?

Most time spent on these lines

             d = [xj yj zj] - [xi yi zi]; % 3D distance as a vector
             dr2 = d(1)^2+d(2)^2+d(3)^2; %3D distance as absolute value
             AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % acceleration increment
             ar = ar + single(AccelDouble);  % New acceleration

n = size(heavy,1);
AccelDouble = zeros(3,'double') ;
for i = 1:n 
    mi = heavy(i,1); % mass of first particle
    if mi ~= 0 % Avoid calculate 0 mass objects
    xi = heavy(i,2); % x position of first particle
    yi = heavy(i,3); % y position of first particle
    zi = heavy(i,4); % z position of first particle
    ar = [0 0 0]; %3D 
      for j = 1:n % second count for reacting particles
        if i ~= j % to avoid counting the same particles
          mj = heavy(j,1); % mass of second particle (interacting)
          if mj ~= 0 && mi ~= 0 % Avoid calculate 0 mass objects
              
             xj = heavy(j,2); % x position of second particle
             yj = heavy(j,3);
             zj = heavy(j,4);
            
             d = [xj yj zj] - [xi yi zi]; % 3D distance as a vector
             dr2 = d(1)^2+d(2)^2+d(3)^2; %3D distance as absolute value
             
             AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % 3D acceleration increments
             ar = ar + single(AccelDouble);  % New 3D acceleration
          end            
        end
      end
    for k = 1:3
        heavy1(i,4+k) = heavy1(i,4+k) + ar(k)*timeStep;  % 3D New Speed calculation
        heavy1(i,k+1) = heavy1(i,k+1) + timeStep*heavy1(i,k+4); % 3D New posistion calculation
    end    
  end
end

6 Comments
Show 4 older comments Hide 4 older comments

Christos Dimopoulos on 19 Feb 2021

Open in MATLAB Online

The creation loop goes like this.

I have already tried also GPU (I am not an expert of course) but no luck up to now.

bodies = zeros(n,8); 
if n > 1
    for i = 1:n
        m = (rand*9+1)*m0;
        r = radiusOuter*(1.03-rand^1); %Distribution inside a sphere
        theta = rand*(2*pi);
        phi = acos(1-rand*2); 
        
        x = r*sin(phi)*cos(theta); %Spherical Distribution
        y = r*sin(phi)*sin(theta); 
        z = r*cos(phi); 
        
        dx =  1*sin(phi)*cos(theta); % Spherical outward velocity to implement explosion
        dy =  1*sin(phi)*sin(theta); 
        dz =  1*cos(phi); 
        
        v = r ; %initial radial outward velocity 
        bodies(i,1) = m;
        
        bodies(i,2) = x;
        bodies(i,3) = y;
        bodies(i,4) = z;
        
        bodies(i,5) = dx*v*(1+0.5*(0.5-rand));
        bodies(i,6) = dy*v*(1+0.5*(0.5-rand));
        bodies(i,7) = dz*v*(1+0.5*(0.5-rand));
        
        bodies(i,8) = 0; % for merging 
    end
end  

Walter Roberson on 26 Feb 2021

Can I do it since the particles could be more than 1 million and there should be built matrixes 1million x1million cells?

An array that was 1e6 x 1e6 would take 1e12 times the space for an individual element. For single precision that would be 4e12 bytes, which is slightly more than 3.6 terabytes, using 2^40 bytes as the definition of terabytes here.

... for each array.

Does your system have access to multiple terabytes of memory?

but this sounds like something a GPU should be able to do well.

The largest commercially available GPU is the NVIDIA Titan V, with 12 GB of memory. Considering the need for multiple matrices, you probably would not be able to handle much more than 25000 x 25000 single precision on such a system.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Bjorn Gustavsson on 19 Feb 2021

2 votes

Newton says you can cut it in two if you take into account that the force from particle i on particle j is equal but directed in opposite direction of force from particle j on particle i. Also if proper pre-allocation is tricky just run the loops from n down to 1.

Finally if this is some kind of explicit Euler scheme I suggest you take a look at Størmer-Verlet schemes instead (or perhaps the 12th-10th-order-runge-kutta-nystrom-integrator that might also be an option) that ought to be more suited for equations of motion.

1 Comment
Show -1 older comments Hide -1 older comments

Christos Dimopoulos on 26 Feb 2021

Thank you for your answer.

You are correct that the same force value (and opposite vector) applies to both the masses but I am not sure how to put it in the code, as it seems will I have to create in each iteration a huge NxN matrix, that has limits to 10^5 particles. If we found a way without any addition in calculations we could reduce the calculation time to half!

As far as the code scheme if I am not wrong the ones that you have suggested are ways to increase precission between two steps without making less and less steps of more and more large period. We are not there yet. Also we will try some more physics changes before we proceed to complicating the code.

Sign in to comment.

Answer 2

Jan on 19 Feb 2021

Edited: Jan on 19 Feb 2021

Open in MATLAB Online

1 vote

ar = [0 0 0];

This creates a double array.

ar = ar + single(AccelDouble);

Here AccelDouble is converted to a single at first, which must be converted back to a double, because it is added to a double. This is a waste of time.

Either convert all values to singles from the beginning. Or stay at doubles.

mi ~= 0 is excluded twice. If would be more efficient, to remove all m==0 before the loops.

"sign(mi)"? Do you mean "single(mi)"?

AccelDouble = zeros(3,'double') ;

This produces a [3 x 3] array. Because it is overwritten ineach iteration, this is not a pre-allocation, but a waste of time. It does not matter, that it is overwritten by a [1 x 3] array, but confusing only.

I assume, your code can be accelerated. What are the used sizes of the input?

The integration scheme based of delta(v) = delta(t)*a is very basic. There are integrations methods on a higher order, which allow to increase the step sizes, which reduces the run time.

[EDITED] Vectorizing the inner loop and using SINGLE for all data reduces the runtim from 1.64 sec to 0.06 sec for 2e3 objetcs:

function test
%                        Mass, position,      speed
heavy = rand(2e3, 7) .* [1e9,  1e6, 1e6, 1e6, 1e3, 1e3, 1e3];
tic
heavy = Core1(heavy);
toc
heavyS = single(heavy)
tic
heavy = Core1(heavyS);
toc
end
function heavy = Core3(heavy)
timeStep = 0.1;
G = 6.67430e-11;
M  = heavy(:, 1);
X  = heavy(:, 2:4);
V  = heavy(:, 5:7);
n  = numel(M);
MG = M * G;
for i = 1:n
   mi = M(i);
   
   D  = X - X(i, :);
   r2 = sum(D .* D, 2);
   F  = (D ./ sqrt(r2)) .* (MG * mi ./ r2);
   
   F(i, :) = 0;
   Ft      = sum(F, 1);
   
   V(i, :) = V(i, :) + Ft * timeStep;
   X(i, :) = X(i, :) + timeStep * V(i, :);
end
heavy(:, 2:4) = X;
heavy(:, 5:7) = V;
end
% Elapsed time is 1.880953 seconds.   Original code
% Elapsed time is 0.140433 seconds.   Vectorized, double
% Elapsed time is 0.066267 seconds.   Vectorized, single

I've implemented this with the assumption that your "sign(mi)" was thought as "single(mi)".

But note this: In the current implementation the problem is O(n^2). Doubling the input size does increase the runtime by a factor 4. I've tested the code with 2e3 rows. For 1e6 rows this will be more than 4 hours per time step.

How many timesteps do you want to calculate? I assume, there is no way to perform this efficiently in Matlab with this simple apporach. Even if you convert this to a C-mex method and increase the speed by a factor 2 (a pure guess only!), 2 hours per time step is still far away from allowing a serious number of time steps.

Bjorn has mentioned, that the force matrix is symmetrical, but it is not trivial to exploit this for 1e6 elements without exhausting the the memory. Using a sophisticated intergator is essential to reduce the number of timesteps for a certain accuracy.

Professional simulations use e.g. the leap-frog algorithm. This simplifies the computation by combining a neighborhood of elements to their center of mass: If we simulate the trajectories of the stars in the galaxy, the distribution of other stars in a spatial segment far away is in first approximation the force of its center of mass containing the sum of the masses. This can reduce the computing time by a factor 100 without loosing too much accruracy.

6 Comments
Show 4 older comments Hide 4 older comments

Christos Dimopoulos on 24 Feb 2021

Edited: Christos Dimopoulos on 25 Feb 2021

Open in MATLAB Online

Thanks you so much for your time.

It seems that you helped a lot. I will mention the unsolved ones of your remarks.

I am trying a rather rare hypothesis of negative mass, so that's why I needed sign(m).

In your equation you have changed some parts that I tried to fix but (not understanding why) I get different results.

In the specific line you have:

F = (D ./ sqrt(r2)) .* (MG * mi ./ r2);

and I made it to match my equation:

F = (D ./ r2) .* (MG * sign(M) ./ (r2 .^ 2));

You made it force with two masses but I wanted acceleration with just one mass and only the sign of the other.

I compare the two functions and I get different results (both in double presicion of course).

The two functions are:

function heavy1 = Core3(heavy,timeStep,GxSolarMass)
heavy1 = heavy ;
M  = heavy(:, 1);
X  = heavy(:, 2:4);
V  = heavy(:, 5:7);
n  = numel(M);
%MG = M * G;
for i = 1:n
   mi = M(i);
   MG = mi * G;
   
   D  = X - X(i, :);  
   r2 = sqrt(sum(D .* D, 2));
   F  = (D ./ r2) .* (MG * sign(M) ./ (r2 .^ 2));
   
   F(i, :) = 0;
   Ft      = sum(F, 1);
   
   V(i, :) = V(i, :) + Ft * timeStep;
   X(i, :) = X(i, :) + timeStep * V(i, :) ;
end
heavy1(:, 2:4) = X;
heavy1(:, 5:7) = V;
end

and:

function heavy1 = Initial(heavy,timeStep,G)
n = size(heavy,1);
heavy1 = heavy;
  for i = 1:n
    mi = heavy(i,1);
    if mi ~= 0
    xi = heavy(i,2);
    yi = heavy(i,3);
    zi = heavy(i,4);
    ar = [0 0 0];
      for j = 1:n
        mj = heavy(j,1);
        if i ~= j && mj ~= 0
             xj = heavy(j,2);
             yj = heavy(j,3);
             zj = heavy(j,4);
            
             d =  [xj yj zj] - [xi yi zi];
             dabs = sqrt(sum(d.^2));              
             ar = ar + (d/dabs)*G*mj*sign(mi)/(dabs^2);          
        end
      end      
      
      for k = 1:3
        heavy1(i,k+4) = heavy1(i,k+4) + ar(k)*timeStep;  
        heavy1(i,k+1) = heavy1(i,k+1) + timeStep*heavy1(i,k+4);
      end    
    end
  end   
end

Thanks again for your time and help.

Christos Dimopoulos on 26 Feb 2021

If we take into consideration negative mass (whatever that means) then we have between same kind attraction and between different kind repulsion. It gives relative but antisymmetric results than EM field.

But if I understand correctly, we are discussing now physics and not matlab code. I see that you are space physics professor, if you have more interest in the results it will be my pleasure to talk in another discussion.

Bjorn Gustavsson on 26 Feb 2021

Yes then you have what essentially boils down to identical equations as in PIC plasma-simulations. Have a look at what people do in that field.

Sign in to comment.

how to vectorise or speed up the specific code

6 Comments
Show 4 older comments Hide 4 older comments

Answers (2)

1 Comment
Show -1 older comments Hide -1 older comments

6 Comments
Show 4 older comments Hide 4 older comments

Categories

Tags

Community Treasure Hunt

how to vectorise or speed up the specific code

6 Comments Show 4 older comments Hide 4 older comments

Answers (2)

1 Comment Show -1 older comments Hide -1 older comments

6 Comments Show 4 older comments Hide 4 older comments

Categories

Tags

See Also

Community Treasure Hunt

6 Comments
Show 4 older comments Hide 4 older comments

1 Comment
Show -1 older comments Hide -1 older comments

6 Comments
Show 4 older comments Hide 4 older comments