
Generate SIMD Code for MATLAB Functions

You can generate SIMD (single instruction, multiple data) code from certain MATLAB functions by using Intel AVX and Intel SSE technology. SIMD is a computing paradigm in which a single instruction processes multiple data. Many modern processors have SIMD instructions that, for example, perform several additions or multiplications at once. For computationally intensive operations among supported functions, SIMD intrinsics can significantly improve the performance of the generated code on Intel platforms.
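As a rough illustration of the idea (hand-written, not code produced by MATLAB Coder), the following C sketch adds two four-element single-precision arrays, first one element at a time and then with a single Intel SSE addition. The function names add4_scalar and add4_simd and the HAVE_SSE macro are hypothetical. The sketch assumes a compiler that provides the SSE header xmmintrin.h on x86 targets; on other targets it falls back to the scalar path.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: one addition per loop iteration. */
static void add4_scalar(const float a[4], const float b[4], float c[4])
{
  int i;
  for (i = 0; i < 4; i++) {
    c[i] = a[i] + b[i];
  }
}

/* Hypothetical helper: four additions in one SSE instruction. */
static void add4_simd(const float a[4], const float b[4], float c[4])
{
#if HAVE_SSE
  __m128 va = _mm_loadu_ps(a);          /* load four unaligned floats */
  __m128 vb = _mm_loadu_ps(b);
  _mm_storeu_ps(c, _mm_add_ps(va, vb)); /* add all four lanes, then store */
#else
  add4_scalar(a, b, c);                 /* fallback on targets without SSE */
#endif
}
```

Both helpers compute the same result. The difference is that add4_simd issues one addition for all four elements where add4_scalar issues four.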

MATLAB Functions That Support SIMD Code

When certain conditions are met, you can generate SIMD code by using Intel SSE or Intel AVX technology. The following table lists the MATLAB functions that support SIMD code generation and the conditions under which the support is available.

MATLAB Function    Conditions
plus
  • For Intel AVX and Intel SSE, the input signal has a data type of single, double, int8, int16, int32 or int64.

  • For Intel AVX-512, the input signal has a data type of single or double.

minus
  • For Intel AVX and Intel SSE, the input signal has a data type of single, double, int8, int16, int32 or int64.

  • For Intel AVX-512, the input signal has a data type of single or double.

times
  • For Intel AVX and Intel SSE, the input signal has a data type of single, double, int16, or int32.

  • For Intel AVX-512, the input signal has a data type of single or double.

rdivide

The input signal has a data type of single or double.

sqrt

The input signal has a data type of single or double.

ceil
  • For Intel SSE and Intel AVX, the input signal has a data type of single or double.

  • Intel AVX-512 is not supported.

floor
  • For Intel SSE and Intel AVX, the input signal has a data type of single or double.

  • Intel AVX-512 is not supported.

max

The input signal has a data type of single or double.

min

The input signal has a data type of single or double.

If you have DSP System Toolbox™, you can generate SIMD code from certain MATLAB System objects. For more information, see System objects in DSP System Toolbox that Support SIMD Code Generation (DSP System Toolbox).

Generate SIMD Code Versus Plain C Code

Consider the MATLAB function dynamic. This function consists of addition and multiplication operations between the variable-size arrays A and B. These arrays have a data type of single and an upper bound of 100 x 100.

function C = dynamic(A, B)
   assert(all(size(A) <= [100 100]));
   assert(all(size(B) <= [100 100]));
   assert(isa(A, 'single'));
   assert(isa(B, 'single'));

   C = zeros(size(A), 'like', A);
   for i = 1:numel(A)
       C(i) = (A(i) .* B(i)) + (A(i) .* B(i));
   end
end

To generate plain C code at the command line:

  1. For C library code generation, use the coder.config function to create a coder.CodeConfig object.

    cfg = coder.config('lib');
  2. To generate a static library in the default location, codegen\lib\dynamic, use the codegen function.

    codegen('-config', cfg, 'dynamic');

  3. In the list of generated files, click dynamic.c. In the plain (non-SIMD) C code, each loop iteration produces one result.

    void dynamic(const float A_data[], const int A_size[2], const float B_data[],
                 const int B_size[2], float C_data[], int C_size[2])
    {
      float C_data_tmp;
      int i;
      int loop_ub;
      (void)B_size;
      C_size[0] = (signed char)A_size[0];
      C_size[1] = (signed char)A_size[1];
      loop_ub = (signed char)A_size[0] * (signed char)A_size[1];
      if (0 <= loop_ub - 1) {
        memset(&C_data[0], 0, loop_ub * sizeof(float));
      }
      loop_ub = A_size[0] * A_size[1];
      for (i = 0; i < loop_ub; i++) {
        C_data_tmp = A_data[i] * B_data[i];
        C_data[i] = C_data_tmp + C_data_tmp;
      }
    }

To generate SIMD C code at the command line:

  1. For C library code generation, use the coder.config function to create a coder.CodeConfig object.

    cfg = coder.config('lib');
  2. Set the coder.HardwareImplementation object TargetHWDeviceType property to 'Intel->x86-64 (Linux 64)' or 'Intel->x86-64 (Windows64)'.

    cfg.HardwareImplementation.TargetHWDeviceType = 'Intel->x86-64 (Windows64)';

  3. Set the coder.HardwareImplementation object ProdHWDeviceType property to 'Intel->x86-64 (Linux 64)' or 'Intel->x86-64 (Windows64)'.

    cfg.HardwareImplementation.ProdHWDeviceType = 'Intel->x86-64 (Windows64)';

    If you are using the MATLAB Coder app to generate code:

    • Set the Hardware Device parameter to None - Select device below.

    • Set the Device vendor parameter to Intel or AMD.

    • Set the Device type to Intel->x86-64 (Linux 64) or Intel->x86-64 (Windows64).

  4. Set the CodeReplacementLibrary property to an Intel AVX or Intel SSE library. This example uses Intel SSE for Windows.

    cfg.CodeReplacementLibrary = 'Intel SSE (Windows)';
    

    The library that you choose depends on which instruction set extension your processor supports.

    For more information, see https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html. This table lists the Intel intrinsic instruction sets that each code replacement library contains.

    Code Replacement Library    Intel Intrinsic Instruction Set
    Intel SSE                   SSE, SSE2, SSE4.1
    Intel AVX                   SSE, SSE2, SSE4.1, AVX, AVX2
    Intel AVX-512               SSE, SSE2, SSE4.1, AVX, AVX2, AVX-512

    If you are using the MATLAB Coder app to generate code, on the Custom Code tab, set the Code replacement library parameter to an Intel SSE or Intel AVX library.

  5. Use the codegen function to generate a static library in the default location, codegen\lib\dynamic.

    codegen('-config', cfg, 'dynamic');
    
  6. In the list of generated files, click dynamic.c.

    void dynamic(const float A_data[], const int A_size[2], const float B_data[],
                 const int B_size[2], float C_data[], int C_size[2])
    {
      __m128 r;
      float C_data_tmp;
      int i;
      int loop_ub;
      int scalarLB;
      int vectorUB;
      (void)B_size;
      C_size[0] = (signed char)A_size[0];
      C_size[1] = (signed char)A_size[1];
      loop_ub = (signed char)A_size[0] * (signed char)A_size[1];
      if (0 <= loop_ub - 1) {
        memset(&C_data[0], 0, loop_ub * sizeof(float));
      }
      loop_ub = A_size[0] * A_size[1];
      scalarLB = (loop_ub / 4) << 2;
      vectorUB = scalarLB - 4;
      for (i = 0; i <= vectorUB; i += 4) {
        r = _mm_mul_ps(_mm_loadu_ps(&A_data[i]), _mm_loadu_ps(&B_data[i]));
        _mm_storeu_ps(&C_data[i], _mm_add_ps(r, r));
      }
      for (i = scalarLB; i < loop_ub; i++) {
        C_data_tmp = A_data[i] * B_data[i];
        C_data[i] = C_data_tmp + C_data_tmp;
      }
    }

    The SIMD instructions are the intrinsic functions that start with the prefix _mm. These functions process multiple data in a single iteration of the loop because the loop increments by four for single data types. For double data types, the loop increments by two. For MATLAB code that processes more data and is more computationally intensive than the code in this example, the presence of SIMD instructions can significantly reduce execution time.

    The generated code contains a second for loop because the loop that contains SIMD code processes four elements per iteration for single data types, so it can cover only a multiple of four elements. The second loop processes the remaining elements.
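The way the iterations are divided between the two loops can be checked with a small hand-written C sketch. The struct LoopSplit and the helper split_loop are hypothetical names introduced for illustration. The arithmetic mirrors the scalarLB and vectorUB computation in the generated listing, with the number of elements per SIMD iteration passed as a parameter (4 for single and 2 for double, matching the loop increments described above).

```c
/* Hypothetical record of how loop_ub elements are divided. */
typedef struct {
  int scalarLB;     /* first index handled by the remainder loop */
  int vectorUB;     /* last start index handled by the SIMD loop */
  int simd_iters;   /* number of SIMD loop iterations */
  int scalar_iters; /* number of remainder loop iterations */
} LoopSplit;

/* width is the number of elements processed per SIMD iteration. */
static LoopSplit split_loop(int loop_ub, int width)
{
  LoopSplit s;
  int i;
  /* Largest multiple of width not exceeding loop_ub;
   * equal to (loop_ub / 4) << 2 when width is 4. */
  s.scalarLB = (loop_ub / width) * width;
  s.vectorUB = s.scalarLB - width;
  s.simd_iters = 0;
  for (i = 0; i <= s.vectorUB; i += width) {
    s.simd_iters++;   /* one iteration covers width elements */
  }
  s.scalar_iters = 0;
  for (i = s.scalarLB; i < loop_ub; i++) {
    s.scalar_iters++; /* one iteration covers one element */
  }
  return s;
}
```

For example, split_loop(10, 4) gives scalarLB = 8 and vectorUB = 4, so the SIMD loop runs twice (i = 0 and i = 4) and the remainder loop handles elements 8 and 9. When the element count is smaller than the width, vectorUB is negative and only the remainder loop runs.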

For a list of Intel intrinsic functions for supported MATLAB functions, see https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

Limitations

The generated code does not contain SIMD code when the MATLAB code meets these conditions:

  • Scalar operations outside a loop. For example, if a, b, and c are scalars, the generated code does not contain SIMD code for an operation such as c=a+b.

  • Indirectly indexed arrays or matrices. For example, if A, B, C, and D are vectors, the generated code does not contain SIMD code for an operation such as D(A)=C(A)+B(A).

  • Parallel for-Loops (parfor). The parfor loop does not contain SIMD code, but loops within the body of the parfor loop might contain SIMD code.
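
To make the indirect-indexing case concrete, here is a hand-written C sketch (not MATLAB Coder output) of a scalar loop equivalent to D(A)=C(A)+B(A). The function name indirect_add and the zero-based idx array are assumptions for illustration. Because the elements are selected through idx, consecutive iterations need not touch adjacent memory, whereas the _mm_loadu_ps and _mm_storeu_ps intrinsics in the earlier listing each move four adjacent floats.

```c
/* Hypothetical scalar equivalent of D(A) = C(A) + B(A).
 * Assumes idx holds zero-based indices (MATLAB indices are one-based). */
static void indirect_add(const int idx[], int n, const float B[],
                         const float C[], float D[])
{
  int i;
  for (i = 0; i < n; i++) {
    int k = idx[i];     /* element position comes from the index array */
    D[k] = C[k] + B[k]; /* successive values of k may be non-adjacent */
  }
}
```

With idx = {3, 0, 2}, the loop updates D[3], D[0], and D[2] in that order and leaves D[1] unchanged.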
