Given two 3-D arrays, A and B, with compatible sizes (matching last dimensions, and inner dimensions that agree for matrix multiplication), I want to perform a matrix multiplication on each pair of slices obtained by fixing the last index, yielding a 3-D array C with one product per slice.
To clarify, we would have C(:, :, 1) = A(:, :, 1)*B(:, :, 1), C(:, :, 2) = A(:, :, 2)*B(:, :, 2), and so on for every slice.
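To make the intended operation concrete, here is a minimal loop-based reference implementation (the sizes 4-by-3-by-1000 and 3-by-5-by-1000 are placeholders I made up; any sizes with agreeing inner and last dimensions work):

    A = rand(4, 3, 1000);                           % placeholder sizes
    B = rand(3, 5, 1000);                           % inner dimensions must agree
    C = zeros(size(A, 1), size(B, 2), size(A, 3));
    for k = 1:size(A, 3)
        C(:, :, k) = A(:, :, k) * B(:, :, k);       % one matrix product per slice
    end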
I need to do this with gpuArrays in the most efficient manner possible.
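One option I'm aware of is pagefun from the Parallel Computing Toolbox, which applies a function (including @mtimes) across the slices of gpuArrays; a minimal sketch with the same placeholder sizes as above (I don't know whether this is actually the fastest approach):

    A = rand(4, 3, 1000, 'gpuArray');    % placeholder sizes, stored on the GPU
    B = rand(3, 5, 1000, 'gpuArray');
    C = pagefun(@mtimes, A, B);          % C(:,:,k) = A(:,:,k)*B(:,:,k) in one call

Is pagefun the right tool here, or is there something faster (pagemtimes in newer releases, for instance)?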