How to measure GPU memory bandwidth?

I have a Tesla C1060 with 4 GB of memory. I am running MATLAB R2012b and I am using the following code to measure the memory bandwidth between host and device.
gpu = gpuDevice();
N = 8192;
data = rand(N, N);
for k = 1:100
    tic;
    gdata = gpuArray(data); wait(gpu);   % host -> device
    CPU2GPU(k) = N^2*8/1024^3/toc;       % GiB/s
    tic;
    data2 = gather(gdata); wait(gpu);    % device -> host
    GPU2CPU(k) = N^2*8/1024^3/toc;       % GiB/s
end
figure;
plot(1:100, CPU2GPU, 'r.', 1:100, GPU2CPU, 'b.');
legend('CPU->GPU', 'GPU->CPU');
I found less than 1.5 GB/s from GPU to CPU and less than 3.0 GB/s from CPU to GPU (averaging the 100 values, excluding the very first ones). 1) Why are the measured values so far from the expected 8 GB/s? It also turns out that the 100 values vary from one run to another by a factor of almost 2. 2) Why is the behavior of this code not more reproducible?
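For reference, a variant of the loop above (a sketch, not part of the original post) that preallocates the result arrays and summarizes with the median, which is less sensitive to the run-to-run outliers described:

```matlab
% Sketch: same transfer measurement, but preallocated and summarized
% with the median; the first (warm-up) iteration is discarded.
gpu = gpuDevice();
N = 8192;
data = rand(N, N);
nRuns = 100;
CPU2GPU = zeros(1, nRuns);
GPU2CPU = zeros(1, nRuns);
for k = 1:nRuns
    tic;
    gdata = gpuArray(data); wait(gpu);   % host -> device
    CPU2GPU(k) = N^2*8/1024^3/toc;       % GiB/s
    tic;
    data2 = gather(gdata);               % device -> host (synchronous)
    GPU2CPU(k) = N^2*8/1024^3/toc;       % GiB/s
end
fprintf('CPU->GPU median: %.2f GiB/s\n', median(CPU2GPU(2:end)));
fprintf('GPU->CPU median: %.2f GiB/s\n', median(GPU2CPU(2:end)));
```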
Thanks for your help.

Accepted Answer

Ben Tordoff
Ben Tordoff on 15 Apr 2013
Hi Anterrieu,
You might like to have a look at the following article:
In those results, the achieved transfer bandwidth tops out at about 5.7 GB/s (send) and 4.0 GB/s (gather). While I can't give you a definitive answer as to why your measured transfer rates are so low and unreliable, here are a few points to consider:
  1. The second "wait(gpu)" inside your tight loop is not needed and will affect the results. Memory transfers from device to host (i.e. "gather") are always synchronous.
  2. You are measuring the speed of transferring data to and from the GPU (i.e. the speed of the PCI bus). This is not the same as the GPU memory bandwidth (as suggested by the question title), which is much, much higher (>90 GB/s for your GPU, and even higher for recent GPUs).
  3. It is nearly impossible to accurately measure the transfer bandwidth from within MATLAB. What you are actually timing here is the time taken to allocate some space (on the GPU in the first case, in host memory in the second), to perform the data transfer, and to assign a MATLAB variable. These extra steps take some (hopefully small) amount of time that will reduce the measured rates.
  4. Some of the variability may come from other processes using the PCI bus. Running your OS in a highly stripped-down mode with no network etc. might help.
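To illustrate point 2, here is a sketch (my addition, not from the thread) of estimating on-device memory bandwidth rather than PCI transfer rate. It assumes a release where "gpuArray.rand" is available (older releases use "parallel.gpu.GPUArray.rand"):

```matlab
% Sketch: estimate on-device memory bandwidth (NOT the PCI transfer rate).
% Assumes Parallel Computing Toolbox with gpuArray.rand available.
gpu = gpuDevice();
N = 8192;
gdata = gpuArray.rand(N, N);   % data already resident on the device
gdata2 = gdata + 0;            % warm-up: force allocation before timing
wait(gpu);
t = zeros(1, 20);
for k = 1:20
    tic;
    gdata2 = gdata + 0;        % read N^2 doubles, write N^2 doubles
    wait(gpu);                 % block until the kernel has finished
    t(k) = toc;
end
bytesMoved = 2 * N^2 * 8;      % one full read plus one full write
fprintf('On-device bandwidth: %.1f GB/s\n', bytesMoved/median(t)/1e9);
```

The "gdata + 0" operation reads and writes the whole array once, so roughly 2*N^2*8 bytes move per iteration; the resulting figure should come out far above the PCI numbers quoted in the question.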
If you try the code from the article and still see much lower results, let me know. Note, however, that you are not really measuring your GPU here: you are simply measuring how busy your PCI bus is and how well MATLAB can throw data at it. It's an important measure, but usually not the most important one, so long as you do plenty of calculations with your data once you've put it on the GPU. If you want to know more about your GPU's calculation performance, you might like to take GPUBench for a spin:
Ben
  1 Comment
Anterrieu
Anterrieu on 15 Apr 2013
Thanks, Ben, for this answer. Indeed, this piece of code is derived from Loren's and from the benchmark you are quoting. However, I consider that getting a number from only one run is not rigorous (this is my opinion as a research scientist), which is why I ran it more than 100,000 times in order to obtain a more accurate value. The value I got is accurate for GPUtoGPU because the standard deviation is very small, but for GPUtoCPU and CPUtoGPU there are clearly two values which differ from each other by 15%. This is not a small variation due to CPU activity during the test (by the way, the PC was disconnected from the network and all system activities were switched off); the discrepancy is not periodic, and there is no correlation between the two transfers. This is all I know up to now, but I am still digging, and I have already found some undocumented aspects of MathWorks' GPU implementation...


More Answers (0)
