hyperthreading question

Mate 2u on 1 May 2012
Hi there, I have MATLAB running on a quad-core i7 processor, which has 8 threads. I timed a test function in a loop while changing the number of labs. These are my results in seconds:
matlabpool 12 21.61s
matlabpool 10 21.96s
matlabpool 8 22.27s
matlabpool 6 23.29s
matlabpool 4 25.54s
I tested it on another loop with another function and also found that using 12 labs is the fastest. How come this is happening if I only have 8 threads available and 4 cores?
I look forward to a reply.
  3 Comments
Ken Atwell on 2 May 2012
The timing differences are pretty subtle, all within 20% of each other. I'd be tempted to call this a "tie". :)
Richard Brown on 2 May 2012
That was my initial thought too, but the times decrease steadily as the number of labs goes up ... it would be interesting to know if that always happens.


Accepted Answer

Edric Ellis on 2 May 2012
There are several reasons why different numbers of workers can behave differently. In some situations, even if each PARFOR loop iteration takes quite a long time, it can be quicker to run fewer workers than you have cores available; other times, it can be quicker to run more workers than you have cores. This is because of the various resource contentions that your code encounters.
If your algorithm is memory bound - i.e. the main contention is for access to RAM (for example, adding together two large matrices - the amount of computation is trivial compared to the time it takes to get the data into the CPU), then you often find that fewer workers perform better.
If your algorithm is compute bound - i.e. not much memory access compared to the computational complexity, then more workers (up to the number of physical cores) work better.
In some cases, if your algorithm is bounded by some sort of latency elsewhere, running more workers than you have cores can work best.
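To make the memory-bound versus compute-bound distinction concrete, here is a minimal sketch of two PARFOR loops, one of each kind. The sizes (`n`, `reps`, `nTasks`) and both loop bodies are made-up illustrations, not the original poster's test function. Running the script with several pool sizes and comparing `tMem` and `tCpu` should show whether each kind of loop prefers fewer or more workers on a given machine.

```matlab
% Hypothetical workloads; all sizes below are assumptions chosen for illustration.
nTasks = 24;     % number of parfor iterations
n      = 2000;   % each n-by-n double matrix occupies 32 MB
reps   = 50;     % additions per iteration

% Memory-bound loop: one addition per element fetched from RAM.
memOut = zeros(1, nTasks);
t = tic;
parfor k = 1:nTasks
    A = ones(n);
    B = ones(n);
    for r = 1:reps
        A = A + B;              % trivial arithmetic, heavy memory traffic
    end
    memOut(k) = A(1);           % keep one value so the work has an output
end
tMem = toc(t);

% Compute-bound loop: many operations on a single scalar, almost no memory traffic.
cpuOut = zeros(1, nTasks);
t = tic;
parfor k = 1:nTasks
    y = k / nTasks;
    for r = 1:5e6
        y = sin(y) + cos(y);
    end
    cpuOut(k) = y;
end
tCpu = toc(t);

fprintf('memory-bound: %.2f s, compute-bound: %.2f s\n', tMem, tCpu);
```

Timings from a sketch like this depend heavily on the machine's memory bandwidth and cache sizes, so treat the numbers as a rough indication only.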

More Answers (1)

Geoff on 2 May 2012
I'd like to see the results if you expand your operation to something that takes about 10 minutes. And do it with the utter minimum of background processes running.
You're talking about a few hundred milliseconds, which can easily be eaten up by, say, a piece of software doing some routine background work.
Also, in all fairness, you MUST ensure that the number of iterations in your parfor loop is a multiple of 4, 6, 8, 10 and 12, or some number sufficiently high as to cancel out the effect of some workers finishing the task early, while others have to perform one extra loop (I'm assuming your test function is a constant-time operation).
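A benchmark along these lines might look like the sketch below. `testFcn` is a hypothetical stand-in for the original poster's function, and `nIter` and `nRepeats` are arbitrary choices. The code uses `parpool` and `gcp`; MATLAB releases from the time of this thread used `matlabpool` instead. It also assumes the cluster profile allows a pool as large as the biggest entry in `poolSizes`.

```matlab
% Hypothetical benchmark; testFcn is a stand-in, not the function from the question.
testFcn   = @(k) sum(sin((1:2e5) * k));   % roughly constant time per call
poolSizes = [4 6 8 10 12];
nIter     = 120 * 20;                     % 120 is divisible by 4, 6, 8, 10 and 12
nRepeats  = 3;                            % repeat to dampen background noise
elapsed   = zeros(nRepeats, numel(poolSizes));

for j = 1:numel(poolSizes)
    delete(gcp('nocreate'));              % close any pool that is already open
    parpool(poolSizes(j));                % assumes the profile permits this many workers
    for r = 1:nRepeats
        out = zeros(1, nIter);
        t = tic;
        parfor k = 1:nIter
            out(k) = testFcn(k);
        end
        elapsed(r, j) = toc(t);
    end
end
delete(gcp('nocreate'));

% Best time per pool size: first row is the pool size, second row the time in seconds.
disp([poolSizes; min(elapsed, [], 1)]);
```

Taking the minimum over the repeats discards the warm-up cost of the first run in each pool and reduces the influence of background processes.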
It would be interesting to see your test code, if you are happy to post it.
At this stage, I'm not open to accepting that 10 workers is faster than 8 on a machine with 8 logical cores. But that could be due to my own ignorance. =)
  1 Comment
Geoff on 2 May 2012
Just an afterthought about my comment on the number of parfor iterations. I would in fact prefer starting with prod([4,6,8,10,12]) iterations, and multiplying that by about a million. The goal is that you share out a decent amount of work evenly between all your workers, then set them loose.
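For reference, the arithmetic behind that iteration count can be checked with a short snippet. `prod([4,6,8,10,12])` is 23040, while the smallest count divisible by every pool size is their least common multiple, 120, so any multiple of 120 also splits into equal shares. The variable names below are illustrative only.

```matlab
poolSizes = [4 6 8 10 12];

% Fold lcm over the pool sizes to get the smallest evenly divisible count.
smallest = 1;
for p = poolSizes
    smallest = lcm(smallest, p);
end

viaProd = prod(poolSizes);

fprintf('lcm = %d, prod = %d\n', smallest, viaProd);   % prints "lcm = 120, prod = 23040"

% Both counts leave no remainder for any of the pool sizes.
assert(all(mod(smallest, poolSizes) == 0));
assert(all(mod(viaProd,  poolSizes) == 0));
```

Whether each worker actually receives an equal share also depends on how PARFOR hands out iterations at run time, so divisibility is a necessary starting point rather than a guarantee.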
