Cores vs. speed tradeoff for a Matlab computer

I am buying a computer for Matlab modeling. The goal is to select, within my budget, the computer that will complete a set of simulations as fast as possible. I believe I should buy as many cores as I can afford, but clock speed also matters. Any suggestions on optimizing the tradeoff of [number of cores] vs. [processor speed]? Below are some example combinations. Which would finish 144 simulations the fastest?
  • 6 cores at 3.46GHz, single CPU
  • 8 cores at 2.93GHz, dual CPU
  • 12 cores at 2.40GHz, dual CPU
[For full disclosure, we haven't been able to get the Parallel Computing toolbox to work with our model code, so we just start a separate instance of Matlab for each core.]
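[A minimal sketch of such a launcher on Windows, assuming a hypothetical run_sim.m entry point that takes a simulation index -- adapt the names to the actual model code:]

```matlab
% Launcher sketch (Windows): start one background MATLAB session per
% core. run_sim.m and its index argument are hypothetical placeholders
% for the actual model entry point.
nCores = 8;
for k = 1:nCores
    cmd = sprintf(['start /b matlab -nodesktop -nosplash ' ...
                   '-r "run_sim(%d); exit"'], k);
    system(cmd);   % 'start /b' returns immediately, so sessions overlap
end
```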

 Accepted Answer

You don't mention memory or storage. These are also critical to processing speed.
You should spec an adequate amount of RAM per core, depending on how much memory your simulations use. If you can keep things in RAM they are going to go much faster than if you need to wait on disk access. RAM is relatively inexpensive these days so you should be able to stuff quite a bit of it in for comparatively short money.
I'd also recommend using a SSD for storage. The performance versus a regular hard drive is significant, which will be important if your simulations can't fit in memory or are accessing data files from storage.
Given the information provided, there's no way of knowing which computer would "win" without making a pile of assumptions that may or may not be valid for what you are working on.
You also don't mention how you get to the number of cores. Is this a single CPU system or dual?

18 Comments

Sorry, I added that info above (1 CPU for 6 cores, 2 for 8 and 12)
I would probably go for the dual CPU setups, and look specifically at whether there is an upgrade path that would let you increase the core count at a later date if you need to -- this depends on the CPU socket, and ideally you could pick one that is at the beginning of its life cycle or still has a few years left in it.
Thanks, this is all very useful. Assuming it's 2x4 = 8 cores, what do you think is a minimum or sensible choice for RAM? Sorry for so many follow-up questions.
I would start at 16 GB RAM with 8 cores. That comes straight from the system requirements page (http://www.mathworks.com/support/sysreq/current_release/). But to some extent it's dependent on what you are doing. If you run your simulation and monitor the amount of RAM utilized during the run, you should get an idea of where you should be if you want to scale up.
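One quick way to do that monitoring from inside MATLAB on Windows is the MEMORY function -- a sketch (the field names come from the struct that MEMORY returns):

```matlab
% Sketch: sample this MATLAB session's memory use. MEMORY is
% Windows-only; all values below are reported in bytes.
u = memory;                                    % user-view struct
fprintf('Used by MATLAB:       %.0f MB\n', u.MemUsedMATLAB/2^20);
fprintf('Largest single array: %.0f MB\n', u.MaxPossibleArrayBytes/2^20);
fprintf('Available for arrays: %.0f MB\n', u.MemAvailableAllArrays/2^20);
```

Sampling this periodically during a run gives a rough profile of how close you are to the limit.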
I find it difficult to do anything interesting in 2 GB per session (what you would get with 16 GB distributed over 8 cores.)
That's why I suggested it as a bare starting point. With 16 GB running ~$125, 32 GB ~$300, and 64 GB ~$600, it's really worth it to avoid swapping and get the resulting performance increase. IMHO it's better bang for the buck than more cores. But that's a very humble and not fully informed opinion.
Walter, how many GB per session to do something interesting?
When only 2 GB are available, the maximum array size is about 800 MB and the total available memory is only about 1100 MB. I am more comfortable being able to handle several larger arrays -- e.g., at least 3 GB.
I am working in development mode most of the time, and images eat through memory. The flip side is that we run larger datasets once past development stage, with covariance matrices and inv() all over the place. How often do problems get _smaller_ in production? ;-)
The good news for you is that you can instrument your present runs to find out how much memory they actually use.
How do I track memory usage during the most memory-intensive part of the model, which is a fminsearch call? Do I place "userview = memory" in the model code before/after the fminsearch call? Or should I check Windows Task Manager while one instance of the simulation is running and look at how much memory the Matlab process is using (for example, 249 MB for an infinite loop that I am allowing to run).
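[A sketch of that first option, with myObjective and x0 as placeholders for the actual model -- though note that a before/after sample can miss the transient peak reached inside fminsearch itself:]

```matlab
% Sketch of the "before/after" option around the expensive call.
% myObjective and x0 are placeholders for the actual model.
before = memory;
[x, fval] = fminsearch(@myObjective, x0);
after  = memory;
fprintf('Net change in MATLAB memory: %.0f MB\n', ...
        (after.MemUsedMATLAB - before.MemUsedMATLAB)/2^20);
```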
I'm partial to just using the task manager, or if I want to get fancy I can use something like
tasklist /fi "imagename eq matlab.exe"
on a polling interval to gather the data in a file.
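A sketch of that polling loop, driven from MATLAB itself (the log file name, loop count, and 5-second interval are arbitrary choices):

```matlab
% Sketch: poll tasklist every few seconds and append its CSV output
% to a log file for later inspection.
logFile = 'matlab_mem_log.csv';   % hypothetical file name
for k = 1:60                      % ~5 minutes at a 5 s interval
    [~, out] = system('tasklist /fi "imagename eq matlab.exe" /fo csv /nh');
    fid = fopen(logFile, 'a');
    fprintf(fid, '%s', out);
    fclose(fid);
    pause(5);
end
```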
If I want to get super fancy I can set up a performance monitor counter that will take data about the matlab.exe process. This can be very detailed, though -- and might be more information than you really need to know -- but it can tell you not only memory utilization, but also disk I/O, CPU utilization, network I/O and a host of other stuff.
The performance monitor counter is also fairly intensive when it's gathering this information -- so you can get into the problem of the instrumentation intruding on performance of the system if you do too much monitoring.
Please help me interpret Task Manager info so I can figure out how much memory to buy on my new computer:
While one instance of my simulation is running, I select View > CPU Processes > One Graph, All CPUs.
Sustained peaks in the CPU Usage History reach 80% (my best guess - no units/labels on the history plot)
If 'Total Physical Memory' is 3.4 GB, does this mean that I should buy at least 3.4*0.8 = 2.7 GB per core on my new computer?
Look at the process list under the "Processes" tab. Under the View menu, select "Add Columns". Add in "Memory (Private Working Set)" and "Peak Working Set". Now sort the processes alphabetically, then find matlab.exe. Run your simulation. You should see these values start to rise, then fall off as the run ends. The peak value shows you the most memory consumed.
Caveats galore here -- if you look into what these counters really mean, you'll find that they are not 100% accurate. But they should give you some idea of how much a simulation consumes over a run and guide you in speccing out your system.
You can also add in things like the "Threads" counter to see how many threads get started, disk I/O, etc. Performance profiling can get very detailed and persnickety very quickly!
Jan's point is also a good one -- especially if you end up with a lot of things fighting for a shared resource, the overall system performance decreases. That's always the balancing act in these types of scenarios, and finding out how to properly size these resources is a non-trivial task. It's entirely possible that Jan's collection of Core2Duo hosts could out-perform any single system, but there are many additional headaches that pop up managing 20+ systems (that's part of my "day job" :) )
If Peak Working Set is the max memory that my simulation uses, its value (178560 K) is much less than the 80% value of CPU usage would suggest. How do I use this value to select how much memory my new computer should have in order to run this simulation? Or have I misunderstood? (By the way, I am using Process Explorer, http://technet.microsoft.com/en-us/sysinternals/bb896653, since XP didn't have those columns.)
"CPU Usage" has to do with how busy the CPU is kept, not to do with how much memory is being used. When a system is 80% used, 20% of the CPU time is being spent waiting for _something_. The details of what it is waiting for can be interesting. Do keep in mind, though, that if you managed to eliminate all of those waits by throwing hardware at the problem, you would only get 100/80 = 5/4 times as much performance.
Do not multiply CPU rate by memory -- they are very different items.
One stat that you should look at is Swap: if you have any swapping going on then your system is going to be slowed down.
I was making the implicit assumption that you were on Windows 7. My apologies. Windows 7/Vista improved a lot of this kind of instrumentation versus XP. Process Explorer is a good utility for getting around the lack of this built-in instrumentation on XP.
I will state an implicit assumption that I was making: this new system will be running a 64-bit operating system, correct? I'm further assuming that you will be staying on Windows (most likely Windows 7) as well. If you aren't going to a 64-bit OS, you are going to be limited to the amount of RAM a 32-bit OS can address, and Jan's gang of 20 Core2Duo hosts looks better and better for improving your simulation time.
Yes, I will buy Windows 7 with the 64-bit OS. I am getting stuck trying to figure out how much memory my simulations use in order to spec the new machine, so perhaps I should just choose 3 GB per core, the maximum number of cores that my budget will allow, and leave processor speed as a secondary consideration.
Thanks to everyone who helped me with this! I really appreciate it.


More Answers (1)

If you run 12 Matlab sessions on 12 cores and the memory gets low, all sessions compete for the swap space on the hard disk or SSD. Then even a single core processing the simulations sequentially can be faster. If the simulation runs fine on a single core with X GB RAM, install 12*X GB for 12 cores.
Some Matlab functions are parallelized internally, e.g. some linear algebra functions, SUM, MAX, etc. These functions will run fastest on 12 cores at 2.40 GHz -- if the cores are not already occupied by 12 Matlab sessions.
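[If you do run one session per core, one way to keep those sessions from fighting over threads is to cap each session's internal parallelism -- a sketch using the relevant built-in:]

```matlab
% Sketch: cap this session's built-in multithreading so that N
% single-threaded sessions don't oversubscribe N cores.
oldN = maxNumCompThreads(1);   % returns the previous thread count
```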
I've bought a used 2.3 GHz Core2Duo for 68 Euro. I'm convinced that 20 of them would beat all 3 high-tech machines. Just a thought.


Asked: K E on 1 Feb 2012
Edited: on 26 Sep 2013
