How to interpret tocBytes results

I am currently running R2019b, but when I attempt to start a parpool using the 'threads' option, it tells me that 'threads' is not a valid option for parpool (the only option is 'local'). I noticed that thread-based pools were introduced in R2020a, so I am guessing that this means I need a later MATLAB release to use the 'threads' option with parpool.
According to the decision chart located here, using a thread-based local pool is advantageous if you are running on a single machine and there is a large amount of data being transferred to each worker.
So I have two questions:
  1. What is preventing me from being able to use the 'threads' option? (Do I need a MATLAB release later than R2019b?)
  2. How do I interpret the results from tocBytes to determine whether 'threads' will be a benefit to me?
I am running on a system with dual Xeon Gold 6148 CPUs @ 2.4 GHz (20 cores each, 40 cores total) and 256 GB of RAM.
Using 36 workers, tocBytes shows the following data transfer to the workers:
Min: 35 MB, Max: 506 MB, Mean: 153 MB, Median: 95.4 MB
Total (all 36 workers): 5.5 GB
So, are these numbers considered "large", and would I expect to see a benefit from using thread-based processing?
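For reference, a minimal sketch of the kind of measurement used to get those numbers (a toy computation stands in for the real workload, and the exact sizes are illustrative):
pool = parpool('local', 36);          % process-based pool; tocBytes works here

data = rand(1e6, 36);                 % stand-in for the real input data
results = zeros(1, 36);
ticBytes(pool);                       % start counting bytes transferred
parfor k = 1:36
    results(k) = sum(data(:, k));     % toy computation in place of the real work
end
bytes = tocBytes(pool);               % one row per worker: [sent, received]

sentMB = bytes(:, 1) / 1e6;           % bytes sent to each worker, in MB
fprintf('Min %.1f MB, Max %.1f MB, Mean %.1f MB, Median %.1f MB, Total %.2f GB\n', ...
    min(sentMB), max(sentMB), mean(sentMB), median(sentMB), sum(sentMB)/1e3)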

Accepted Answer

Walter Roberson on 21 Jul 2023
You need R2020a or later to use parpool("threads").
However, if you are using R2021b or later, it is recommended that you use backgroundPool instead.
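For example, something along these lines (the version checks and the toy tasks are only an illustration, not part of any particular workflow):
if ~verLessThan('matlab', '9.11')                 % R2021b or later
    % backgroundPool: thread-backed pool for running code in the background
    f = parfeval(backgroundPool, @() sum(rand(1e6, 1)), 1);
    result = fetchOutputs(f);                     % block until the task finishes
elseif ~verLessThan('matlab', '9.8')              % R2020a or later
    pool = parpool('threads');                    % thread-based pool
    out = zeros(1, 8);
    parfor k = 1:8
        out(k) = sum(rand(1e6, 1));               % toy per-iteration task
    end
    delete(pool)
else
    pool = parpool('local');                      % R2019b and earlier: process-based only
end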
Unfortunately, ticBytes() and tocBytes() do not work with parpool("threads") or backgroundPool, so it is not possible to use those tools to compare the data transfer.
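One rough alternative (my suggestion; it measures elapsed time rather than bytes) is to run the same parfor once on a process-based pool and once on a thread-based pool and compare the timings directly. A sketch with a toy workload:
data = rand(1e6, 36);                 % stand-in for the real inputs
out  = zeros(1, 36);

delete(gcp('nocreate'))               % close any existing pool
parpool('local', 36);                 % process-based pool
tic
parfor k = 1:36
    out(k) = sum(sqrt(abs(data(:, k))));   % toy workload
end
tProcess = toc;
delete(gcp)

parpool('threads');                   % thread-based pool (R2020a or later)
tic
parfor k = 1:36
    out(k) = sum(sqrt(abs(data(:, k))));
end
tThreads = toc;
delete(gcp)

fprintf('process pool: %.2f s, thread pool: %.2f s\n', tProcess, tThreads)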
My understanding is that for ordinary numeric classes, shared pointers are used across the different threads, but that if copy-on-write is needed, the newly allocated memory comes from a per-thread memory pool (so that it can be easily released when the parfeval() finishes). However, I have not yet been able to come up with a consistent internal description of how thread pools work that would lead to the limitations they have in practice -- the architectures I have come up with mentally would have fewer limitations than thread pools actually have. Either that, or the architectures I come up with would block all handle objects. I have not yet figured out what MathWorks is doing that allows some handle objects to work with thread-shared memory while still requiring the limitations that are seen in practice.
  6 Comments
Walter Roberson on 21 Jul 2023
Edited: Walter Roberson on 21 Jul 2023
Suppose that you were able to eliminate 100% of the 5.5 gigabytes. That would reduce your computation time by
format long g
bytes_to_transfer = 5.5 * 10^9;                 % total transfer reported by tocBytes
max_bandwidth_bytes_per_second = 119.21 * 2^30  % 119.21 GiB/s (~128 GB/s), taken as the peak transfer rate
max_bandwidth_bytes_per_second =
128000762839.04
seconds_to_transfer = bytes_to_transfer / max_bandwidth_bytes_per_second
seconds_to_transfer =
0.0429684939215262
That is roughly 1/23 of a second, which is less than the measurement error of "44 minutes".
Jim Riggs on 22 Jul 2023
Thank you for the analysis. This is very helpful.

