How much GPU do I need?

William on 1 Sep 2023
Edited: William on 19 Oct 2023
I am trying to train an FCDD anomaly detector using Inception-v3 as the backbone network.
When I increase the image size above [540 960 3], I get a "GPU ran out of memory" error.
How can I know how much GPU memory I need?
In deep learning training, what characteristics of the training process affect how much GPU memory is needed?
Is it the image size and the minibatch size?
For a given JPEG image size [x y z] and minibatch size n, can I calculate analytically how much GPU memory is needed to train a network like the one described in the first sentence of this post?
Thank you,
MATLAB deep learning enthusiast.

Answers (2)

Mrutyunjaya Hiremath on 1 Sep 2023
Basic Formula to Estimate GPU Memory Requirement
Memory Required = Model Size + Batch Size × (Forward Pass Memory + Backward Pass Memory), where the forward and backward pass memory are counted per sample.
  1. Model Size: Memory to store the model weights. If the model has N parameters and each parameter is of size S bytes (usually 4 bytes for float32), then the model size is N×S.
  2. Forward Pass Memory: Memory to store the intermediate activations during a forward pass. This depends on the model architecture and input size.
  3. Backward Pass Memory: Memory to store gradients during backpropagation. This is roughly equal to the forward pass memory.
  4. Batch Size: Number of samples processed in parallel.
Let's take an example with hypothetical values for an Inception V3 model to illustrate:
  • Model Parameters (Inception V3): Approx 21.8M
  • Data Type: float32 (4 bytes)
  • Input Size: 540x960x3
  • Batch Size: 32
For simplicity, let's assume that the forward and backward pass each require memory roughly equal to the input size times the number of feature maps at each layer.
Model Size = 21.8M parameters × 4 bytes/parameter = 87.2 MB
Forward Pass Memory (whole batch) = Batch Size × Input Size × Feature Maps × 4 bytes
Backward Pass Memory ≈ Forward Pass Memory
Assuming that the feature maps are roughly the same size as the input image (another gross simplification), and that there are about 1000 feature maps (across all layers):
Forward Pass Memory = 32 × 540 × 960 × 3 × 1000 × 4 bytes ≈ 199.1 GB
Total Memory = Model Size + Forward Pass Memory + Backward Pass Memory = 87.2 MB + 2 × 199.1 GB ≈ 398.2 GB
Because the batch size is already included in this forward pass figure, it is not multiplied in again in the total.
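The arithmetic in this example can be reproduced with a few lines of code, which also makes it easy to try other batch sizes or input sizes. The sketch below is written in Python purely as a calculator. The function name is hypothetical, and the figure of 1000 input-sized feature maps is the same gross simplification used in the text, not a property of Inception V3.

```python
BYTES_PER_FLOAT32 = 4

def crude_training_memory_bytes(num_params, input_shape, batch_size, num_feature_maps):
    """Crude estimate: weights plus activations (forward) plus gradients (backward)."""
    height, width, channels = input_shape
    model_bytes = num_params * BYTES_PER_FLOAT32
    # Assumption from the text: every feature map is as large as the input image.
    forward_bytes = (batch_size * height * width * channels
                     * num_feature_maps * BYTES_PER_FLOAT32)
    backward_bytes = forward_bytes  # assumed roughly equal to the forward pass
    total_bytes = model_bytes + forward_bytes + backward_bytes
    return model_bytes, forward_bytes, total_bytes

model_b, forward_b, total_b = crude_training_memory_bytes(
    num_params=21_800_000, input_shape=(540, 960, 3),
    batch_size=32, num_feature_maps=1000)

print(f"model size  : {model_b / 1e6:.1f} MB")
print(f"forward pass: {forward_b / 1e9:.1f} GB")
print(f"total       : {total_b / 1e9:.1f} GB")
```

Under these assumptions the estimate scales linearly with both the batch size and the pixel count, so halving either one roughly halves the activation term.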
This is a very crude estimate and the actual memory requirement will likely be different due to various optimizations that deep learning frameworks employ.
So, to directly answer your question: by this crude estimate you would need a GPU with roughly 400 GB of memory for the current setup, which is currently infeasible. You would have to adjust your model, data, or training regime to fit it onto a GPU that you can realistically acquire.
Solutions for Memory Errors
  1. Reduce Batch Size: The easiest way to reduce memory usage.
  2. Use Gradient Accumulation: Run the forward and backward pass on several smaller batches, accumulate their gradients, and update the weights once per accumulated group.
  3. Use Mixed Precision Training: Utilizes both float16 and float32 to make training more memory-efficient.
  4. Use a Simpler Model: Smaller architectures require less memory.
  5. Distributed Training: Split the model and data across multiple GPUs.
  6. Check for Memory Leaks: Make sure that you're not unintentionally holding onto tensors that you no longer need.
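As an illustration of why gradient accumulation (item 2) does not change the result, here is a minimal sketch on a toy one-parameter model. Everything in it (the model y = w * x, the data values, the function names) is hypothetical and chosen only to keep the arithmetic visible. It shows that the size-weighted average of the micro-batch gradients equals the gradient of the full batch, while only one micro-batch has to be processed at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over the given samples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Full-batch gradient rebuilt from micro-batches, weighted by their size."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Only this micro-batch is "in memory" while its gradient is computed.
        total += grad_mse(w, mx, my) * len(mx)
    return total / len(xs)

# Hypothetical toy data: eight samples, processed two at a time.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
w = 0.3

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(abs(full - accum) < 1e-9)
```

In a real network the same idea applies per parameter, although layers that depend on batch statistics (batch normalization, for example) can behave differently with smaller micro-batches.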

Alex Taylor on 1 Sep 2023
To the above answer I would add:
1) I'm assuming that you are using pretrainedEncoderNetwork or another method to "cut" Inception-v3 at a given point. So the parameter count needs to be based on the set of Learnables in the backbone, not the full Inception-v3 network.
2) In practice, this kind of analytic memory analysis can be complicated, because it requires implementation details about how ops are implemented for forward/backward and how memory is cached between them. It's often easier to start with a small spatial input size, say 224x224x3, and a low batch size, then increase either the batch size or the spatial dimensions one at a time to see how GPU memory use scales with each change.
3) Using a smaller backbone such as a MobileNet-type architecture, a lower batch size, or smaller spatial dimensions are all ways of reducing peak memory consumption during training.
4) FCDD in particular often works well when you divide your training data into patches. Since it is a fully convolutional architecture, you can train on smaller patches at a given scale and then run full-sized inference at the same scale. This lets you work within memory limits at training time without being tied to the same spatial dimensions during inference. To do this you will need to obtain a representative set of good patches/tiles and a small set of bad patches/tiles from your full-sized images.
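To make the patch idea in point 4 concrete, here is a small sketch that only computes tile coordinates. It does not load images or touch the FCDD training API. The function names are hypothetical, and the 540x960 tile size is just an assumption for illustration; any size that fits in GPU memory would do.

```python
def tile_origins(length, tile):
    """Start offsets of tiles covering [0, length).

    If the tiles do not divide the length evenly, the last tile is shifted
    back so it stays inside the image (it then overlaps its neighbour).
    """
    if tile > length:
        raise ValueError("tile is larger than the image dimension")
    starts = list(range(0, length - tile + 1, tile))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def tile_boxes(image_height, image_width, tile_height, tile_width):
    """List of (top, left, height, width) boxes covering the whole image."""
    return [(top, left, tile_height, tile_width)
            for top in tile_origins(image_height, tile_height)
            for left in tile_origins(image_width, tile_width)]

# A 2160x3840 image cut into 540x960 tiles (tile size is an assumption).
boxes = tile_boxes(2160, 3840, 540, 960)
print(len(boxes), boxes[0], boxes[-1])
```

Each box can then be cropped from the full-sized image and labeled as a good or bad patch for training.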
Alex Taylor on 17 Oct 2023
I generally use the CUDA tool nvidia-smi to monitor GPU memory use. You can have it run in a loop with the -l option.
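If you would rather log the numbers than watch them, nvidia-smi can also be queried from a script. The sketch below is a minimal Python wrapper with hypothetical helper names. It assumes nvidia-smi is on the system path and that its `--query-gpu=memory.used,memory.total` output with `--format=csv,noheader,nounits` is one "used, total" line per GPU, in MiB. The sample line at the bottom is made up to show the parsing and is not a real measurement.

```python
import subprocess

def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into two integers."""
    used_text, total_text = line.split(",")
    return int(used_text.strip()), int(total_text.strip())

def query_gpu_memory_mib():
    """Return a (used, total) pair in MiB for each GPU reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return [parse_memory_line(line)
            for line in result.stdout.splitlines() if line.strip()]

# Made-up sample line, only to show what the parser returns.
sample = parse_memory_line("4608, 8192")
print(sample)
```

Calling query_gpu_memory_mib() at a fixed interval during training and keeping the maximum gives a peak figure that is easier to compare across runs than a value read off a live display.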
William on 19 Oct 2023
Edited: William on 19 Oct 2023
Hey Alex,
I attempted to do what you suggested in number 2).
So I decided to watch the Performance tab in Windows Task Manager.
I watched the value labeled "Dedicated GPU memory".
Is that OK?
I tested 2 image sizes. I did these 2:
  • [270 480 3]
  • [540 960 3]
The minibatch size was 7 for both.
The data, the training parameters, and everything else were the same for both.
The only difference was the resizing of the images.
Before each test, I did a gpuDevice(1) in the command window, to clear the GPU.
I also closed MATLAB, shut down, and restarted the computer.
I did this to make sure I had a perfect fresh start each time.
For [270 480 3], dedicated GPU memory before running the training section of the code was 1.3 GB.
During training, for size [270 480 3], the peak dedicated GPU memory appeared to be 4.5 GB.
For [540 960 3], dedicated GPU memory before running the training section of the code was 1.8 GB.
During training, for size [540 960 3], the peak dedicated GPU memory appeared to be 7.8 GB.
I have a total of 8 GB of dedicated GPU memory, so that probably explains why I get GPU out-of-memory errors during training when I try images larger than [540 960 3].
[540 960 3] is twice the size of [270 480 3] in each dimension (height and width).
It sort of looks like it used roughly twice the GPU memory too, which might be a coincidence.
Anyway, the real image size I'd like to be able to train on is [2160 3840 3]
All images are jpg.
[2160 3840 3] is exactly 4 times bigger than [540 960 3] in each dimension (height and width).
Would this mean I need 4 times the GPU memory?
At least 32 GB of dedicated GPU memory?
thank you,
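One way to reason about this scaling question is to compare pixel counts rather than side lengths, since activation memory in a convolutional network tends to grow with the number of pixels. The sketch below is a rough back-of-the-envelope calculation in Python. It uses only the two peak readings reported above and assumes memory grows linearly with pixel count on top of a fixed overhead. That assumption is not verified, and the 7.8 GB reading sits close to the 8 GB limit, so it may already be capped. Treat the extrapolated figure (about 74 GB under these assumptions) as indicative only.

```python
def pixels(shape):
    """Number of pixels in an image of shape (height, width, channels)."""
    height, width, _channels = shape
    return height * width

small, medium, target = (270, 480, 3), (540, 960, 3), (2160, 3840, 3)

# Peak "Dedicated GPU memory" readings from the post, in GB, at minibatch size 7.
peak_gb = {small: 4.5, medium: 7.8}

ratio_medium_vs_small = pixels(medium) / pixels(small)    # 4.0
ratio_target_vs_medium = pixels(target) / pixels(medium)  # 16.0

# Two-point linear fit: memory = intercept + slope * pixel_count (an assumption).
slope = (peak_gb[medium] - peak_gb[small]) / (pixels(medium) - pixels(small))
intercept = peak_gb[small] - slope * pixels(small)
estimate_gb = intercept + slope * pixels(target)

print(f"pixel ratio, medium vs small : {ratio_medium_vs_small:.0f}x")
print(f"pixel ratio, target vs medium: {ratio_target_vs_medium:.0f}x")
print(f"extrapolated peak at target  : {estimate_gb:.1f} GB")
```

A third measurement at an intermediate size, or at a smaller minibatch size, would show whether the linear assumption holds before relying on the extrapolation.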
