Meaning of parameters of a trained resnet model

Hi,
I am currently using the Deep Learning Toolbox in MATLAB to train a resnet18 model. Before that, I had some experience using PyTorch to train the same resnet model. Interestingly, MATLAB's resnet18 model gave much better performance than PyTorch's resnet18 model (99% vs. ~70% classification accuracy), with the same training and testing datasets, the same 'Adam' optimizer, and the same learning rate. Out of curiosity, I exported the trained resnet18 model from MATLAB and imported it into PyTorch using the ONNX format, for a more direct comparison. The import worked, but some extra parameters of the model (as pictured below) confused me. My questions are:
  1. What are 'preprocessing_Mul_B' and 'preprocessing_Add_B'? Are they learned parameters used for preprocessing the training data?
  2. What are 'bn_conv1_scale' and 'bn_conv1_B'? Are they only used for batch normalization in the eval phase, as opposed to 'bn_conv1_mean' and 'bn_conv1_var'?
  3. What are the possible reasons for the huge performance difference between MATLAB and PyTorch? I first suspected different data normalization, but the difference persisted even when the same data normalization was used in PyTorch.
Any help will be much appreciated! Thank you.

Answers (1)

Sarah Mohamed on 9 Aug 2021
Hello Xufeng,
  1. Unlike in PyTorch, the ResNet-18 model in MATLAB includes input normalization. preprocessing_Mul_B and preprocessing_Add_B are indeed parameters used to preprocess the input data, but they are fixed constants rather than learned values. The preprocessing transforms the input into the range [0, 1] and normalizes it using per-channel mean values of [0.485, 0.456, 0.406] and per-channel std values of [0.229, 0.224, 0.225]. To accomplish this, preprocessing_Mul_B is used to multiply the input by 1./(255*std), and preprocessing_Add_B is used to add -mean./std.
  2. bn_conv1_scale and bn_conv1_B are the learnable channel scaling factor and offset values, respectively, used by the batch normalization operation. These are distinct from the mean and variance values. For a detailed look at how they are used to calculate its output, you can find a description of the formula under 'Algorithms' for the batchNormalizationLayer here: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.batchnormalizationlayer.html#d123e15720. bn_conv1_B and bn_conv1_scale correspond to offset β and scale factor γ.
  3. A difference in the input preprocessing would also be my first suspicion for the difference in results. As I mentioned earlier, PyTorch's implementation of this model doesn't appear to include any preprocessing: it neither transforms the input to the range [0, 1] nor normalizes it. Could you share the steps you're taking to preprocess the input in PyTorch? It might be easier to comment on the differences if we can see your code.
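As a quick sanity check of the preprocessing described in item 1, here is a minimal sketch in plain Python. It assumes the standard ImageNet statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]) and RGB channel order. The names `MUL_B`, `ADD_B`, `folded`, and `two_step` are made up for illustration and are not part of either toolbox. It only shows that the single multiply-and-add form gives the same result as scaling to [0, 1] and then normalizing.

```python
# Per-channel statistics (assumed: standard ImageNet values, RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

# Constants in the folded form: multiply by 1/(255*std), then add -mean/std.
# These stand in for preprocessing_Mul_B and preprocessing_Add_B.
MUL_B = [1.0 / (255.0 * s) for s in STD]
ADD_B = [-m / s for m, s in zip(MEAN, STD)]


def folded(pixel):
    """Apply the multiply-then-add form to one RGB pixel with values in [0, 255]."""
    return [p * a + b for p, a, b in zip(pixel, MUL_B, ADD_B)]


def two_step(pixel):
    """Scale to [0, 1] first, then subtract the mean and divide by the std."""
    return [(p / 255.0 - m) / s for p, m, s in zip(pixel, MEAN, STD)]


pixel = [128, 64, 255]  # hypothetical RGB pixel
max_diff = max(abs(x - y) for x, y in zip(folded(pixel), two_step(pixel)))
print(max_diff)  # difference between the two forms (floating-point rounding only)
```

If the PyTorch pipeline feeds raw [0, 255] pixel values, or normalizes without first scaling to [0, 1], the network sees inputs on a very different scale from what the MATLAB model sees, so this is worth checking first.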
Take care,
Sarah
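For item 2 in the answer above, the following is a minimal sketch of the batch normalization formula at inference time for a single channel, written in plain Python. The function name and the numbers are hypothetical. The `eps` default of 1e-5 is an assumption, and the exported model may use a different value. At inference the stored mean and variance are used as-is, while the scale and offset are the learned values.

```python
import math


def batch_norm_inference(x, mean, var, scale, offset, eps=1e-5):
    """Compute y = scale * (x - mean) / sqrt(var + eps) + offset for one channel.

    mean/var play the role of bn_conv1_mean and bn_conv1_var (stored statistics);
    scale/offset play the role of bn_conv1_scale and bn_conv1_B (learned values).
    """
    return scale * (x - mean) / math.sqrt(var + eps) + offset


# Hypothetical values for one channel; eps=0.0 keeps the arithmetic easy to follow.
y = batch_norm_inference(x=2.0, mean=1.0, var=4.0, scale=0.5, offset=0.1, eps=0.0)
print(y)  # 0.5 * (2 - 1) / 2 + 0.1
```

When comparing against PyTorch, the scale and offset correspond to the `weight` and `bias` of a `BatchNorm2d` layer, and the mean and variance correspond to its `running_mean` and `running_var` buffers.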
