Meaning of parameters of a trained resnet model

Question

Xufeng Lin on 9 Aug 2021

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/894847-meaning-of-parameters-of-a-trained-resnet-model

Answered: Sarah Mohamed on 9 Aug 2021

Hi,

I am currently using the Deep Learning Toolbox in MATLAB to train a resnet18 model. Before that, I had some experience training the same resnet18 model in PyTorch. Interestingly, MATLAB's resnet18 model performed far better than PyTorch's (99% vs. ~70% classification accuracy) with the same training and testing datasets, the same 'Adam' optimizer, and the same learning rate. Out of curiosity, I exported the trained resnet18 model from MATLAB in ONNX format and imported it into PyTorch for a more direct comparison. The model imported successfully, but some extra parameters (as pictured below) confused me. My questions are:

What are 'preprocessing_Mul_B' and 'preprocessing_Add_B'? Are they learned parameters used for preprocessing the training data?
What are 'bn_conv1_scale' and 'bn_conv1_B'? Are they used only for batch normalization in the eval phase, as opposed to 'bn_conv1_mean' and 'bn_conv1_var'?
What could cause the large performance difference between MATLAB and PyTorch? I first suspected different data normalization, but the difference persisted even when the same data normalization was used in PyTorch.

Any help will be much appreciated! Thank you.


Answer 1

Sarah Mohamed on 9 Aug 2021

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/894847-meaning-of-parameters-of-a-trained-resnet-model#answer_763927

Hello Xufeng,

Unlike in PyTorch, the ResNet-18 model in MATLAB includes input normalization. preprocessing_Mul_B and preprocessing_Add_B are indeed parameters used to preprocess the input data. That involves transforming the input into the range [0,1] and normalizing it using per-channel mean values of [0.485, 0.456, 0.406] and per-channel std values of [0.229, 0.224, 0.225]. To accomplish this, preprocessing_Mul_B is used to multiply the input by 1./(255*std), and preprocessing_Add_B is used to add -mean./std.
bn_conv1_scale and bn_conv1_B are the learnable channel scaling factor and offset values, respectively, used by the batch normalization operation. These are distinct from the mean and variance values. For a detailed look at how they are used to calculate its output, you can find a description of the formula under 'Algorithms' for the batchNormalizationLayer here: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.batchnormalizationlayer.html#d123e15720. bn_conv1_B and bn_conv1_scale correspond to offset β and scale factor γ.
A difference in the input preprocessing would also be my first suspicion for the difference in results. As I mentioned earlier, PyTorch's implementation of this model doesn't appear to include any preprocessing: it neither transforms the input to the range [0, 1] nor normalizes it. Could you share the steps you're taking to preprocess the input in PyTorch? It might be easier to comment on the differences if we can see your code.
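
To make the roles of these four parameters concrete, here is a minimal pure-Python sketch of the arithmetic described above. It is an illustration only, not the toolbox's or the ONNX importer's actual code. The function names are hypothetical, the std values are assumed to be the standard ImageNet statistics [0.229, 0.224, 0.225], and the epsilon in the batch normalization step is an assumed placeholder (the real value is stored with the layer).

```python
# Sketch only: the per-channel arithmetic behind the exported constants.
# Assumptions: ImageNet mean/std, hypothetical function names, assumed eps.

MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

# Constants as described in the answer:
#   preprocessing_Mul_B = 1./(255*std), preprocessing_Add_B = -mean./std
MUL_B = [1.0 / (255.0 * s) for s in STD]
ADD_B = [-m / s for m, s in zip(MEAN, STD)]


def preprocess_folded(pixel):
    """Apply the folded multiply-add to one RGB pixel with values in [0, 255]."""
    return [x * k + b for x, k, b in zip(pixel, MUL_B, ADD_B)]


def preprocess_two_step(pixel):
    """Equivalent form: scale to [0, 1], then subtract mean and divide by std."""
    return [(x / 255.0 - m) / s for x, m, s in zip(pixel, MEAN, STD)]


def batch_norm_inference(x, mean, var, scale, offset, eps=1e-5):
    """Batch normalization of a single channel value at inference time.

    mean, var     : stored statistics (bn_conv1_mean, bn_conv1_var)
    scale, offset : learned gamma and beta (bn_conv1_scale, bn_conv1_B)
    eps           : assumed placeholder for the layer's epsilon
    """
    return scale * (x - mean) / (var + eps) ** 0.5 + offset


pixel = [255.0, 128.0, 0.0]
folded = preprocess_folded(pixel)
two_step = preprocess_two_step(pixel)
```

Both preprocessing functions give the same result up to floating-point rounding, which is why the exported model can carry the normalization as a single multiply and add. If that is what the imported model does, it would expect raw inputs in the range [0, 255]; applying torchvision's `transforms.ToTensor()` and `transforms.Normalize(mean, std)` before it would then normalize the data twice, which is one thing worth checking when comparing the two frameworks.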

Take care,

Sarah
