I understand that you want to train a small neural network from a teacher neural network using knowledge distillation for a regression problem.
The idea of knowledge distillation is to reduce the size of a neural network while maintaining its accuracy. As in the classification case, the student network would be trained using a distillation loss, which is a combination of the loss between the student's and the teacher's predictions and the loss between the student's predictions and the actual targets.
The key difference between the classification and the regression task is the loss function: for regression, mean squared error (MSE) can be used. Also, since a regression network outputs a continuous value instead of a discrete class, the output layer changes accordingly.
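In other words, with a weighting factor alpha between 0 and 1 (a hyperparameter you would choose or tune yourself), the distillation objective for regression could be written as

loss = alpha * MSE(studentPred, teacherPred) + (1 - alpha) * MSE(studentPred, trueTargets)

For an MSE loss this differs from training the student directly against the blended targets alpha * teacherPred + (1 - alpha) * trueTargets only by a constant that does not depend on the student, so blending the targets is the shortcut used in the snippet below.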
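For example, the teacher and the student could both be small fully connected networks ending in a single-output fullyConnectedLayer followed by a regressionLayer (which applies an MSE-type loss). The layer sizes below are arbitrary choices, and featureInputLayer assumes the input is a single numeric feature:

teacherLayers = [
    featureInputLayer(1)
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];

studentLayers = [
    featureInputLayer(1)
    fullyConnectedLayer(8)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];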
Refer to a simple code snippet below:
% Generate noisy data for a simple 1-D regression problem
numObservations = 1000;
XTrain = linspace(-10, 10, numObservations)';
YTrain = sin(XTrain) + 0.1 * randn(numObservations, 1);

% Train the teacher network (teacherLayers as defined above)
teacherOptions = trainingOptions('adam', ...
    'Plots', 'training-progress');
teacherNet = trainNetwork(XTrain, YTrain, teacherLayers, teacherOptions);

% Train the student network directly on the true targets as a baseline
studentOptions = trainingOptions('adam', ...
    'Plots', 'training-progress');
studentNet = trainNetwork(XTrain, YTrain, studentLayers, studentOptions);

% Knowledge distillation: blend the teacher's predictions with the true targets
teacherPred = predict(teacherNet, XTrain);
alpha = 0.5;  % weight given to the teacher's predictions
combinedTargets = alpha * teacherPred + (1 - alpha) * YTrain;
studentNet = trainNetwork(XTrain, combinedTargets, studentLayers, studentOptions);
With 'Plots' set to 'training-progress', this produces training-progress plots for the teacher network, the baseline student network, and the student trained with knowledge distillation from the teacher, so the three runs can be compared.
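To compare the networks quantitatively, one option is to evaluate them on held-out data generated the same way as the training data (the test-set size here is an arbitrary choice):

% Held-out test data from the same generating process
XTest = linspace(-10, 10, 500)';
YTest = sin(XTest) + 0.1 * randn(500, 1);

% Test MSE of the teacher and of the distilled student
mseTeacher = mean((predict(teacherNet, XTest) - YTest).^2)
mseStudent = mean((predict(studentNet, XTest) - YTest).^2)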
The code snippet above is a straightforward illustration of how MSE can be used for knowledge distillation on a regression task; the overall workflow mirrors the one used for classification.
I hope this helps!!