A polynomial learning rate schedule drops the learning rate using a power law. The software uses this formula to calculate the learning rate:

αk = α0(γN + (γ0 − γN)(1 − k/N)^λ)
where:

- α0 is the base learning rate, specified by the InitialLearnRate option of the trainingOptions function.
- γ0 is the initial scaling factor, specified by the InitialFactor argument.
- γN is the final scaling factor, specified by the FinalFactor argument.
- λ is the power, specified by the Power argument.
- N is the number of steps, specified by the NumSteps argument.
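As a minimal sketch, the schedule can be computed in a few lines. This example is in Python rather than MATLAB, and the exact form of the decay used by the polynomialLearnRate object is an assumption reconstructed from the symbol definitions above:

```python
def polynomial_learn_rate(alpha0, k, num_steps,
                          initial_factor=1.0, final_factor=0.0, power=2.0):
    """Power-law decay of the learning rate over num_steps schedule steps.

    Sketch only: interpolates the scaling factor from initial_factor at
    k = 0 to final_factor at k = num_steps. The MATLAB object may differ
    in detail (for example, in how steps past num_steps are handled).
    """
    # Clamp progress so the rate stays at the final factor after the last step.
    progress = min(k / num_steps, 1.0)
    factor = final_factor + (initial_factor - final_factor) * (1 - progress) ** power
    return alpha0 * factor
```

With the default factors, the schedule starts at the base learning rate and decays to zero; for example, `polynomial_learn_rate(0.01, 0, 10)` returns 0.01 and `polynomial_learn_rate(0.01, 10, 10)` returns 0.0.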
If FrequencyUnit is "iteration", then the variable k denotes the schedule iteration number. The value of k can be different from the training iteration number. For example, if you train using {"warmup",schedule}, where schedule is a polynomialLearnRate object with FrequencyUnit set to "iteration", then at training iteration 15, the polynomialLearnRate object uses a k value of 10 because the warm-up learning rate schedule runs for the first 5 training iterations.
If FrequencyUnit is "epoch", then the variable k denotes the schedule epoch number. In this case, the schedule uses the same learning rate for each iteration of the epoch. The value of k can be different from the training epoch number. For example, if you train using {warmupLearnRate(FrequencyUnit="epoch"),schedule}, where schedule is a polynomialLearnRate object with FrequencyUnit set to "epoch", then at training epoch 15, the polynomialLearnRate object uses a k value of 10 because the warm-up learning rate schedule processes the first 5 epochs.
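The offset between the training step number and the schedule step number k, in either unit, can be sketched with a hypothetical helper (not part of the MATLAB API; Python is used here for illustration):

```python
def schedule_step(training_step, warmup_steps):
    """Schedule step number k for a schedule that follows a warm-up phase.

    training_step and warmup_steps are counted in the same FrequencyUnit
    (iterations or epochs). During warm-up, the following schedule has not
    started, so k is clamped at 0.
    """
    return max(training_step - warmup_steps, 0)

# Example from the text: at training epoch 15, after a 5-epoch warm-up,
# the polynomialLearnRate object uses k = 10.
k = schedule_step(15, 5)
```

The same helper covers the iteration-based example: training iteration 15 after a 5-iteration warm-up also gives k = 10.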
This plot shows an overview of the polynomial learning rate schedule with the NumSteps argument set to the length of the training process.