Every time I train my custom feed-forward net (2 inputs, 1 output, time-series data) with the train(net, ...) function, the gradient reaches the preset limit after about 10 training epochs and training stops.
Changing the network's architecture is not an option in my case.
Is there a way to implement "gradient clipping" with a feed-forward net?
Or is there another workaround for the "exploding gradient" problem?
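For reference, here is a minimal sketch of what gradient clipping (clip-by-global-norm) means for a small feed-forward net. It is plain Python, not MATLAB, and it does not use train(). The network shape (2 inputs, one tanh hidden layer, 1 output), the function names, the learning rate and the `max_norm` threshold are all hypothetical choices for illustration. The idea is to compute the full gradient and, if its L2 norm exceeds a threshold, rescale the whole gradient vector down to that norm before the weight update. This caps the step size while keeping the step direction.

```python
import math
import random


def clip_by_global_norm(grads, max_norm):
    """Scale all gradient entries by one common factor so that their
    combined L2 norm is at most max_norm. Returns (clipped, original_norm)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm and total_norm > 0.0:
        scale = max_norm / total_norm
        return [g * scale for g in grads], total_norm
    return list(grads), total_norm


def forward(params, x, hidden):
    """Hypothetical 2-input, tanh-hidden-layer, 1-output network.
    params layout: [W1 (2*hidden), b1 (hidden), W2 (hidden), b2 (1)]."""
    w1 = params[: 2 * hidden]
    b1 = params[2 * hidden : 3 * hidden]
    w2 = params[3 * hidden : 4 * hidden]
    b2 = params[4 * hidden]
    h = [math.tanh(w1[2 * j] * x[0] + w1[2 * j + 1] * x[1] + b1[j])
         for j in range(hidden)]
    y = sum(w2[j] * h[j] for j in range(hidden)) + b2
    return h, y


def mse_gradient(params, data, hidden):
    """Return (loss, gradient) of the mean squared error over data (backprop)."""
    grad = [0.0] * len(params)
    loss = 0.0
    n = len(data)
    w2 = params[3 * hidden : 4 * hidden]
    for x, target in data:
        h, y = forward(params, x, hidden)
        err = y - target
        loss += err * err / n
        dy = 2.0 * err / n  # d(loss)/dy for this sample
        for j in range(hidden):
            dh = dy * w2[j] * (1.0 - h[j] * h[j])  # back through tanh
            grad[2 * j] += dh * x[0]
            grad[2 * j + 1] += dh * x[1]
            grad[2 * hidden + j] += dh
            grad[3 * hidden + j] += dy * h[j]
        grad[4 * hidden] += dy
    return loss, grad


def train_with_clipping(data, hidden=4, lr=0.05, max_norm=1.0, epochs=200, seed=0):
    """Plain gradient descent where every gradient is clipped before the update.
    Returns the final parameters and the loss measured in the last epoch."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(4 * hidden + 1)]
    loss = float("nan")
    for _ in range(epochs):
        loss, grad = mse_gradient(params, data, hidden)
        grad, _ = clip_by_global_norm(grad, max_norm)  # cap the step size
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss
```

Rescaling by the global norm, rather than clamping each gradient entry separately, leaves the update direction unchanged. The `max_norm` value would have to be tuned for the actual data.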