The GradientNormalization.java Java example source code
package org.deeplearning4j.nn.conf;
/**Gradient normalization strategies. These are applied to the raw gradients, before the gradients are passed to the
 * updater (SGD, RMSProp, Momentum, etc.)<br>
* <p>None = no gradient normalization (default)
*
* <p>RenormalizeL2PerLayer = rescale gradients by dividing by the L2 norm of all gradients for the layer.
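 * For example, if G = [3.0, -4.0], then l2(G) = 5.0 and GOut = G / 5.0 = [0.6, -0.8].<br>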
*
* <p>RenormalizeL2PerParamType = rescale gradients by dividing by the L2 norm of the gradients, separately for
* each type of parameter within the layer.<br>
 * This differs from RenormalizeL2PerLayer in that here, each parameter type (weights, biases, etc.) is normalized separately.<br>
 * For example, in an MLP/feed-forward network (where G is the gradient vector), the output is as follows:
* <ul style="list-style-type:none">
* <li>GOut_weight = G_weight / l2(G_weight)
* <li>GOut_bias = G_bias / l2(G_bias)
* </ul>
* </p>
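 *
 * <p>For example, if G_weight = [3.0, 4.0] and G_bias = [2.0], then GOut_weight = [0.6, 0.8] and
 * GOut_bias = [1.0]; the scaling of one parameter type has no effect on the others.</p>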
*
* <p>ClipElementWiseAbsoluteValue = clip the gradients on a per-element basis.
 * For each gradient g, set g <- sign(g)*min(maxAllowedValue,|g|).<br>
 * See: Mikolov (2012), <i>Statistical Language Models Based on Neural Networks</i> (thesis),
 * <a href="http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf">http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf</a>,
 * in the context of learning recurrent neural networks.<br>
 * The threshold for clipping can be set in the layer configuration, using gradientNormalizationThreshold(double threshold)
* </p>
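 *
 * <p>For example, with a threshold of 5.0 (an illustrative value), a gradient element of 7.3 is clipped to 5.0,
 * an element of -8.1 is clipped to -5.0, and an element of 0.4 is left unchanged.</p>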
*
 * <p>ClipL2PerLayer = conditional renormalization. Somewhat similar to RenormalizeL2PerLayer, this strategy
 * scales the gradients <i>if and only if</i> the L2 norm of the gradients (for the entire layer) exceeds a specified
 * threshold. Specifically, if G is the gradient vector for the layer, then:
* <ul style="list-style-type:none">
* <li>GOut = G if l2Norm(G) < threshold (i.e., no change)
* <li>GOut = threshold * G / l2Norm(G) otherwise
* </ul>
 * Thus, the L2 norm of the scaled gradients will not exceed the specified threshold, though it may be smaller.<br>
 * See: Pascanu, Mikolov, Bengio (2012), <i>On the difficulty of training Recurrent Neural Networks</i>,
 * <a href="http://arxiv.org/abs/1211.5063">http://arxiv.org/abs/1211.5063</a><br>
 * The threshold for clipping can be set in the layer configuration, using gradientNormalizationThreshold(double threshold)
* </p>
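 *
 * <p>For example, with a threshold of 1.0 (an illustrative value), a layer gradient vector with l2Norm(G) = 4.0 is
 * scaled by 1.0/4.0 = 0.25, bringing its L2 norm to exactly 1.0, while a vector with l2Norm(G) = 0.5 is left unchanged.</p>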
*
 * <p>ClipL2PerParamType = conditional renormalization. Very similar to ClipL2PerLayer; however, instead of clipping
 * per layer, clipping is done on each parameter type separately.<br>
 * For example, in a recurrent neural network, input weight gradients, recurrent weight gradients and bias gradients are
 * all clipped separately. Thus, if one set of gradients is very large, it may be clipped while leaving the other
 * gradients unmodified.<br>
 * The threshold for clipping can be set in the layer configuration, using gradientNormalizationThreshold(double threshold)</p>
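 *
 * <p>Usage sketch (a minimal illustration, assuming the standard DL4J layer-builder API, in which
 * gradientNormalization(...) selects the strategy and gradientNormalizationThreshold(...) sets the threshold):</p>
 * <pre>
 * new DenseLayer.Builder()
 *     .nIn(100).nOut(50)
 *     .gradientNormalization(GradientNormalization.ClipL2PerParamType)
 *     .gradientNormalizationThreshold(1.0)
 *     .build();
 * </pre>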
*
* @author Alex Black
*/
public enum GradientNormalization {
None,
RenormalizeL2PerLayer,
RenormalizeL2PerParamType,
ClipElementWiseAbsoluteValue,
ClipL2PerLayer,
ClipL2PerParamType
}
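To make the arithmetic behind these strategies concrete, here is a small, self-contained Java sketch. It is not part of the DL4J source; the helper methods are hypothetical and written only to illustrate the per-layer variants described above (the PerParamType variants apply the same arithmetic to each parameter type's gradient array separately).

import java.util.Arrays;

public class GradientNormalizationDemo {

    // L2 norm of a gradient vector: sqrt of the sum of squared elements.
    static double l2Norm(double[] g) {
        double sum = 0.0;
        for (double v : g) sum += v * v;
        return Math.sqrt(sum);
    }

    // RenormalizeL2PerLayer: always divide by the layer's L2 norm.
    static double[] renormalizeL2PerLayer(double[] g) {
        double norm = l2Norm(g);
        double[] out = new double[g.length];
        for (int i = 0; i < g.length; i++) out[i] = g[i] / norm;
        return out;
    }

    // ClipL2PerLayer: rescale only if the L2 norm exceeds the threshold.
    static double[] clipL2PerLayer(double[] g, double threshold) {
        double norm = l2Norm(g);
        if (norm <= threshold) return g.clone();     // below threshold: no change
        double scale = threshold / norm;             // shrinks the norm to exactly the threshold
        double[] out = new double[g.length];
        for (int i = 0; i < g.length; i++) out[i] = g[i] * scale;
        return out;
    }

    // ClipElementWiseAbsoluteValue: per-element clip, g <- sign(g)*min(threshold, |g|).
    static double[] clipElementWise(double[] g, double threshold) {
        double[] out = new double[g.length];
        for (int i = 0; i < g.length; i++) {
            out[i] = Math.signum(g[i]) * Math.min(threshold, Math.abs(g[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] g = {3.0, -4.0, 0.5};               // l2Norm(g) = sqrt(25.25), roughly 5.02
        System.out.println(Arrays.toString(renormalizeL2PerLayer(g)));
        System.out.println(Arrays.toString(clipL2PerLayer(g, 1.0)));   // norm > 1.0, so rescaled
        System.out.println(Arrays.toString(clipElementWise(g, 3.5)));  // only -4.0 is clipped, to -3.5
    }
}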