public enum GradientNormalization extends java.lang.Enum<GradientNormalization>
None = no gradient normalization (default)
RenormalizeL2PerLayer = rescale gradients by dividing by the L2 norm of all gradients for the layer.
RenormalizeL2PerParamType = rescale gradients by dividing by the L2 norm of the gradients, separately for
each type of parameter within the layer.
This differs from RenormalizeL2PerLayer in that here, each parameter type (weight, bias etc) is normalized separately.
For example, in an MLP/feed-forward network (where G is the gradient vector), the output is as follows:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
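As an illustration, the two renormalization modes can be sketched on plain double arrays (a hypothetical example; the method names and the weight/bias split are assumptions, not the library's implementation):

```java
public class RenormalizeSketch {
    // L2 norm of a gradient vector: sqrt(sum of squares)
    static double l2(double[] g) {
        double sum = 0.0;
        for (double v : g) sum += v * v;
        return Math.sqrt(sum);
    }

    // RenormalizeL2PerLayer: one norm computed over ALL gradients in the layer
    static void perLayer(double[] weightGrad, double[] biasGrad) {
        double norm = Math.sqrt(l2(weightGrad) * l2(weightGrad)
                              + l2(biasGrad) * l2(biasGrad));
        for (int i = 0; i < weightGrad.length; i++) weightGrad[i] /= norm;
        for (int i = 0; i < biasGrad.length; i++) biasGrad[i] /= norm;
    }

    // RenormalizeL2PerParamType: each parameter type uses its own norm
    static void perParamType(double[] weightGrad, double[] biasGrad) {
        double wNorm = l2(weightGrad);
        double bNorm = l2(biasGrad);
        for (int i = 0; i < weightGrad.length; i++) weightGrad[i] /= wNorm;
        for (int i = 0; i < biasGrad.length; i++) biasGrad[i] /= bNorm;
    }

    public static void main(String[] args) {
        double[] w = {3.0, 4.0};   // l2(w) = 5
        double[] b = {12.0};       // l2(b) = 12; combined per-layer norm = 13
        perParamType(w, b);        // w -> {0.6, 0.8}, b -> {1.0}
        System.out.println(w[0] + " " + w[1] + " " + b[0]);
    }
}
```

Note how per-param-type normalization leaves each parameter group with unit L2 norm, whereas per-layer normalization preserves the relative scale between weights and biases.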
ClipElementWiseAbsoluteValue = clip the gradients on a per-element basis.
For each gradient g, set g <- sign(g)*min(maxAllowedValue,|g|).
i.e., if a parameter gradient has absolute value greater than the threshold, truncate it to the threshold.
For example, if threshold = 5, then values in range -5<g<5 are unmodified; values <-5 are set
to -5; values >5 are set to 5.
This was proposed by Mikolov (2012), Statistical Language Models Based on Neural Networks (thesis),
http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf
in the context of learning recurrent neural networks.
Threshold for clipping can be set in Layer configuration, using gradientNormalizationThreshold(double threshold)
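The per-element rule above can be sketched as follows (a hypothetical helper for illustration, not the library's internal code):

```java
public class ClipSketch {
    // Clip each gradient element to [-threshold, threshold]:
    // g <- sign(g) * min(threshold, |g|)
    static double[] clipElementWise(double[] grad, double threshold) {
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            out[i] = Math.signum(grad[i]) * Math.min(threshold, Math.abs(grad[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] g = {-7.5, -2.0, 0.0, 3.0, 9.0};
        // threshold = 5: values in (-5, 5) are unchanged, others are truncated
        System.out.println(java.util.Arrays.toString(clipElementWise(g, 5.0)));
        // prints [-5.0, -2.0, 0.0, 3.0, 5.0]
    }
}
```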
ClipL2PerLayer = conditional renormalization. Somewhat similar to RenormalizeL2PerLayer, this strategy scales the gradients if and only if the L2 norm of the gradients (for the entire layer) exceeds a specified threshold. Specifically, if G is the gradient vector for the layer, then:
GOut = G if l2(G) < threshold (i.e., no change)
GOut = threshold * G / l2(G) otherwise
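This conditional rescaling can be sketched as below (a hypothetical illustration; the helper name is an assumption). When scaling is triggered, the rescaled gradient has L2 norm exactly equal to the threshold:

```java
public class ClipL2Sketch {
    // L2 norm of a gradient vector
    static double l2(double[] g) {
        double sum = 0.0;
        for (double v : g) sum += v * v;
        return Math.sqrt(sum);
    }

    // Scale only when the norm exceeds the threshold
    static double[] clipL2(double[] grad, double threshold) {
        double norm = l2(grad);
        if (norm <= threshold) return grad.clone();   // GOut = G (no change)
        double scale = threshold / norm;              // GOut = threshold * G / l2(G)
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) out[i] = grad[i] * scale;
        return out;
    }

    public static void main(String[] args) {
        double[] small = {1.0, 2.0};   // norm ~2.24 < 5 -> returned unchanged
        double[] large = {6.0, 8.0};   // norm 10 > 5 -> scaled to {3.0, 4.0}
        System.out.println(java.util.Arrays.toString(clipL2(small, 5.0)));
        System.out.println(java.util.Arrays.toString(clipL2(large, 5.0)));
    }
}
```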
ClipL2PerParamType = conditional renormalization. Very similar to ClipL2PerLayer; however, instead of clipping per layer, clipping is done on each parameter type separately.
For example, in a recurrent neural network, input weight gradients, recurrent weight gradients and bias gradients are all
clipped separately. Thus if one set of gradients is very large, it may be clipped while the other gradients are left
unmodified.
Threshold for clipping can be set in Layer configuration, using gradientNormalizationThreshold(double threshold)
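For reference, the strategy and its threshold are set together in the layer configuration. The fragment below is a sketch assuming the DL4J layer-builder API (imports such as org.deeplearning4j.nn.conf.layers.DenseLayer are assumed and not shown):

```java
// Sketch only: assumes the DL4J configuration classes are on the classpath.
DenseLayer layer = new DenseLayer.Builder()
        .nIn(100).nOut(50)
        .gradientNormalization(GradientNormalization.ClipL2PerParamType)
        .gradientNormalizationThreshold(1.0)   // threshold used by the clipping strategy
        .build();
```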
| Enum Constant and Description |
|---|
| ClipElementWiseAbsoluteValue |
| ClipL2PerLayer |
| ClipL2PerParamType |
| None |
| RenormalizeL2PerLayer |
| RenormalizeL2PerParamType |
| Modifier and Type | Method and Description |
|---|---|
| static GradientNormalization | valueOf(java.lang.String name) Returns the enum constant of this type with the specified name. |
| static GradientNormalization[] | values() Returns an array containing the constants of this enum type, in the order they are declared. |
public static final GradientNormalization None
public static final GradientNormalization RenormalizeL2PerLayer
public static final GradientNormalization RenormalizeL2PerParamType
public static final GradientNormalization ClipElementWiseAbsoluteValue
public static final GradientNormalization ClipL2PerLayer
public static final GradientNormalization ClipL2PerParamType
public static GradientNormalization[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
for (GradientNormalization c : GradientNormalization.values())
    System.out.println(c);
public static GradientNormalization valueOf(java.lang.String name)
Parameters:
name - the name of the enum constant to be returned
Throws:
java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
java.lang.NullPointerException - if the argument is null