March 21, 11:15

Normalization techniques other than batch norm:


Weight normalization (used in TCN

- Decouples length of weight vectors from their direction;

- Does not introduce any dependencies between the examples in a minibatch;

- Can be applied successfully to recurrent models such as LSTMs;

- Tested only on small datasets (CIFAR + VAES + DQN);

Instance norm (used in [style transfer](

- Proposed for style transfer;

- Essentially is batch-norm for one image;

- The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch;

Layer norm (used in Transformers, [paper](

- Designed especially for sequntial networks;

- Computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case;

- The mean and standard-deviation are calculated separately over the last certain number dimensions;

- Unlike Batch Normalization and Instance Normalization, which applies scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias;