Now they stack ... normalization!

Tough to choose between BN / LN / IN (batch, layer, and instance normalization)?

Now a stacked version with attention exists!

Also, their 1D implementation does not work, but you can hack their 2D layer (which actually expects BxCxHxW input) to work with 1D data (actually BxCxW) =)
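One way to do that hack, as a rough sketch only: add a dummy H dimension of size 1, run the 2D layer, then squeeze the dimension back out. This assumes PyTorch; the `Apply2dTo1d` wrapper is a made-up name for illustration, and `nn.BatchNorm2d` is only a stand-in for their 2D layer, whose actual class is not shown here.

```python
import torch
import torch.nn as nn


class Apply2dTo1d(nn.Module):
    """Hypothetical wrapper: run a 2D (BxCxHxW) layer on 1D (BxCxW) data."""

    def __init__(self, layer_2d: nn.Module):
        super().__init__()
        self.layer_2d = layer_2d

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # BxCxW -> BxCx1xW: insert a dummy height of 1
        x = x.unsqueeze(2)
        x = self.layer_2d(x)
        # BxCx1xW -> BxCxW: drop the dummy height again
        return x.squeeze(2)


# Stand-in only: swap nn.BatchNorm2d for the 2D normalization layer you want to reuse
norm_1d = Apply2dTo1d(nn.BatchNorm2d(num_features=16))

x = torch.randn(8, 16, 100)  # B=8, C=16, W=100
y = norm_1d(x)
print(y.shape)  # torch.Size([8, 16, 100])
```

With the stand-in layer this gives the same result as `nn.BatchNorm1d`, since the statistics are taken over B and W either way. Whether the same holds for another 2D layer depends on which axes it reduces over, so check the output before trusting it.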



