February 13, 09:56

(2) applies to models with a complex forward pass and to models with large embedding layers.