February 11, 06:22

Old news ... but Attention works

Funnily enough, in the past my models fell into one of three cases:

- They did not need attention;

- Attention was implemented by @thinline72;

- The domain (NMT) was so complicated that I had to resort to boilerplate with key-value attention.

This was the first time I / we manually built a model with plain self-attention from scratch.

And you know what - it really adds 5-10% to all of the tracked metrics.

The best plain attention layer in PyTorch - simple, well documented ... and it works in real-life applications:

gist.github.com/cbaziotis/94e53bdd6e4852756e0395560ff38aa4
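For reference, a minimal sketch of this kind of layer: score each timestep of a sequence encoder's output, softmax the scores over the sequence, and return the weighted sum (attention pooling). This is a paraphrase of the idea, not the gist's exact code; the names (PlainSelfAttention, hidden_dim, mask) are placeholders of my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlainSelfAttention(nn.Module):
    """Attention pooling: one learned score per timestep,
    softmaxed over the sequence, then a weighted sum of hidden states.
    A sketch in the spirit of the gist, not its exact code."""

    def __init__(self, hidden_dim):
        super().__init__()
        # Maps each hidden state to a single unnormalized score
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, inputs, mask=None):
        # inputs: (batch, seq_len, hidden_dim)
        # mask:   (batch, seq_len), 1 for real tokens, 0 for padding
        scores = self.scorer(inputs).squeeze(-1)       # (batch, seq_len)
        if mask is not None:
            # Padding positions get -inf so softmax assigns them zero weight
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)            # attention distribution
        # Weighted sum over the time dimension -> (batch, hidden_dim)
        pooled = torch.bmm(weights.unsqueeze(1), inputs).squeeze(1)
        return pooled, weights

# Usage: pool an encoder's outputs into one vector per sequence
attn = PlainSelfAttention(hidden_dim=256)
hidden = torch.randn(8, 40, 256)                       # e.g. RNN outputs
pooled, weights = attn(hidden)                         # (8, 256), (8, 40)
```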

#nlp

#deep_learning
