Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1797 members, 1726 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- t.me/joinchat/Bv9tjkH9JHYvOr92hi5LxQ
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

February 11, 06:22

Old news ... but Attention works

Funnily enough, in the past my models:

- Either did not need attention;

- Or attention was implemented by @thinline72;

- Or the domain was so complicated (NMT) that I had to resort to boilerplate with key-value attention;

This was the first time I / we manually built a model with plain self-attention from scratch.

And you know - it really adds 5-10% to all of the tracked metrics.

Best plain attention layer in PyTorch - simple, well-documented ... and it works in real-life applications:

gist.github.com/cbaziotis/94e53bdd6e4852756e0395560ff38aa4
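
For reference, a minimal sketch of what such a plain self-attention pooling layer looks like in PyTorch - this is my own illustrative version, not the gist's exact code, and the class / parameter names (SelfAttention, scorer, mask) are assumptions:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Plain self-attention pooling over a sequence of hidden states.

    Scores each timestep with a linear layer, masks out padding,
    and returns the attention-weighted sum plus the weights.
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, inputs: torch.Tensor, mask: torch.Tensor):
        # inputs: (batch, seq_len, hidden_dim)
        # mask:   (batch, seq_len), 1 for real tokens, 0 for padding
        scores = self.scorer(inputs).squeeze(-1)               # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))  # ignore padding
        weights = torch.softmax(scores, dim=-1)                # (batch, seq_len)
        pooled = torch.bmm(weights.unsqueeze(1), inputs).squeeze(1)  # (batch, hidden_dim)
        return pooled, weights

# Usage sketch: pool BiLSTM / encoder outputs into one vector per sequence.
x = torch.randn(4, 10, 256)
mask = torch.ones(4, 10)
attn = SelfAttention(256)
pooled, weights = attn(x, mask)
print(pooled.shape, weights.shape)  # torch.Size([4, 256]) torch.Size([4, 10])

Dropping this on top of an RNN / CNN encoder instead of plain mean or max pooling is the kind of change that gave the metric boost mentioned above.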

#nlp

#deep_learning

SelfAttention implementation in PyTorch
