January 31, 13:33

Forwarded from Анна:

Checked out sentence embeddings in LASER:

- installation guide is a bit messy

- uses the FAISS lib, and it's pretty fast (<1 minute to encode 250k sentences on a 1080Ti)

- better generalization compared to the ft baseline. The difference is clear even for short sentences: the embeddings of 'добрый день!' ("good afternoon!") and 'здравствуйте!' ("hello!") are much closer in LASER's space than in ft's

- looks like LASER embeddings capture similarity, not only substitutability, and are better at recognizing synonyms

- seems to work better on short sentences
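
For reference, a minimal sketch of the "closer in LASER's space than in ft" check: take the cosine similarity of the two sentence vectors in each space and compare. The vectors here are made-up toy placeholders, not real LASER or ft output, and `cosine_similarity`, `space_a` and `space_b` are hypothetical names used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy placeholder vectors (NOT real embeddings): space_a stands in for a
# space where the two greetings land close together, space_b for one
# where they land far apart.
space_a = {"добрый день!": [0.9, 0.1, 0.2], "здравствуйте!": [0.8, 0.2, 0.3]}
space_b = {"добрый день!": [0.9, 0.1, 0.2], "здравствуйте!": [0.1, 0.9, 0.4]}

sim_a = cosine_similarity(space_a["добрый день!"], space_a["здравствуйте!"])
sim_b = cosine_similarity(space_b["добрый день!"], space_b["здравствуйте!"])

print(f"space_a similarity: {sim_a:.3f}")
print(f"space_b similarity: {sim_b:.3f}")
print("greetings are closer in space_a" if sim_a > sim_b else "greetings are closer in space_b")
```

With real encoders, the toy dictionaries would be replaced by the vectors each model returns for the same pair of sentences; the comparison itself stays the same.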