March 07, 15:42

Our experiments with Transformers, BERT and generative language pre-training


For morphologically rich languages, pre-trained Transformers are not a silver bullet. From a layman's perspective, they are not feasible unless someone invests huge computational resources both into sub-word tokenization methods that work well and into actually training these large networks.

On the other hand, we have definitively shown that:

- Starting a Transformer with an embedding bag initialized from FastText vectors works and is relatively feasible;

- On complicated tasks, such a Transformer significantly outperforms the same model trained from scratch (as well as naive models) and shows decent results compared to state-of-the-art specialized models;

- Pre-training worked, but it overfitted more than FastText initialization, and given the complexity required for such pre-training, it is not useful.
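To make the "embedding bag initialized via FastText" idea concrete, here is a minimal sketch in plain Python of how a word maps to rows of a fastText input matrix and how those rows are averaged. It is not our training code. It assumes fastText's default subword settings as we understand them (boundary markers `<` and `>`, character n-grams of length 3 to 6, 2,000,000 hash buckets, 32-bit FNV-1a hashing with n-gram rows stored after the word rows); check the arguments of the model you actually load. The helper names `fnv1a_32`, `char_ngrams`, `bag_indices` and `embed_word` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed fastText defaults; verify against the model you load.
MINN, MAXN = 3, 6      # character n-gram lengths
BUCKET = 2_000_000     # number of hash buckets for n-grams


def fnv1a_32(s: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of s.

    Bytes >= 0x80 are sign-extended before the XOR, mimicking the int8_t
    cast in fastText's C++ hash; this matters only for non-ASCII text
    (e.g. Cyrillic).
    """
    h = 2166136261
    for b in s.encode("utf-8"):
        signed = b - 256 if b > 127 else b
        h ^= signed & 0xFFFFFFFF
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, minn: int = MINN, maxn: int = MAXN) -> List[str]:
    """Character n-grams of '<word>' for minn <= n <= maxn."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(minn, maxn + 1) for i in range(len(w) - n + 1)]


def bag_indices(word: str, vocab: Dict[str, int], bucket: int = BUCKET) -> List[int]:
    """Rows of the input matrix that form one word's embedding bag.

    In-vocabulary words contribute their own row; every word (including
    out-of-vocabulary ones) contributes its hashed n-gram rows, which are
    stored after the len(vocab) word rows.
    """
    rows = [vocab[word]] if word in vocab else []
    rows += [len(vocab) + fnv1a_32(g) % bucket for g in char_ngrams(word)]
    return rows


def embed_word(word: str, vocab: Dict[str, int],
               matrix: Sequence[Sequence[float]], bucket: int = BUCKET) -> List[float]:
    """Mean of the selected rows, i.e. an embedding bag in 'mean' mode."""
    rows = [matrix[i] for i in bag_indices(word, vocab, bucket)]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

Because out-of-vocabulary word forms still get a vector from their n-grams, this kind of initialization degrades gracefully on rare inflections. In a PyTorch model the same averaging is what `torch.nn.EmbeddingBag(..., mode="mean")` computes, so one option is to load the fastText input matrix with `nn.EmbeddingBag.from_pretrained(weights, freeze=False, mode="mean")` and feed it indices like those from `bag_indices`.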

All in all, this was a relatively large gamble that did not pay off: on a more down-to-earth task we hoped the Transformer would excel at, it did not.


Complexity / generalization / computational cost in modern applied NLP for morphologically rich languages

Towards a new state of the art?

An approach to ranking search results with no annotation

Just a small article with a novel idea:

- Instead of training a network with cross-entropy (CE), just train it with binary cross-entropy (BCE);

- Source additional structure from the domain itself (tags, matrix decomposition methods, heuristics, etc.);

Works best if your ontology is relatively simple.
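The CE vs BCE distinction can be illustrated with a minimal sketch in plain Python. The article's code is not reproduced here; this assumes CE means softmax cross-entropy over the candidate documents (they compete for one unit of probability, so exactly one can be "the" answer) and BCE means an independent sigmoid per candidate (any number of documents can be relevant). The tag-overlap rule in `targets_from_tags` is a hypothetical example of sourcing structure from the domain, not the method used in the article; all function names are ours.

```python
import math
from typing import List, Sequence, Set


def softmax_ce(logits: Sequence[float], target: int) -> float:
    """Cross-entropy: candidates compete for a single correct answer."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def bce(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy with logits: each candidate is an independent yes/no.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    losses = [max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
              for x, y in zip(logits, targets)]
    return sum(losses) / len(losses)


def targets_from_tags(query_tags: Set[str], doc_tags: Sequence[Set[str]]) -> List[float]:
    """Hypothetical heuristic: a document is a positive if it shares any tag with the query."""
    return [1.0 if query_tags & tags else 0.0 for tags in doc_tags]


def rank(logits: Sequence[float]) -> List[int]:
    """Candidate indices by descending score (sigmoid is monotonic, so logits suffice)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
```

With tag-derived targets such as `[1.0, 0.0, 1.0]`, BCE can push both relevant documents up at once, whereas CE would need a single target index per query. This is also why a simple ontology helps: the cleaner the tags, the less noisy the multi-hot targets.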


Learning to rank search results without annotation

Solving the search ranking problem