November 09, 2018

A fastText model trained on a random mix of Russian Wikipedia, Taiga and Common Crawl.

On our benchmarks it was marginally better than the fastText model trained on Araneum from RusVectores.

Download link

goo.gl/g6HmLU

Params

Standard params: character n-grams of length 3 to 6, vector dimensionality of 300.
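To make the (3,6) n-gram setting concrete, here is a minimal pure-Python sketch of how a word is split into character n-grams. It assumes the usual fastText convention of wrapping the word in `<` and `>` boundary markers; the helper name `char_ngrams` is hypothetical and is not part of the fastText API, and the real library additionally hashes these n-grams into buckets.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return character n-grams of length minn..maxn for a single word.

    Illustrative only: mimics the subword idea behind fastText,
    not the library's actual implementation.
    """
    # Boundary markers let the model tell prefixes and suffixes apart.
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("кот"))
```

Because vectors are built from these subwords, the model can also produce a vector for a word it never saw during training, which is useful for a morphologically rich language like Russian.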

Usage:

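import fastText as ft  # named `fasttext` in newer releases of the Python bindings
# 'model' is the path to the downloaded model file
ft_model_big = ft.load_model('model')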
Then refer to the Python wrapper source for the available methods:

github.com/facebookresearch/fastText/blob/master/python/fastText/FastText.py
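As a small example of what can be done with the loaded model, the sketch below computes cosine similarity between two words. It relies only on the `get_word_vector` method of the model object; the helper name `word_similarity` and the example words are hypothetical, chosen for illustration.

```python
import math


def word_similarity(model, first, second):
    """Cosine similarity between the vectors of two words.

    `model` is assumed to be any object exposing get_word_vector(word),
    such as the object returned by load_model.
    """
    a = [float(x) for x in model.get_word_vector(first)]
    b = [float(x) for x in model.get_word_vector(second)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors to avoid division by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Usage, assuming ft_model_big was loaded as shown above:
# word_similarity(ft_model_big, 'кот', 'кошка')
```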

#nlp