January 23, 08:12

Pre-trained BERT in PyTorch

github.com/huggingface/pytorch-pretrained-BERT

(1)

Model code here is just awesome.

Integrated DataParallel / DDP wrappers / FP16 wrappers also are awesome.

FP16 precision training from APEX just works (no idea about convergence though yet).

(2)

As for model weights - I cannot really tell, there is no dedicated Russian model.

The only problem I am facing now - using large embeddings bags batch size is literally 1-4 even for smaller models.

And training models with sentence piece is kind of feasible for rich languages, but you will always worry about generalization.

(3)

Did not try the generative pre-training (and sentence prediction pre-training), I hope that properly initializing embeddings will also work for a closed domain with a smaller model (they pre-train 4 days on 4+ TPUs, lol).

(5)

Why even tackle such models?

Chat / dialogue / machine comprehension models are complex / require one-off feature engineering.

Being able to tune something like BERT on publicly available benchmarks and then on your domain can provide a good way to embed complex situations (like questions in dialogues).

#nlp

#deep_learning

huggingface/pytorch-pretrained-BERT

📖The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer-XL. - huggingface/pytorch-pretrained-BERT