September 14, 2018

Understanding the current SOTA NMT / NLP model - transformer

A list of articles that really help to do so:

- Understanding attention

- Annotated transformer

- Illustrated transformer

Playing with transformer in practice

This repo turned out to be really helpful

It features:

- Decent well encapsulated model and loss;

- Several head for different tasks;

- It works;

- Ofc their data-loading scheme is crappy and over-engineered;

My impressions on actually training the transformer model for classification:

- It works;

- It is high capacity;

- Inference time is ~`5x` higher than char-level or plain RNNs;

- It serves as a classifier as well as an LM;

- Capacity is enough to tackle most challenging tasks;

- It can be deployed on CPU for small texts (!);

- On smaller tasks there is no clear difference between plain RNNs and Transformer;


Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

Translations: Chinese (Simplified), Korean Watch: MIT’s Deep Learning State of the Art lecture referencing this post May 25th update: New graphics (RNN animation, word embedding graph), color coding, elaborated on the final attention example. Note: The animations below are videos. Touch or hover on them (if you’re using a mouse) to get play controls so you can pause if needed. Sequence-to-sequence models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning. Google Translate started using such a model in production in late 2016. These models are explained in the two pioneering papers (Sutskever et al., 2014, Cho et al., 2014). I found, however, that understanding the model well enough to implement it requires unraveling a series of concepts that build on top of each other. I thought that a bunch of these ideas would be more accessible if expressed visually. That’s what I aim to do in this post. You’ll need some previous understanding…