Miniaturize / optimize your ... NLP models?
For CV applications there are literally dozens of ways to make your models smaller.
And yeah, I do not mean some "moonshots" or special limited libraries (matrix decompositions, some custom pruning, etc.).
I mean cheap and dirty hacks that work in 95% of cases regardless of your stack / device / framework:
- Smaller images (x3-x4 easy);
- FP16 inference (30-40% maybe);
- Knowledge distillation into smaller networks (x3-x10);
- Naïve cascade optimizations (feed only every Nth frame using some heuristic).
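
The last hack in the list above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: `predict_every_nth` is a hypothetical helper, `model` stands for any expensive per-frame callable, and the "heuristic" here is the most naive one possible (a fixed stride, with the last prediction reused for skipped frames).

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")
Prediction = TypeVar("Prediction")


def predict_every_nth(
    frames: Iterable[Frame],
    model: Callable[[Frame], Prediction],
    n: int = 5,
) -> Iterator[Tuple[Frame, Prediction]]:
    """Run `model` on every n-th frame and reuse its output in between.

    Hypothetical helper: `model` is any callable that takes one frame.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    last_prediction = None
    for index, frame in enumerate(frames):
        if index % n == 0:
            # The only place where the expensive model is actually called.
            last_prediction = model(frame)
        yield frame, last_prediction
```

With `n=5` the model runs on roughly one frame in five, so the compute drops by about the same factor; whether stale predictions are acceptable depends entirely on how fast the scene changes.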
But what can you do with NLP networks?
Turns out not much.
But here are my ideas:
- Use a simpler model - embedding bag + plain self-attention + LSTM can solve 90% of tasks;
- Decrease embedding size from 300 to 50 (or maybe even lower). Tried and tested, works like a charm. For harder tasks you lose just 1-3pp of your target metric, for easier tasks the metric stays just the same;
- FP16 inference is supported in PyTorch for nn.Embedding, but not for nn.EmbeddingBag (it fails with "_embedding_bag is not implemented for type torch.HalfTensor"). But you get the idea;
- You can try distilling your vocabulary / embedding-bag model into a char level model. If it works, you can trade model size vs. inference time;
- If you have very long sentences or large batches - try distilling your recurrent network into a CNN / TCN, or just swapping it for one. This way you can also trade model size vs. inference time, but probably in a different direction.
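
To get a feel for the embedding-related ideas above, here is a rough sketch that just measures how much storage the embedding table takes. The vocabulary size of 50,000 and the `size_mb` helper are assumptions made up for illustration; only standard PyTorch calls are used, and the FP16 conversion is applied to nn.Embedding only.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000  # hypothetical vocabulary size, chosen for illustration


def size_mb(module: nn.Module) -> float:
    """Total parameter storage of a module in megabytes (10**6 bytes)."""
    return sum(p.numel() * p.element_size() for p in module.parameters()) / 1e6


# Same vocabulary, two embedding widths (FP32 by default).
wide = nn.EmbeddingBag(VOCAB_SIZE, 300, mode="mean")
narrow = nn.EmbeddingBag(VOCAB_SIZE, 50, mode="mean")

# nn.Embedding converted to FP16: each weight takes 2 bytes instead of 4.
narrow_fp16 = nn.Embedding(VOCAB_SIZE, 50).half()

print(f"EmbeddingBag, dim 300, FP32: {size_mb(wide):.1f} MB")
print(f"EmbeddingBag, dim 50,  FP32: {size_mb(narrow):.1f} MB")
print(f"Embedding,    dim 50,  FP16: {size_mb(narrow_fp16):.1f} MB")

# A lookup on the FP16 table returns FP16 vectors.
token_ids = torch.tensor([[1, 2, 3]])
vectors = narrow_fp16(token_ids)
```

With these made-up numbers the table shrinks from 60 MB to 10 MB by cutting the width alone, and to 5 MB with FP16 on top. This says nothing about the target metric, which still has to be checked on your own task.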