Poor man's ensembling techniques
So you want to improve your model's performance a bit.
Ensembling helps. But as is, it's useful only in Kaggle competitions, where people stack over9000 networks trained on 100 MB of data.
But for real-life usage / production, there are ensembling techniques that do not require a significant increase in computation cost (!).
None of this is mainstream yet, but it may work on your dataset!
Especially if your task is easy and the dataset is small.
- SWA, i.e. Stochastic Weight Averaging (proven to work, usually used as a last stage when training a model);
- Lookahead optimizer (kind of new, not thoroughly tested);
- Multi-Sample Dropout (seems like a cheap ensemble, should work for classification).
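
To make the first two items less abstract, here is a minimal pure-Python sketch of the weight arithmetic behind them, under some loud assumptions: the "model" is just a list of floats, the loss is a toy quadratic, and every name (`swa_update`, `lookahead_sync`, `train_with_swa`, `train_with_lookahead`, `toy_grad`) is made up for illustration and is not any library's API. SWA keeps a running mean of the weights seen late in training; Lookahead keeps a slow copy of the weights and every `k` steps pulls it part of the way toward the fast weights.

```python
def toy_grad(weights):
    # Gradient of the toy loss sum((w - 3)^2); a stand-in for real backprop.
    return [2.0 * (w - 3.0) for w in weights]


def sgd_step(weights, grads, lr):
    # Plain gradient descent step.
    return [w - lr * g for w, g in zip(weights, grads)]


def swa_update(avg_weights, new_weights, n_averaged):
    # Fold one more weight snapshot into a running mean of n_averaged snapshots.
    return [(a * n_averaged + w) / (n_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]


def lookahead_sync(slow_weights, fast_weights, alpha=0.5):
    # Move the slow weights a fraction alpha of the way toward the fast weights.
    return [s + alpha * (f - s) for s, f in zip(slow_weights, fast_weights)]


def train_with_swa(weights, steps=20, swa_start=10, lr=0.1):
    # Train as usual, then start averaging weights from step swa_start onward.
    swa_weights, n_averaged = None, 0
    for step in range(steps):
        weights = sgd_step(weights, toy_grad(weights), lr)
        if step >= swa_start:
            if swa_weights is None:
                swa_weights = list(weights)
            else:
                swa_weights = swa_update(swa_weights, weights, n_averaged)
            n_averaged += 1
    return weights, swa_weights


def train_with_lookahead(weights, steps=20, k=5, alpha=0.5, lr=0.1):
    # The inner optimizer updates the fast weights; every k steps the slow
    # weights are synced and the fast weights restart from them.
    slow, fast = list(weights), list(weights)
    for step in range(1, steps + 1):
        fast = sgd_step(fast, toy_grad(fast), lr)
        if step % k == 0:
            slow = lookahead_sync(slow, fast, alpha)
            fast = list(slow)
    return slow
```

Both tricks cost one extra copy of the weights and a few additions per step, which is why they are cheap compared to training several models. With a real network that uses batch norm, the averaged SWA weights usually also need the normalization statistics recomputed, which this toy sketch skips.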
Applicability will vary with your task.
Plain vanilla classification can use all of these; seq2seq (s2s) networks can probably use them only partially.
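
For the plain classification case, Multi-Sample Dropout can be sketched roughly as follows. This is a hedged illustration, not a reference implementation: it assumes the expensive network body has already produced a feature vector, that the classifier head is a single linear layer shared across all dropout samples, and that the per-sample losses are simply averaged. All function names here are hypothetical.

```python
import math
import random


def dropout(features, p, rng):
    # Inverted dropout: zero each feature with probability p, rescale the rest.
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def linear(features, weight, bias):
    # Shared classifier head: one logit per row of `weight`.
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weight, bias)]


def cross_entropy(logits, target):
    # Softmax cross-entropy for one example, using a stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def multi_sample_dropout_loss(features, target, weight, bias,
                              num_samples=4, p=0.5, rng=None):
    # Reuse the same features with several independent dropout masks,
    # push each masked copy through the same head, and average the losses.
    if rng is None:
        rng = random.Random(0)
    losses = []
    for _ in range(num_samples):
        logits = linear(dropout(features, p, rng), weight, bias)
        losses.append(cross_entropy(logits, target))
    return sum(losses) / num_samples


# Usage: 3 features, 2 classes, made-up numbers.
features = [0.5, -1.0, 2.0]
weight = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]]
bias = [0.0, 0.1]
loss = multi_sample_dropout_loss(features, 0, weight, bias, num_samples=8)
```

Only the cheap head is run several times while the body runs once, which is where the "cheap ensemble" feel comes from. At inference time dropout is switched off, so the cost is the same as for a plain model.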