An interesting hack from n01z3 (ODS)
for getting that extra 1%:
Snapshot Ensembles / Multi-checkpoint TTA:
- Train the CNN with LR decay until convergence, using SGD or Adam
- Starting from the best checkpoint, train the network for several more epochs with a cyclic LR
- Collect the checkpoints with the best loss and use them for ensembling / TTA
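The recipe can be sketched in plain Python. This is a minimal sketch under stated assumptions, not n01z3's code. The post does not say which cyclic schedule to use, so cosine annealing with restarts (as in SGDR / Snapshot Ensembles) is assumed here. The helper names `cosine_cyclic_lr`, `select_snapshots` and `snapshot_tta_predict` are hypothetical. "Models" are stand-in callables that return class probabilities; in a real setup they would be the networks restored from the saved checkpoints.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cosine_cyclic_lr(step: int, cycle_len: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed LR that jumps back to lr_max every cycle_len steps.

    Assumed schedule shape; the original post only says "cyclic LR".
    """
    t = (step % cycle_len) / cycle_len  # position inside the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


def select_snapshots(val_losses: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k checkpoints with the lowest validation loss."""
    return sorted(val_losses, key=val_losses.get)[:k]


def snapshot_tta_predict(
    models: Sequence[Callable[[Vector], Vector]],
    augmentations: Sequence[Callable[[Vector], Vector]],
    x: Vector,
) -> Vector:
    """Average class probabilities over every (snapshot, augmentation) pair."""
    preds = [model(aug(x)) for model in models for aug in augmentations]
    # zip(*preds) groups the i-th class probability of every prediction together
    return [sum(col) / len(preds) for col in zip(*preds)]


if __name__ == "__main__":
    # LR at the start, middle and restart of a 10-step cycle
    print([round(cosine_cyclic_lr(s, 10, 0.1), 4) for s in (0, 5, 10)])

    # Hypothetical validation losses logged at the end of each cycle
    losses = {"cycle1.pt": 0.41, "cycle2.pt": 0.37, "cycle3.pt": 0.39}
    print(select_snapshots(losses, 2))

    # Toy snapshot "models" and two TTA transforms (identity and a reversal,
    # standing in for e.g. a horizontal flip)
    snapshots = [lambda v: [0.2, 0.8], lambda v: [0.6, 0.4]]
    ttas = [lambda v: v, lambda v: v[::-1]]
    print(snapshot_tta_predict(snapshots, ttas, [1.0, 2.0]))
```

Restarting the LR at a high value pushes the weights out of the current minimum, and each low-LR end of a cycle yields a different, still-good checkpoint. Averaging those snapshots, optionally crossed with TTA, gives an ensemble at the cost of a single training run.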
Google TPUs have been released in beta... at US$200 per day?
No, thank you! Also, it looks like only TF is supported so far.
Combined with the rumours, it sounds impractical.