August 30, 06:54

ML without train / val split

Yeah, I am not crazy. But probably this applies only to NLP.

Sometimes you just need your pipeline to be flexible enough to work with any possible "in the wild" data.

A cool and weird trick - if you can make your dataset so large that your model just MUST generalize to work on it, then you do not need a validation set.

If you sample data randomly and your data generator is good enough, each new batch is just random and can serve as validation.