Playing with multi-GPU small batch-sizes
If you do SemSeg with a big model on large images (HD, FullHD) - you may face a situation where only one image fits on each GPU.
This is also relevant if your train-test split is far from ideal and/or you are using pre-trained imagenet encoders for a SemSeg task - so you cannot really afford to update your bnorm params.
Also, AFAIK, all the major deep-learning frameworks:
(0) do not have an out-of-the-box option to freeze batch-norm, i.e. keep it in evaluation mode while training (batch-norm contains 2 sets of parameters - the learnable affine weight/bias and the running statistics that are collected during training and used at inference);
(1) calculate batch-norm statistics on each GPU separately (so with one image per GPU the statistics are effectively computed with a batch size of 1).
All of this may mean that in these situations your models severely underperform at inference time.
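For reference, a minimal PyTorch sketch of the two sets of batch-norm parameters from point (0) above (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(64)

# learnable (affine) parameters - updated by the optimizer
print(dict(bn.named_parameters()).keys())   # weight, bias

# running statistics - buffers, not parameters; updated on every forward
# pass in train() mode, used instead of per-batch stats in eval() mode
print(dict(bn.named_buffers()).keys())      # running_mean, running_var, ...

x = torch.randn(1, 64, 32, 32)  # batch of 1 - per-GPU batch stats are very noisy
bn.train()
_ = bn(x)   # running_mean / running_var get updated from this single image
bn.eval()
_ = bn(x)   # stored running stats are used, nothing is updated
```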
Possible solutions:
(0) Sync batch-norm. I believe that to do it properly you would have to modify the framework you are using, but there is a PyTorch implementation done for CVPR 2018 - also an explanation here hangzh.com/PyTorch-Encoding/note
(1) Use affine=False in your batch-norm. But in this case the imagenet initialization probably will not help - you will have to train your model from scratch completely
(2) Freeze your encoder batch-norm params completely (see the sketch after this list)
(3) Use the recent Facebook group norm - arxiv.org/pdf/1803.08494.pdf
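A rough sketch of options (1)-(3) in PyTorch - the toy encoder and layer sizes are just placeholders for an imagenet backbone:

```python
import torch.nn as nn

# option (1): no learnable affine params at all
bn_no_affine = nn.BatchNorm2d(64, affine=False)

def freeze_bn(module):
    # option (2): freeze every batch-norm layer - keep the pre-trained
    # running stats and stop updating the affine weight / bias
    for m in module.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()                            # use stored running_mean / running_var
            if m.affine:
                m.weight.requires_grad = False
                m.bias.requires_grad = False

# toy stand-in for a pre-trained encoder
encoder = nn.Sequential(nn.Conv2d(3, 64, 3), nn.BatchNorm2d(64), nn.ReLU())

encoder.train()      # .train() flips BN back into train mode...
freeze_bn(encoder)   # ...so re-apply the freeze after every .train() call

# option (3): GroupNorm normalizes over channel groups, so it is
# batch-size independent and keeps no running stats
gn = nn.GroupNorm(num_groups=32, num_channels=64)
```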
This is a finicky topic - please tell us about your experiences and tests in the comments.
Like this post or have something to say => tell us more in the comments or donate!
Since pytorch does not support syncBN, I hope to freeze the mean/var of the BN layers while training. The mean/var from the pretrained model are used while weight/bias stay learnable. In this way, the calculation of bottom_grad in BN will be different from that of the normal training mode. However, we do not find any flag in the function below to mark this difference. pytorch/torch/csrc/cudnn/BatchNorm.cpp void cudnn_batch_norm_backward( THCState* state, cudnnHandle_t handle, cudnnDataType_t dataType, THVo...
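For what it is worth, the behaviour described in this quote (running stats frozen, weight/bias still learnable - unlike the full freeze sketched above) can be approximated in Python without touching the cuDNN backward. A hedged sketch:

```python
import torch.nn as nn

def freeze_bn_stats(model):
    # keep the pre-trained running_mean / running_var fixed,
    # but leave the affine weight / bias learnable
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()   # eval mode: running stats are used in forward and never updated

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
model.train()
freeze_bn_stats(model)   # weight / bias still receive gradients, stats stay frozen
```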