Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1319 members, 1513 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

January 02, 04:45

My lazy-bones remarks on the fast.ai GAN lessons. Really good stuff though - definitely a must-read for GANs, but the actual code and the bcolz part can be done more easily in pure PyTorch with multiple workers.

Fast.ai part 2, lesson 3 - generative models

- Video www.youtube.com/watch?time_continue=3021&v=uv0gmrXSXVg

- Wiki forums.fast.ai/t/lesson-10-wiki/1937

- Forum forums.fast.ai/t/lesson-10-discussion/1807

- Code is shared here - github.com/fastai/courses/tree/master/deeplearning2

Key insights:

- Anything that works on the whole of ImageNet is quite good

- Image2seq and super-resolution work awesomely well

- Generative models are not the same thing as GANs; a GAN can usually be added on top of any generative model

- tqdm is the best progress bar ever (one-liner after this list)

- Keras + Jupyter is somewhat quicker for experimenting than PyTorch, but less flexible overall and much less production-ready
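To illustrate the tqdm point - wrapping any iterable gives you a progress bar; the sleep here is just a stand-in for real work:

from tqdm import tqdm
import time

for i in tqdm(range(100)):
    time.sleep(0.01)  # stand-in for real per-iteration work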

Bcolz and Img2seq

- their notebook - goo.gl/JhYpPx

- bcolz iterator for large datasets if the dataset does not fit in memory - goo.gl/ck1P2d

-- allows iterating over a bcolz array stored on disk

-- I see no real benefit over just iterating files in several threads

-- Maybe it would just be better to work with in-memory bcolz arrays (bcolz boasts in-memory compression, so paired with a lot of RAM this would be a good solution)?

-- for my taste this is over-engineering - a simple multi-threaded Pillow thumbnail pass + a multi-threaded dataset class would do the job (though it may be useful for a more general application or for much more data - terabytes); see the first sketch after this list

-- as for video and terabytes of data, another approach works - just itemize the data (1 item = one .npy file) and then use parallel workers to read it

- fast.ai examples are cool - they know about Pillow-SIMD, but do not know about Pillow's thumbnail method

- you can use gensim to easily get word2vec vectors (see the second sketch after this list)

- Pillow-SIMD is ~600% faster than Pillow - github.com/uploadcare/pillow-simd
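To make the "multi-threaded dataset class" point concrete, a minimal pure-PyTorch sketch of the itemized-.npy approach - the file paths, array shapes and worker count are illustrative assumptions (note that DataLoader workers are actually separate processes, not threads):

import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# fake "itemized" dataset on disk: 1 item = one .npy file (shapes are arbitrary)
os.makedirs('data', exist_ok=True)
paths = ['data/item_{}.npy'.format(i) for i in range(100)]
for p in paths:
    np.save(p, np.random.randn(3, 64, 64).astype('float32'))

class NpyDataset(Dataset):
    # reads one file per item, so the dataset never has to fit in memory
    def __init__(self, file_paths):
        self.file_paths = file_paths
    def __len__(self):
        return len(self.file_paths)
    def __getitem__(self, idx):
        return torch.from_numpy(np.load(self.file_paths[idx]))

# num_workers > 0 reads items in parallel worker processes
loader = DataLoader(NpyDataset(paths), batch_size=32, num_workers=4, shuffle=True)
for batch in loader:
    pass  # each batch has shape (32, 3, 64, 64)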
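And the gensim point in a few lines - a minimal word2vec sketch on a toy corpus; the gensim 4.x API is assumed (older versions call the parameter size instead of vector_size):

from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences
sentences = [['deep', 'learning', 'is', 'fun'],
             ['gans', 'are', 'deep', 'generative', 'models']]
model = Word2Vec(sentences, vector_size=32, window=3, min_count=1)
vector = model.wv['deep']  # a 32-dimensional word2vec vector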

Distance

- cosine distance is a usual choice for high-dimensional spaces
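For reference, a tiny numpy sketch of cosine distance - just 1 minus the cosine similarity (non-zero vectors assumed):

import numpy as np

def cosine_distance(a, b):
    # 1 - (a . b) / (|a| * |b|); small when the vectors point the same way
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a, b = np.random.randn(512), np.random.randn(512)
print(cosine_distance(a, b))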

Super-resolution

- notebook - goo.gl/tASGjU

- super-resolution with a fully convolutional network (FCN)

- uses all of ImageNet

- how to write your own training loops in Keras (see the sketch after this list)
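A minimal sketch of what a hand-rolled Keras training loop looks like via train_on_batch - the toy model and random data are my own illustrative stand-ins, not the notebook's code:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy data and model, just to show the loop structure
X = np.random.randn(256, 10).astype('float32')
y = np.random.randn(256, 1).astype('float32')
model = Sequential([Dense(1, input_dim=10)])
model.compile(optimizer='adam', loss='mse')

batch_size = 32
for epoch in range(3):
    for i in range(0, len(X), batch_size):
        loss = model.train_on_batch(X[i:i + batch_size], y[i:i + batch_size])
    print('epoch', epoch, 'last batch loss', loss)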

GANs (worthy tips, paper review will come later)

- their notebook - goo.gl/1QZbcy

- a NN can learn to ignore black borders, but it is better to just avoid them

- for ordinary GANs the loss values do not make sense (they do not tell you whether training is converging)

- write your own generator

- train the discriminator (D) a "little bit" at first

- freeze D and unfreeze G, train the generator (G) against the frozen discriminator (D); then freeze G and unfreeze D, train D against the frozen G; repeat (see the sketch after this list)

- the Wasserstein GAN paper is a MASSIVE breakthrough

- the ratio of D to G batches is flexible (as per the paper)

- for WGANs the training curves make sense (the loss actually correlates with sample quality)
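A minimal sketch of that alternating freeze/unfreeze loop in PyTorch - the tiny MLP generator and discriminator and the toy "real" data are illustrative assumptions, not the lesson's code:

import torch
import torch.nn as nn

nz = 8  # latent dimension (assumed)
netG = nn.Sequential(nn.Linear(nz, 16), nn.ReLU(), nn.Linear(16, 2))
netD = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

criterion = nn.BCELoss()
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)

def set_trainable(net, flag):
    # "freezing" = switching gradient computation off for a sub-network
    for p in net.parameters():
        p.requires_grad = flag

for step in range(100):
    real = torch.randn(32, 2) + 3.0  # toy "real" samples
    ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

    # 1) unfreeze D, freeze G: train D on real + detached fake samples
    set_trainable(netD, True); set_trainable(netG, False)
    optD.zero_grad()
    fake = netG(torch.randn(32, nz)).detach()
    loss_d = criterion(netD(real), ones) + criterion(netD(fake), zeros)
    loss_d.backward()
    optD.step()

    # 2) freeze D, unfreeze G: train G to fool the frozen D
    set_trainable(netD, False); set_trainable(netG, True)
    optG.zero_grad()
    loss_g = criterion(netD(netG(torch.randn(32, nz))), ones)
    loss_g.backward()
    optG.step()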

PyTorch hacks

- underscore (in-place) operators save memory (see the sketch after the init snippet below)

- pre-allocate memory - saves time

- good weight initialization boilerplate (I mostly used ImageNet pre-trained models, so I had avoided facing this):

import torch.nn as nn

# DCGAN-style init: N(0, 0.02) for conv weights, N(1, 0.02) for batchnorm scales
def weights_init(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        m.weight.data.normal_(0.0, 0.02)
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

netG.apply(weights_init)  # netG is the generator defined earlier
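And a quick sketch of the two memory/speed hacks above - underscore (in-place) ops and a pre-allocated buffer reused across iterations (sizes are arbitrary):

import torch

x = torch.randn(1024, 1024)
x.mul_(0.5)  # in-place underscore op: modifies x, allocates no new tensor

buf = torch.empty(1024, 1024)  # pre-allocate once...
for _ in range(10):
    torch.randn(1024, 1024, out=buf)  # ...and reuse the same memory each step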

#deep_learning
