Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1319 members, 1513 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
Our chat
DS courses review

Posts by tag «deep_learning»:

snakers4 (Alexander), July 15, 08:54

Sometimes in supervised ML tasks, leveraging the data structure in a self-supervised fashion really helps!

Playing with CrowdAI mapping competition

In my opinion it is a good test-ground for testing your ideas with SemSeg - as the dataset is really clean and balanced




Playing with Crowd-AI mapping challenge - or how to improve your CNN performance with self-supervised techniques

In this article I describe a couple of neat optimizations / tricks / useful ideas that can be applied to many SemSeg / ML tasks

snakers4 (spark_comment_bot), July 13, 05:22

2018 DS/ML digest 17

Highlights of the week

(0) Troubling trends with ML scholars

(1) NLP close to its ImageNet stage?

Papers / posts / articles

(0) Working with multi-modal data

- concatenation-based conditioning

- conditional biasing or scaling ("residual" connections)

- sigmoidal gating

- all in all this approach seems like a mixture of attention / gating for multi-modal problems
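The three conditioning flavors above can be sketched in a few lines of PyTorch (a sketch only - module names and dimensions are made up for illustration, not taken from the paper):

```python
import torch
import torch.nn as nn

class ConditionedBlock(nn.Module):
    """Toy sketch: condition image features on, say, text features."""
    def __init__(self, img_dim=128, txt_dim=32):
        super().__init__()
        self.concat_fc = nn.Linear(img_dim + txt_dim, img_dim)  # (a) concatenation
        self.scale_fc = nn.Linear(txt_dim, img_dim)             # (b) conditional scaling
        self.bias_fc = nn.Linear(txt_dim, img_dim)              # (b) conditional biasing
        self.gate_fc = nn.Linear(txt_dim, img_dim)              # (c) sigmoidal gating

    def forward(self, img, txt, mode="gate"):
        if mode == "concat":
            # (a) simplest option: concatenate modalities, then mix with a Linear
            return self.concat_fc(torch.cat([img, txt], dim=1))
        if mode == "film":
            # (b) feature-wise affine ("residual"-style) conditioning
            return img * self.scale_fc(txt) + self.bias_fc(txt)
        # (c) sigmoidal gating: the text decides which image features pass through
        return img * torch.sigmoid(self.gate_fc(txt))
```

All three reduce to "one modality modulates the other", which is why the whole family reads like attention / gating.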

(1) Glow, a reversible generative model which uses invertible 1x1 convolutions

(2) Facebook's moonshots - I kind of do not understand much here


(3) RL concept flaws?


(4) Intriguing failures of convolutions - this is fucking amazing

(5) People are only STARTING to apply ML to reasoning

Yet another online book on Deep Learning

(1) Kind of standard - /book/grokking-deep-learning/chapter-1/v-10/1

Libraries / code

(0) Data version control continues to develop




Like this post or have something to say => tell us more in the comments or donate!

Troubling Trends in Machine Learning Scholarship

By Zachary C. Lipton* & Jacob Steinhardt* *equal authorship Originally presented at ICML 2018: Machine

snakers4 (Alexander), July 11, 06:51

TF 1.9

Funnily enough, they call Keras not "Keras with TF back-end", but "tf.keras"




tensorflow - Computation using data flow graphs for scalable machine learning

snakers4 (Alexander), July 09, 09:04

2018 DS/ML digest 16

Papers / posts

(0) RL now solves Quake

(1) A post about AdamW

-- Adam generally requires more regularization than SGD, so be sure to adjust your regularization hyper-parameters when switching from SGD to Adam

-- Amsgrad turns out to be very disappointing

-- Refresher article
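The decoupled weight decay idea behind AdamW can be sketched in a few lines (a sketch, not the paper's implementation - PyTorch of that era had no built-in AdamW, so the decay is applied manually after a plain Adam step):

```python
import torch
import torch.nn as nn

# Toy model; weight_decay=0 on purpose - the decay is applied by hand below,
# decoupled from Adam's running gradient statistics
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0)

def train_step(x, y, lr=1e-3, wd=1e-2):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    # AdamW-style decay: shrink the weights directly, so the decay never
    # leaks into the adaptive moment estimates (unlike naive L2 on the grad)
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(1 - lr * wd)
    return loss.item()
```

This is exactly why the hyper-parameters need re-tuning when switching from SGD: with plain Adam the "L2" term is rescaled by the adaptive moments, with decoupled decay it is not.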

(2) How to tackle new classes in CV

(3) A new word in GANs?



(4) Using deep learning representations for search


-- library for fast search on python

(5) One more paper on GAN convergence

(6) Switchable normalization - adds a bit to ResNet50 + pre-trained models


(0) Disney starts to release datasets

Market / interesting links

(0) A motion to open-source GitHub

(1) Allegedly GTX 1180 sales starting to appear in Asia (?)

(2) Some controversy regarding Andrew Ng and self-driving cars

(3) National AI strategies overviewed -

-- Canada C$135m

-- China has the largest strategy

-- Notably - countries like Finland also have one

(4) Amazon allegedly sells face recognition to the USA



Google’s DeepMind taught AI teamwork by playing Quake III Arena

Google’s DeepMind today shared the results of training multiple AI systems to play Capture the Flag on Quake III Arena, a multiplayer first-person shooter game. The AI played nearly 450,000 g…

snakers4 (spark_comment_bot), July 07, 12:29

Playing with VAEs and their practical use

So, I played a bit with Variational Auto Encoders (VAE) and wrote a small blog post on this topic

Please like, share and repost!



Like this post or have something to say => tell us more in the comments or donate!

Playing with Variational Auto Encoders - PCA vs. UMAP vs. VAE on FMNIST / MNIST

In this article I thoroughly compare the performance of VAE / PCA / UMAP embeddings on a simplistic domain

snakers4 (Alexander), July 04, 07:57

2018 DS/ML digest 15

What I filtered through this time

Market / news

(0) Letters by big company employees against using ML for weapons

- Microsoft

- Amazon

(1) Facebook open sources Dense Pose (essentially this is Mask-RCNN)


Papers / posts / NLP

(0) One more blog post about text / sentence embeddings

- key idea different weighting

(1) One more sentence embedding calculation method

- ?

(2) Posts explaining NLP embeddings

-- some basics - SVD / Word2Vec / GloVe

-- SVD improves embedding quality (as compared to ohe)?

-- use log-weighting, use TF-IDF weighting (the above weighting)

-- word embedding properties

-- dimensions vs. embedding quality
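The SVD-based embedding recipe from the posts above can be sketched on a toy co-occurrence matrix (all counts and dimensions below are made up for illustration):

```python
import numpy as np

# Toy word-word co-occurrence counts for a 4-"word" vocabulary.
# Per the posts above, log- or TF-IDF-weighting the raw counts before the
# SVD improves embedding quality over one-hot encodings.
counts = np.array([
    [10., 2., 0., 1.],
    [ 2., 8., 1., 0.],
    [ 0., 1., 6., 3.],
    [ 1., 0., 3., 7.],
])
weighted = np.log1p(counts)            # log-weighting of the counts
u, s, vt = np.linalg.svd(weighted)
k = 2                                  # target embedding dimension
embeddings = u[:, :k] * s[:k]          # one dense vector per word
```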

(3) Spacy + Cython = 100x speed boost - good to know about this as a last resort

- described use-case

you are pre-processing a large training set for a DeepLearning framework like pyTorch/TensorFlow

or you have a heavy processing logic in your DeepLearning batch loader that slows down your training

(4) Once again stumbled upon this -

(5) Papers

- Simple NLP embedding baseline

- NLP decathlon for question answering

- Debiasing embeddings

- Once again transfer learning in NLP by open-AI -




Download full.pdf 0.04 MB

snakers4 (Alexander), July 04, 05:12

Open Images Object detection on Kaggle


- Key ideas

-- 1.2 images, high-res, 500 classes

-- decent prizes, but short time-span (2 months)

-- object detection


Google AI Open Images - Object Detection Track

Detect objects in varied and complex images.

snakers4 (Alexander), July 03, 07:15

A cool article from Ben Evans about how to think about ML

Ways to think about machine learning

We're now four or five years into the current explosion of machine learning, and pretty much everyone has heard of it, and every big company is working on projects around ‘AI’. We know this is a Next Big Thing. I don't think, though, that we yet have a settled sense of quite what machine learning m

My recent PyTorch 0.4 Dockerfile for CV


My PyTorch 0.4 Dockerfile

snakers4 (Alexander), July 02, 04:51

2018 DS/ML digest 14

Amazing article - why you do not need ML


- I personally love plain-vanilla SQL and in 90% of cases people under-use it

- I even wrote 90% of my JSON API on our blog in pure PostgreSQL xD

Practice / papers

(0) Interesting papers from CVPR

(1) Some down-to-earth obstacles to ML deploy

(2) Using synthetic data for CNNs (by Nvidia) -

(3) This puzzles me - so much effort and engineering spent on something ... strange and useless -

On paper they do a cool thing - investigate transfer learning between different domains, but in practice it is done on TF and there is no clear conclusion of any kind

(4) VAE + real datasets - only small Imagenet (64x64)

(5) Understanding the speed of models deployed on mobile -

(6) A brief overview of multi-modal methods

Visualizations / explanations

(0) Amazing website with ML explanations

(1) PCA and linear VAEs are close




No, you don't need ML/AI. You need SQL

A while ago, I did a Twitter thread about the need to use traditional and existing tools to solve everyday business problems other than jumping on new buzzwords, sexy and often times complicated technologies.

snakers4 (Alexander), June 28, 15:22

Playing with PyTorch 0.4

It was released some time ago

If you are not aware - this is the best summary

My first-hand experiences

- Multi-GPU support works strangely

- If you just launch your 0.3 code it will work on 0.4 with warnings - not a really breaking change

- All the new features are really cool, useful and make using PyTorch even more delightful

- I especially liked how they added context managers and cleaned up the device mess


snakers4 (Alexander), June 28, 11:18

DL Framework choice - 2018

If you are still new to DL / DS / ML and have not yet chosen your framework, consider reading this before proceeding



snakers4 (Alexander), June 28, 07:43

2018 DS/ML digest 13

Blog posts / articles:

(0) Google notes on CNN generalization -

(1) Google teaching robots in a virtual environment and then transferring models to reality -

(2) Google's object tracking via image colorization -

(2) Interesting articles about VAEs:

- A small intro into VAEs

- A small intuitive intro (super super cool and intuitive)

- KL divergence explained

- A more formal write-up

- In (RU)

- Converting a FC layer into a conv layer

- A post by Fchollet
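The FC-to-conv conversion mentioned in the list above boils down to reshaping the weight matrix (dimensions below are illustrative - any C x H x W feature map works the same way):

```python
import torch
import torch.nn as nn

# An FC layer over flattened 256x7x7 features...
fc = nn.Linear(256 * 7 * 7, 10)

# ...is equivalent to a 7x7 convolution with 10 output channels:
# just view the weight matrix as (out_channels, in_channels, kH, kW)
conv = nn.Conv2d(256, 10, kernel_size=7)
conv.weight.data = fc.weight.data.view(10, 256, 7, 7)
conv.bias.data = fc.bias.data

x = torch.randn(1, 256, 7, 7)
out_fc = fc(x.view(1, -1))
out_conv = conv(x).view(1, -1)
# bonus: the conv version slides over larger inputs "for free"
```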

A good in-depth write-up on object detection:


- finally a decent explanation of YOLO parametrization

- best comparison of YOLO and SSD ever -

Papers with interesting abstracts (just good to know such things exist)

- Low-bit CNNs -

- Automated Meta ML -

- Idea - use ResNet blocks for boosting -

- 2D-discrete-Fourier transform (2D-DFT) to encode rotational invariance in neural networks -

- Smallify the CNNs -

- BLEU review as a metric - conclusion - it is good on average to measure MT performance -

"New" ideas in SemSeg:

- UNET + conditional VAE

- Dilated convolutions for large satellite images - looks like this works only if you have high resolution with small objects



How Can Neural Network Similarity Help Us Understand Training and Generalization?

Posted by Maithra Raghu, Google Brain Team and Ari S. Morcos, DeepMind In order to solve tasks, deep neural networks (DNNs) progressively...

snakers4 (Alexander), June 26, 07:02

If someone needs a dataset, Kaggle launched ImageNet object detection


There is an open images dataset, which I guess is bigger though


ImageNet Object Localization Challenge

Identify the objects in images

snakers4 (Alexander), June 25, 10:53

A subscriber sent a really decent CS university scientific ranking

Useful, if you want to apply for CS/ML based Ph.D. there


Transformer in PyTorch

Looks like somebody implemented the recent OpenAI transformer fine-tuning in PyTorch





pytorch-openai-transformer-lm - A PyTorch implementation of OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI

snakers4 (spark_comment_bot), June 21, 14:13

Playing with multi-GPU small batch-sizes

If you play with SemSeg with a big model with large images (HD, FullHD) - you may face a situation when only one image fits to one GPU.

Also this is useful if your train-test split is far from ideal and/or you are using pre-trained imagenet encoders for a SemSeg task - so you cannot really update your bnorm params.

Also AFAIK - all the major deep-learning frameworks:

(0) do not have batch-norm freeze options for evaluation (batch-norm contains 2 sets of parameters - the learnable affine ones and the running statistics updated during training and used at inference)

(1) calculate batch-norm for each GPU separately

All this may mean that your models severely underperform at inference in these situations. Possible workarounds:


(0) Sync batch-norm. I believe that to do it properly you will have to modify the framework you are using, but there is a PyTorch implementation done for CVPR 2018 - also an explanation here. I guess if its multi-GPU model wrappers can be used with any model, then we are in the money

(1) Use affine=False in your batch-norm. But probably in this case imagenet initialization will not help - you will have to train your model from scratch completely

(2) Freeze your encoder batch-norm params completely (though I am not sure - they do not seem to be freezing the running mean parameters) - probably this also needs m.trainable = False or something like this

(3) Use recent Facebook group norm -
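Option (2) above - freezing the encoder's batch-norm - can be sketched like this (a sketch under assumptions: `BatchNorm2d` layers, and `eval()` must be re-applied after every `model.train()` call, which flips BN back to training mode):

```python
import torch.nn as nn

def freeze_bn(model, freeze_affine=True):
    """Freeze BatchNorm in a pre-trained encoder: eval() stops the running
    mean/var updates; requires_grad=False additionally freezes the learnable
    affine (weight/bias) parameters."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()
            if freeze_affine:
                for p in m.parameters():
                    p.requires_grad = False
```

Call `freeze_bn(model)` after each `model.train()` in your training loop, otherwise the running statistics start drifting again.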

This is a finicky topic - please share your experiences and tests in the comments



Like this post or have something to say => tell us more in the comments or donate!

How to train with frozen BatchNorm?

Since pytorch does not support syncBN, I hope to freeze mean/var of BN layer while training. Mean/Var in pretrained model are used while weight/bias are learnable. In this way, calculation of bottom_grad in BN will be different from that of the novel training mode. However, we do not find any flag in the function below to mark this difference. pytorch/torch/csrc/cudnn/BatchNorm.cpp void cudnn_batch_norm_backward( THCState* state, cudnnHandle_t handle, cudnnDataType_t dataType, THVo...

snakers4 (Alexander), June 10, 15:35

And now the article is also live -

Please support us with your likes!



Adversarial attacks in the Machines Can See 2018 competition

Or how I ended up on the team that won the Machines Can See 2018 adversarial competition. The essence of any adversarial attack, shown by example. As it...

snakers4 (Alexander), June 10, 06:50

An interesting idea from a CV conference

Imagine that you have some kind of algorithm, that is not exactly differentiable, but is "back-propable".

In this case you can have very convoluted logic in your "forward" statement (essentially something in between trees and dynamic programming) - for example a set of clever if-statements.

In this case you will be able to share both of the 2 worlds - both your algorithm (you will have to re-implement in your framework) and backprop + CNN. Nice.

Ofc this works only for dynamic deep-learning frameworks.



Machines Can See 2018 adversarial competition

Happened to join forces with a team that won 2nd place in this competition


It was very entertaining and a new domain to me.

Read more materials:

- Our repo

- Our presentation

- All presentations




Playing with adversarial attacks on Machines Can See 2018 competition

This article is about the MCS 2018 competition and my participation in it, adversarial attack methods, and how our team won

snakers4 (spark_comment_bot), June 06, 07:55

2018 DS/ML digest 11



New Andrew Ng paper on radiology datasets

YouTube 8M Dataset post

As mentioned before - this is more or less blatant TF marketing

New papers / models / architectures

(0) Google RL search for optimal augmentations

- Blog, paper

- Finally Google paid attention to augmentations

- 83.54% top1 accuracy on ImageNet

- Discrete search problem: each policy consists of 5 sub-policies, with each operation associated with two hyperparameters - probability and magnitude

- Training regime cosine decay for 200 epochs

- Top accuracy on ImageNet

- Best policy

- Typical examples of augmentations


Training CNNs with less data

Key idea - with clever selection of data you can decrease annotation costs 2-3x


Regularized Evolution for Image Classifier Architecture Search (AmoebaNet)

- The first controlled comparison of the two search algorithms (genetic and RL)

- Mobile-size ImageNet (top-1 accuracy = 75.1% with 5.1M parameters)

- ImageNet (top-1 accuracy = 83.1%)

Evolution vs. RL at Large-Compute Scale

• Evolution and RL do equally well on accuracy

• Both are significantly better than Random Search

• Evolution is faster

But the proper description of the architecture is nowhere to be seen...

Libraries / code / frameworks

(0) OpenCV installation for Ubuntu18 from source (if you need e.g. video support)

News / market

(0) Idea adversarial filters for apps -

(1) A list of 30 best practices for amateur ML / DL specialists -

- Some ideas about tackling naive NLP problems

- PyTorch allegedly supports just freezing bn layers

- Also a neat idea I tried with inception nets - assign different learning rates to larger models when fine-tuning them
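The differential learning rate trick from the last point is just per-parameter-group settings in the optimizer (the encoder/head split below is illustrative):

```python
import torch
import torch.nn as nn

# Toy "pre-trained encoder + fresh head" model
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 2))
encoder, head = model[0], model[1]

# Per-group learning rates: fine-tune the pre-trained part gently,
# train the new head faster
opt = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-5},
    {"params": head.parameters(),    "lr": 1e-3},
])
```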

(2) Stumbled upon a reference to NAdam as an optimizer being a bit better than Adam

It is also described in this popular article

(3) Barcode reader via OpenCV



Like this post or have something to say => tell us more in the comments or donate!

snakers4 (Alexander), June 05, 14:42

A very useful combination in tmux

You can resize your panes via pressing

- first ctrl+b

- hold ctrl

- press arrow keys several time holding ctrl


- profit



Digest about Internet

(0) Ben Evans Internet digest -

(1) GitHub purchased by Microsoft -

-- If you want to migrate - there are guides already -

(2) And a post on how Microsoft kind of ruined Skype -

-- focus on b2b

-- lack of focus, constant redesigns, faltering service

(3) No drop in FB usage after its controversies -

(4) Facebook allegedly employs 1200 moderators for Germany -

(5) Looks like many Linux networking tools have been outdated for years



snakers4 (Alexander), May 31, 07:25

New cool papers on CNNs

(0) Do Better ImageNet Models Transfer Better?

An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks.

However, this hypothesis has never been systematically tested.

- Wow, an empirical study of why ResNets rule - they are just better non-finetuned feature extractors and are then probably easier to fine-tune

- ResNets are the best fixed feature extractors

- Also ImageNet pretraining accelerates convergence

- Also my note is that inception-based models are more difficult to fine-tune.

- Among top ranking models are - Inception, NasNet, AmoebaNet

- Also my personal remark - any CNN architecture can be ft-ed to be relatively good, you just need to invent a proper training regime

Just the abstract says it all

Here, we compare the performance of 13 classification models on 12 image classification tasks in three settings: as fixed feature extractors, fine-tuned, and trained from random initialization. We find that, when networks are used as fixed feature extractors, ImageNet accuracy is only weakly predictive of accuracy on other tasks (r2 = 0.24). In this setting, ResNets consistently outperform networks that achieve higher accuracy on ImageNet. When networks are fine-tuned, we observe a substantially stronger correlation (r2 = 0.86). We achieve state-of-the-art performance on eight image classification tasks simply by fine-tuning state-of-the-art ImageNet architectures, outperforming previous results based on specialized methods for transfer learning.

(1) Shampoo: Preconditioned Stochastic Tensor Optimization

Looks really cool - but their implementation requires SVD and is slow for real tasks

Also they tested it only on toy tasks

In real application PyTorch implementation takes 175.58s/it per batch



shampoo.pytorch - An implementation of shampoo

snakers4 (Alexander), May 31, 06:55

Some insights about why the recent TF speech recognition challenge dataset was so poor in quality:


Cool ideas

+ a cool idea - use last layer in CNN as an embedding in TB visualization + how to


Why you need to improve your training data, and how to do it

Photo by Lisha Li Andrej Karpathy showed this slide as part of his talk at Train AI and I loved it! It captures the difference between deep learning research and production perfectly. Academic pape…

snakers4 (Alexander), May 30, 05:36

Transforms in PyTorch

They added a lot of useful stuff lately:


Basically this enables you to build decent pre-processing out of the box for simple tasks (just images)

I believe it will be much slower than OpenCV, but for small tasks it's ideal, if you do not look under the hood



New light-weight architecture from Google with 72%+ top1




Pre-trained implementation


- but this one took much more memory than I expected

- did not debug it


Gist - new light-weight architecture from Google with 72%+ top1 on Imagenet

Ofc Google promotes only its own papers there

No mention of SqueezeNet

This is somewhat disturbing


Novel ideas

- the shortcut connections are between the thin bottleneck layers

- the intermediate expansion layer uses lightweight depthwise convolutions

- it is important to remove non-linearities in the narrow layers in order to maintain representational power


Very novel idea - it is argued that non-linearities collapse some information.

When the dimensionality of useful information is low, you can do w/o them w/o loss of accuracy

(4) Building blocks

- Recent small networks' key features (except for SqueezeNet ones) -

- MobileNet building block explanation


- Overall architecture -


snakers4 (spark_comment_bot), May 28, 08:09

A couple of neat tricks in PyTorch to make code more compact and more useful for hyper-param tuning

You may have seen that today one can use CNNs even for tabular data.

In this case you may have to resort to a lot of fiddling regarding model capacity and hyper-params.

It is kind of easy to do so in Keras, but doing this in PyTorch requires a bit more fiddling.

Here are a couple of patterns that may help with this:

(0) Clever use of nn.Sequential()

self.layers = nn.Sequential(*[
    # the block body was lost in the export - e.g. a Linear + ReLU pair:
    nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU())
    for _ in range(blocks)
])


(1) Clever use of lists (which is essentially the same as above)

Just this construction may save a lot of space and give a lot of flexibility

modules = []
# append layers conditionally or in a loop, e.g.:
# modules.append(nn.Linear(n_in, n_out))
self.classifier = nn.Sequential(*modules)

(2) Pushing as many hyper-params into flags for console scripts

You can even encode something like 1024_512_256 to be passed as list to your model constructor, i.e.

1024_512_256 => 1024,512,256 => an MLP with corresponding amount of neurons
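A minimal sketch of this flag pattern (the argument name `--hidden` and layer choices are made up for illustration):

```python
import argparse
import torch.nn as nn

# Encode an MLP topology as a single console flag, e.g. --hidden 1024_512_256
parser = argparse.ArgumentParser()
parser.add_argument("--hidden", type=str, default="1024_512_256")
args = parser.parse_args([])  # [] here just to use the default in this example

# "1024_512_256" => [1024, 512, 256]
sizes = [int(s) for s in args.hidden.split("_")]

# build an MLP with the corresponding amount of neurons
layers = []
for n_in, n_out in zip(sizes, sizes[1:]):
    layers += [nn.Linear(n_in, n_out), nn.ReLU()]
mlp = nn.Sequential(*layers)
```

This keeps the whole model capacity sweep in a shell script instead of in code edits.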

(3) (Obvious) Using OOP where it makes sense

Example I recently used for one baseline


Like this post or have something to say => tell us more in the comments or donate!

Playing with MLP + embeddings in PyTorch

snakers4 (Alexander), May 25, 07:29

New competitions on Kaggle

Kaggle has started a new competition with video ... which is one of those competitions (read between the lines - blatant marketing)


- TensorFlow Record files

- Each of the top 5 ranked teams will receive $5,000 per team as a travel award - no real prizes

- The complete frame-level features take about 1.53TB of space (and yes, these are not videos, but extracted CNN features)

So, they are indeed using their platform to promote their business interests.

Released free datasets are really cool, but only when you can use them for transfer learning, which implies also seeing the underlying ground-level data (i.e. images or videos).



The 2nd YouTube-8M Video Understanding Challenge

Can you create a constrained-size model to predict video labels?

snakers4 (Alexander), May 07, 18:09

Fast.ai publishing a 2018 version of their cutting edge course

Their materials are cool, but their library is questionable


snakers4 (Alexander), May 04, 06:59

The current state of ML

(1) Do not call it AI

(2) Distinguish ML from Intelligent Infrastructure and Intelligence Augmentation

(3) Human-imitative AI is not tractable now

(4) Developments which are now being called "AI" arose mostly in the engineering fields associated with low-level pattern recognition and movement control


Artificial Intelligence — The Revolution Hasn’t Happened Yet

Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and…

snakers4 (Alexander), May 01, 16:52

2018 DS/ML digest 9

Market / libraries

(0) Tensorflow + Swift - wtf -

(1) Geektimes / going international -

(2) A service for renting GPUs ... from people

- Reddit

- Link

- Looks LXC based (afaik - the only user friendly alternative to Docker)

- Cool in theory, no idea how secure this is - we can assume it is as secure as handing a docker container to a stranger

- They did not reply to me for a week

(3) A friend sent me a new list of ... new yet another PyTorch NLP libraries

- (AllenNLP is the biggest library like this)

- I believe that such libraries are more or less useless for real tasks, but cool to know they exist

(4) New SpaceNet 4?

(5) A new super cool competition on Kaggle about particle physics?

Tutorials / basics

(0) Bias vs. Variance (RU)

(1) Yet another magic Jupyter guideline collection -

Real world ML applications

(0) Resnet + object detection (RU) - people w/o helmets, 90% accuracy -

(1) about using embeddings with Tabular data -

Very similar to our approach on electricity

I personally do not recommend using their library by all means

(2) Comparing Google TPU vs. V100 with ResNet50 -

- speed -

- pricing -

- but ... buying GPUs is much cheaper

(3) Other blog posts about embeddings + tabular data

- Sales prediction

- Taxi drive prediction

MLP + classification + embeddings

(4) Albu's solution to SpaceNet - augmentations

CNN overview

Neural network part:

Split data into 4 folds randomly, but keep the same number of tiles from each city in every fold

Use resnet34 as encoder and a unet-like decoder (conv-relu-upsample-conv-relu) with skip connections from every layer of the network. Loss function: 0.8*binary_cross_entropy + 0.2*(1 – dice_coeff). Optimizer – Adam with default params.

Train on 512*512 image crops with batch size 11 for 30 epochs (8 times more images in one epoch)

Train 20 epochs with lr 1e-4

Train 5 epochs with lr 2e-5

Train 5 epochs with lr 4e-6

Predict on full image with padding 22 on borders (1344*1344).

Merge folds by mean
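The loss from the pipeline above (0.8 * BCE + 0.2 * (1 - dice)) can be sketched as follows (a sketch only - the exact smoothing used in the original solution is not specified, so `eps` is an assumption):

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, target, bce_weight=0.8, eps=1e-7):
    """0.8 * binary_cross_entropy + 0.2 * (1 - dice_coeff), as in the writeup."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    # soft dice coefficient; eps guards against empty masks
    dice = (2 * intersection + eps) / (probs.sum() + target.sum() + eps)
    return bce_weight * bce + (1 - bce_weight) * (1 - dice)
```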

Jobs / job market

(0) Developers by country by scraping GitHub -

- developers count vs. GDP R^2 = 84%

- developers count vs. population - R^2 = 50%


(0) Interactive tool for visualizing convolutions -


(0) Open Images v4 outsourced


- the dataset itself

- categories





swift - Swift for TensorFlow documentation repository.

snakers4 (Alexander), May 01, 07:53

Exploring GANs and unsupervised learning

Here are my findings from my hobby project about using GANs and unsupervised methods to build some decent semantic search on a large dataset of images without annotation:


Lots of cool images.


(0) Features from a pre-trained Imagenet encoder => PCA => UMAP => HDBSCAN work really well for image clustering;

(1) Any siamese network / hard negative mining inspired methods just did not work - the annotation data is too coarse;

(2) GANs kind of work, but I could not achieve the boasted photo-realistic levels;


Exploring the limits of unsupervised Machine Learning in Computer Vision

In this article I share my experience with GANs, progressive growing of GANs, image clustering and unsupervised learning

snakers4 (Alexander), May 01, 06:58

Showing more images in Tensorboard

TB is super cool (also together with the script), but it shows ~10 images in its image preview.

This can be fixed.

(0) Find your TB folder

import tensorboard

tensorboard.__file__

In my case it shows '/opt/conda/lib/python3.6/site-packages/tensorboard/'


(1) cd there

(2) Open backend/ and change this line:

image_metadata.PLUGIN_NAME: 400,

(3) Profit - now it shows ~400 images on each view tab


Logging to tensorboard without tensorflow operations. Uses manually generated summaries instead of summary ops

snakers4 (Alexander), April 29, 18:32

Downgrading PyTorch from 0.4 to 0.3

Newest PyTorch has some issues with regards to multi-GPU operation.

If you want to install the previous version, the downgrade docs are a bit outdated, but you can simply:

conda install pytorch=0.3.0 cuda90 -c pytorch