Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1234 members, 1316 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
Our chat
DS courses review

Posts by tag «deep_learning»:

snakers4 (Alexander), February 19, 10:25

One more article about the usual suspects when your CNN fails to train



37 Reasons why your Neural Network is not working

The network had been training for the last 12 hours. It all looked good: the gradients were flowing and the loss was decreasing. But then…

snakers4 (Alexander), February 18, 08:05

Even though I am preparing a large release on applying GANs to a real example, I just could not help sharing these 2 links.

They are just absolute perfection for GANs on PyTorch



Also, this is the most idiomatic PyTorch (ImageNet finetuning) code I have ever seen


So if you are new to PyTorch, then these links will be very useful)





Contribute to WassersteinGAN development by creating an account on GitHub.

Which of the latest projects did you like the most?

Still waiting for GANs – 16

👍👍👍👍👍👍👍 48%

Satellites! – 9

👍👍👍👍 27%

Nothing / not interested / missed them – 4

👍👍 12%

Jungle! – 3

👍 9%

PM me for other options – 1

▫️ 3%

👥 33 people voted so far.

snakers4 (Alexander), February 14, 11:48

2017 DS/ML digest 4

Applied cool stuff

- How Dropbox built their OCR - via CTC loss -

Fun stuff

- CNN forward pass done in Google Sheets -

- New Boston Dynamics robot - opens doors now -

- Cool but toothless list of jupyter notebooks with illustrations and models

- Best CNN filter visualization tool ever -

New directions / moonshots / papers

- IMPALA from DeepMind - DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space



- Trade crypto via RL -

- SparseNets? -

- Use Apple watch data to predict diseases

- Google - Evolution in auto ML kicks in faster than RL -

- R-CNN for human pose estimation + dataset

-- Website + video

-- Paper

Google's Colaboratory gives free GPUs?

- Old GPUs

- 12 hours limit, but very cool in theory



Sick sad world

- Chinese police now have Google Glass-style eyewear with face recognition

- Why slack sucks -

-- Email + Google Docs are better for real communication


- Globally there are 22k ML developers

- One more AI chip moonshot -

- Google made their TPUs public in beta - US$6 per hour

- CNN performance comparable to human level in dermatology (R-CNN) -

- Deep learning is greedy, brittle, opaque, and shallow

- One more medical ML investment - US$25m for cancer -




snakers4 (Alexander), February 14, 04:54

Article on SpaceNet Challenge Three in Russian on habrahabr - please support us with your comments / upvotes


Also if you missed:

- The original article

- The original code release

... and Jeremy Howard retweeted our solution, lol



But to give you some idea of the pain the TopCoder platform inflicts on contestants, you can read:

- Data Download guide

- Final testing guide

- Code release for their verification process




From satellite imagery to graphs (the SpaceNet Road Detector competition): a top-10 finish plus code (translation)

Hi, Habr! I present to you a translation of the article. This is Vegas with the provided labels and the test dataset, and the white squares are probably the held-out validation...

snakers4 (Alexander), February 13, 07:55

An interesting hack from n01z3 of ODS

For getting that extra 1%

Snapshot Ensembles / Multi-checkpoint TTA:


- Train CNN with LR decay until convergence, use SGD or Adam

- Switch to a cyclic LR and continue training from the best checkpoint for several epochs

- Collect checkpoints with the best loss and use them for ensembles / TTA
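The cyclic part can be sketched with the cosine-restart formula from the Snapshot Ensembles paper; the base LR and cycle length below are made-up placeholders:

```python
import math

def snapshot_lr(base_lr, epoch, cycle_len):
    """Cosine-annealed LR that restarts every cycle_len epochs.

    LR falls from base_lr to ~0 within a cycle, then jumps back up;
    the checkpoint saved right before each restart sits in its own
    local minimum, so the collected checkpoints ensemble well.
    """
    t = epoch % cycle_len  # position inside the current cycle
    return base_lr / 2 * (math.cos(math.pi * t / cycle_len) + 1)

# base LR 0.1, restart every 10 epochs
schedule = [snapshot_lr(0.1, e, 10) for e in range(20)]
```

Checkpoints taken at the low-LR points just before each restart are the ones worth keeping for the ensemble / TTA.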


Google TPUs are released in beta... US$200 per day?

No thank you! Also looks like only TF is supported so far.

Combined with rumours, sounds impractical.



Cloud TPU machine learning accelerators now available in beta

By John Barrus, Product Manager for Cloud TPUs, Google Cloud and Zak Stone, Product Manager for TensorFlow and Cloud TPUs, Google Brain Team...

snakers4 (Alexander), February 10, 07:13

So we started publishing articles / code / solutions to the recent SpaceNet3 challenge. A Russian article will also be published soon.

- The original article

- The original code release

... and Jeremy Howard retweeted our solution, lol



But to give you some idea of the pain the TopCoder platform inflicts on contestants, you can read:

- Data Download guide

- Final testing guide

- Code release for their verification process




How we participated in SpaceNet three Road Detector challenge

This article covers our SpaceNet Challenge participation, semantic segmentation in general, and transforming masks into graphs.

snakers4 (Alexander), February 08, 09:13

Lesson 11 notes:

- Links

-- Video


- Semantic embeddings + imagenet can be powerful, but not deployable per se

- Training nets on smaller images usually works

- Comparing activation functions

- lr annealing

- linear learnable colour swap trick

- adding Batchnorm

- replacing max-pooling with avg_pooling

- lr vs batch-size

- dealing with noisy labels

- FC / max-pooling layer models are better for transfer-learning?

- size vs. flops vs. speed

- cyclical learning rate paper

- Some nice intuitions about mean shift clustering





Lesson 11: Cutting Edge Deep Learning for Coders
We’ve covered a lot of different architectures, training algorithms, and all kinds of other CNN tricks during this course—so you might be wondering: what sho...

Meta research on CNNs

(also see this amazing post)

An Analysis of Deep Neural Network Models for Practical Applications

Key findings:

(1) power consumption is independent of batch size and architecture;

(2) accuracy and inference time are in a hyperbolic relationship;

(3) an energy constraint sets an upper bound on the maximum achievable accuracy and model complexity;

(4) the number of operations is a reliable estimate of the inference time
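Finding (4) is handy in practice because the operation count of a conv layer has a closed form; a quick sketch (the layer shape is an arbitrary example, biases ignored):

```python
def conv2d_macs(h_out, w_out, k, c_in, c_out):
    """Multiply-accumulate count of a k x k 2D convolution:
    every one of the h_out * w_out * c_out outputs costs k*k*c_in MACs."""
    return h_out * w_out * c_out * k * k * c_in

# e.g. a 7x7 conv, 3 -> 64 channels, 112x112 output map
macs = conv2d_macs(112, 112, 7, 3, 64)  # ~118M MACs
```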


- Accuracy and param number -

- Param efficiency -

Also a summary of architectural patterns



Deep Learning Scaling is Predictable, Empirically




- various empirical learning curves show robust power-law region

- new architectures slightly shift learning curves downwards

- model architecture exploration should be feasible with small training data sets

- it can be difficult to ensure that training data is large enough to see the power-law learning curve region

- irreducible error region

- each new hardware generation with an improved FLOP rate can provide a predictable step-function improvement in relative DL model accuracy
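The curve shape those bullets describe can be written as error(m) = a * m**(-b) + c, where c is the irreducible error floor; the coefficients below are invented purely for illustration:

```python
def learning_curve(m, a=10.0, b=0.5, c=0.02):
    """Empirical power-law learning curve: generalization error
    vs. training set size m, flattening into the irreducible
    error floor c once more data stops helping."""
    return a * m ** (-b) + c

# error keeps shrinking with data, but never drops below c
errors = [learning_curve(m) for m in (10**3, 10**5, 10**7)]
```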



Neural Network Architectures

Deep neural networks and Deep Learning are powerful and popular algorithms. And a lot of their success lays in the careful design of the…

snakers4 (Alexander), February 07, 14:09

Following our blog post, we also posted a Russian translation of the Jungle competition article to habrahabr




The Pri-matrix Factorization competition on DrivenData with 1 TB of data: how we took 3rd place (translation)

Hi, Habr! I present to you a translation of the article "Animal detection in the jungle — 1TB+ of data, 90%+ accuracy and 3rd place in the competition". Or...

snakers4 (Alexander), February 06, 05:23

We are starting to publish our code / solutions / articles from recent competitions (Jungle and SpaceNet three).

This time the code will be more polished / idiomatic, so that you can learn something from it!

Jungle competition

- Finally it was verified that we indeed won the 3rd place)


Blog posts



- An adaptation will be coming soon

Code release and architecture:

- Code

- Architecture

-- 1st place (kudos to Dmytro) - simple and nice

-- Ours

-- 2nd place - 4-5 levels of stacking

Please comment under posts / share / buy us a coffee!

- Buy a coffee

- Rate our channel tg://resolve?domain=tchannelsbot&start=snakers4




Pri-matrix Factorization

Chimp&See has collected nearly 8,000 hours of footage reflecting chimpanzee habitats from camera traps across Africa. Your challenge is to build a model that identifies the wildlife in these videos.

snakers4 (Alexander), February 05, 14:57

We also managed to get into the top 10 in the SpaceNet3 Road Detection challenge


(Final confirmation awaits)

Here is a sneak peek of our solution


A blog post + repo will follow





Flowchart Maker & Online Diagram Software is a free online diagramming application and flowchart maker. You can use it to create UML, entity relationship, org charts, BPMN and BPM, database schema and networks. Also possible are telecommunication network, workflow, flowchart, map overlay and GIS, electronic circuit and social network diagrams.

snakers4 (Alexander), February 02, 04:58

A more concise alternative to nvidia-smi

watch --color -n1.0 gpustat --color

Installation:

pip3 install gpustat

You can also use Python bindings for the GPU drivers, but I managed to find only bindings for Python 2.
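For Python 3 a common workaround is to shell out to nvidia-smi in CSV mode and parse the result yourself; a sketch (the sample output is hard-coded here, on a real box you would get it via subprocess.check_output):

```python
import csv
import io

# the query would be:
# nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#            --format=csv,noheader,nounits
def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dicts."""
    stats = []
    for rec in csv.reader(io.StringIO(csv_text)):
        idx, util, used, total = (field.strip() for field in rec)
        stats.append({"index": int(idx), "util_pct": int(util),
                      "mem_used_mb": int(used), "mem_total_mb": int(total)})
    return stats

# hard-coded sample of what a 2-GPU box prints
sample = "0, 35, 2150, 11178\n1, 0, 10, 11178\n"
stats = parse_gpu_stats(sample)
```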



snakers4 (Alexander), February 01, 11:25

2017 DS/ML digest 2


- One more RL library (last year saw 1 or 2)

- Speech recognition from facebook -

- Even better speech generation than WaveNet - I cannot tell the computer voice apart from a human

Industry (overdue news)

- Nvidia does not like its consumer GPUs deployed in data centers

- Clarifai kills Forevery

- Google search and gorillas vs. black people -

Blog posts

- Baidu - dataset size vs. accuracy (log-scale)




- New Youtube actions dataset -


Papers - current topic - meta learning / CNN optimization and tricks

- Systematic evaluation of CNN advances on the ImageNet




- Cyclical Learning Rates for Training Neural Networks





- Large batch => train Imagenet in 15 mins


- Practical analysis of CNNs





snakers4 (Alexander), January 30, 04:57

Simple Keras + web service deploy guidelines from FChollet + PyImageSearch




Also, an engineer from our team told me that this architecture suffers under high load, because redis requires object serialization, which takes a lot of time for images. Native Python process management works better.
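The serialization tax is easy to measure yourself; a toy benchmark with a fake raw image (numbers vary by machine, so treat this as a sketch):

```python
import pickle
import time

# fake uncompressed 1920x1080 RGB frame, ~6 MB of raw bytes
fake_image = bytes(1920 * 1080 * 3)

start = time.perf_counter()
for _ in range(100):
    blob = pickle.dumps(fake_image, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
# average serialize + deserialize time per image, in ms --
# this is the overhead a redis hop adds on every enqueue/dequeue
elapsed_ms = (time.perf_counter() - start) * 1000 / 100
```

With forked worker processes the image stays in memory and this cost disappears.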



snakers4 (Alexander), January 29, 07:29

Some nice boilerplate on neural style transfer



Neural Artistic Style Transfer: A Comprehensive Look

Spring Quarter of my freshman year, I took Stanford’s CS 231n course on Convolutional Neural Networks. My final project for the course…

snakers4 (Alexander), January 29, 06:00

Nice example of group convolutions via pytorch





pretrained-models.pytorch - Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.
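The appeal of group convolutions is visible straight from the weight count: g groups cut the 3x3 parameters by a factor of g. A dependency-free sanity check (shapes are arbitrary examples):

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weight count of a conv layer: each of the c_out filters
    only sees c_in / groups input channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k

dense = conv_weights(256, 256, 3)               # plain 3x3 conv
grouped = conv_weights(256, 256, 3, groups=32)  # same shape, 32 groups
# grouped == dense / 32
```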

snakers4 (Alexander), January 29, 04:45

Classic / basic CNN papers

Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)

- Authors Xie Saining / Girshick Ross / Dollár Piotr / Tu Zhuowen / He Kaiming

- Link

- Resnet and VGG go deeper

- Inception nets go wider. Despite their efficiency, they are hard to re-purpose and design

- key idea - add group convolutions to the residual block

- illustrations

-- basic building block

-- same block in terms of group convolutions

-- overall architecture

-- performance - +1% vs ResNet



snakers4 (Alexander), January 28, 11:50

Dockerfile update for CUDA9 - CUDNN7:


Hello world in PyTorch and TensorFlow seems to be working.



Dockerfile update

snakers4 (Alexander), January 28, 09:20

What is amazing about tf and the CUDA / cuDNN drivers is that the documentation is not updated when newer versions are released, and they keep changing library file names, which is annoying af.

Arguably Google and Nvidia are the richest companies in the whole DS stack, but their documentation is the worst of them all.

So if you are updating your docker container and libraries suddenly start producing weird errors - look for compatibility guidelines like this one -

Of course the docs and release notes will have no mention of this. Because Google.

Also docker hub contains all the versions of CUDA+cuDNN packaged, which helps



Pytorch has all this embedded into their official repo list


Google, why do you make us suffer?


How to install Tensorflow 1.5.0 using official pip package | Python 3.6

Hello everyone. This is going to be a tutorial on how to install tensorflow using official pre-built pip packages. In this tutorial, we will look at how to install tensorflow 1.5.0 CPU and GPU both for Ubuntu as well as Windows OS.

snakers4 (Alexander), January 27, 07:21

Best link about convolution arithmetic



conv_arithmetic - A technical report on convolution arithmetic in the context of deep learning

snakers4 (Alexander), January 23, 17:23

Key / classic CNN papers


ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

- a small resnet-like network that uses pointwise group convolutions, depthwise separable convolutions and a shuffle layer

- authors - Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

- paper -

- key

-- on ARM devices 13x faster than AlexNet

-- lower top1 error than MobileNet at 40 MFLOPs

- comparable to small versions of NASNET

- 2 ideas

-- use depth-wise separable convolutions for 3x3 and 1x1 convolutions

-- use shuffle layer (flatten, transpose, resize back to original dimension)

- illustrations

-- shuffle idea -

-- building blocks -

-- vs. key architectures

-- vs. MobileNet

-- actual inference on mobile device -
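The shuffle layer from the second idea is just reshape -> transpose -> flatten on the channel axis; a dependency-free sketch on a plain list standing in for channels:

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle: view the channels as a
    (groups x per_group) matrix, transpose it, flatten back.  After
    this, every group in the next grouped conv receives channels
    originating from all groups of the previous layer."""
    n = len(channels)
    assert n % groups == 0
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# 6 channels in 2 groups: [0, 1, 2 | 3, 4, 5] become interleaved
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], 2)  # -> [0, 3, 1, 4, 2, 5]
```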



snakers4 (Alexander), January 23, 06:51

A new interesting competition on topcoder


At least at first glance)



Pre-register now for KONICA MINOLTA Image Segmentation Challenge

This contest aims to create new image recognition technology that could detect abnormality of a product to be used for visual inspection purpose.

snakers4 (Alexander), January 23, 04:14

snakers4 (Alexander), January 21, 05:49

A list of nice-to-read articles (RU)

- A nice article about a credit scoring competition -

- Feature engineering

- If you are hardware-strapped, an RPi + Movidius stick may work better for inference than just an RPi -



Estimating the probability of a loan default

A post describing the solution to a competition on the SASCOMPETITIONS platform. The organizers allowed me to publish the code and the logic of the solution, but under the contract I transfer the rights to the algorithm and, possibly, ...

snakers4 (Alexander), January 20, 15:13

2017 DS/ML digest 1

Have not done digests for quite some time =)

1. Annual digests

1.1 Google Brain one - two


- Speech generation

- Speech recognition

- Auto ML



Posted before - but WildML 2017 summary is also awesome

2. Datasets

→ YouTube-8M: >7 million YouTube videos annotated with 4,716 different classes

→ YouTube-Bounding Boxes: 5 million bounding boxes from 210,000 YouTube videos

→ Speech Commands Dataset: thousands of speakers saying short command words

→ AudioSet: 2 million 10-second YouTube clips labeled with 527 different sound events

→ Atomic Visual Actions (AVA): 210,000 action labels across 57,000 video clips

→ Open Images: 9M creative-commons licensed images labeled with 6,000 classes

→ Open Images with Bounding Boxes: 1.2M bounding boxes for 600 classes

→ QuickDraw dataset


Uber on a genetic approach to neural networks -





The Google Brain Team — Looking Back on 2017 (Part 1 of 2)

Posted by Jeff Dean, Google Senior Fellow, on behalf of the entire Google Brain Team The Google Brain team works to advance the state of ...

snakers4 (Alexander), January 17, 09:25

Nice presentation to learn about Semantic Segmentation



V. Iglovikov - on segmentation, Kaggle and life in general
Slides -

snakers4 (Alexander), January 10, 03:20

A 70% full GAN / style paper review:

- review

- TLDR -

Did not crack the math in Wasserstein GAN though.

Also, a friend of mine focused on GANs for ~6 months. Below is the gist of his work:

- GANs are known to be notoriously difficult and tricky to train, even with Wasserstein loss

- The most photo-realistic papers use custom regularization techniques and very sophisticated training regimes

- Seemingly photo-realistic GANs (with progressive growing)

-- are tricky to train

-- require 2-3x the time to train the GAN itself and an additional 3-6x to use growing

- end result may be completely unpredictable despite all the efforts

- most GANs are not viable in production / mobile applications

- visually in practice they perform much WORSE than style transfer

Training TLDR trick

- Use DCGAN just for training latent space variables, without any domain transfer

- Use CycleGan + wasserstein loss for domain transfer

- Use growing for photo-realism

As for using them for latent space algebra - I will do this project this year.



GAN paper list and review

In this post I list useful / influential GAN papers and papers related to sparse unsupervised data CNN training / latent space operations

A US$1 million prize, US-citizen-exclusive Kaggle challenge ... for just stacking ResNets?


America is fucked up bad...

Also notice the shake-up and top scores

- Public

- Private



snakers4 (Alexander), January 08, 06:47

A 2017 ML/DS year in review by some venerable / random authors:

- Proper year review by WildML (!!!) -

-- Includes a lot of links and proper materials

-- AlphaGo

-- Attention

-- RL and genetic algorithm renaissance

-- Pytorch - elephant in the room, TF and others


-- Medicine

-- GANs

If I had to summarize 2017 in one sentence, it would be the year of frameworks. Facebook made a big splash with PyTorch. Due to its dynamic graph construction similar to what Chainer offers, PyTorch received much love from researchers in Natural Language Processing, who regularly have to deal with dynamic and recurrent structures that are hard to declare in a static graph framework such as Tensorflow.

Tensorflow had quite a run in 2017. Tensorflow 1.0 with a stable and backwards-compatible API was released in February. Currently, Tensorflow is at version 1.4.1. In addition to the main framework, several Tensorflow companion libraries were released, including Tensorflow Fold for dynamic computation graphs, Tensorflow Transform for data input pipelines, and DeepMind’s higher-level Sonnet library. The Tensorflow team also announced a new eager execution mode which works similar to PyTorch’s dynamic computation graphs.

In addition to Google and Facebook, many other companies jumped on the Machine Learning framework bandwagon:

- Apple announced its CoreML mobile machine learning library.

- A team at Uber released Pyro, a Deep Probabilistic Programming Language.

- Amazon announced Gluon, a higher-level API available in MXNet.

- Uber released details about its internal Michelangelo Machine Learning infrastructure platform.

- And because the number of frameworks is getting out of hand, Facebook and Microsoft announced the ONNX open format to share deep learning models across frameworks. For example, you may train your model in one framework, but then serve it in production in another one.

- In Russian - kind of a meh review (source -

- Amazing 2017 article about global AI trends -

- Uber engineering highlights -




AI and Deep Learning in 2017 – A Year in Review

The year is coming to an end. I did not write nearly as much as I had planned to. But I’m hoping to change that next year, with more tutorials around Reinforcement Learning, Evolution, and Ba…

snakers4 (Alexander), January 04, 01:27

Starting my GAN paper review series ~40% in


Please comment / share / provide feedback.



GAN paper list and review

In this post I list useful / influential GAN papers and papers related to sparse unsupervised data CNN training / latent space operations

snakers4 (Alexander), January 03, 12:20

While reading GAN papers, I stumbled upon even creepier pix2pix cats



And these are my best cats

Forwarded from Alexander:


snakers4 (Alexander), January 02, 10:02

A nice repo with paper summaries (2-3 pages per paper)





Summaries of machine learning papers