Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1234 members, 1316 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
Our chat
DS courses review

snakers4 (Alexander), February 20, 01:49

Pruning Makes Faster and Smaller Neural Networks | Two Minute Papers #229
The paper "Learning to Prune Filters in Convolutional Neural Networks" is available here: We would like to thank our gen...

snakers4 (Alexander), February 19, 10:25

One more article about usual suspects when your CNN fails to train



37 Reasons why your Neural Network is not working

The network had been training for the last 12 hours. It all looked good: the gradients were flowing and the loss was decreasing. But then…

snakers4 (Alexander), February 18, 08:05

Even though I am preparing a large release on GAN applications on a real example, I just could not help sharing these 2 links.

They are just absolute perfection for GANs on PyTorch



Also this is the most idiomatic PyTorch (Imagenet finetuning) code I have ever seen


So if you are new to PyTorch, then these links will be very useful)





Contribute to WassersteinGAN development by creating an account on GitHub.

Which of the latest projects did you like the most?

Still waiting for GANs – 16

👍👍👍👍👍👍👍 48%

Satellites! – 9

👍👍👍👍 27%

Nothing / not interested / missed them – 4

👍👍 12%

Jungle! – 3

👍 9%

PM me for other options – 1

▫️ 3%

👥 33 people voted so far.

snakers4 (Alexander), February 16, 12:30

Yandex is cool - they demonstrate self-driving technology of ~2012 - 15 km/h in a very controlled setting.

If you watch Andrew Ng's original course - NASA did the same with humvees in 2010-2012 on isolated tracks.


Yandex tested a self-driving taxi on Moscow streets

The ride was entirely in autonomous mode

snakers4 (Alexander), February 16, 11:14

The Real state of Deep RL



Deep Reinforcement Learning Doesn't Work Yet

This mostly cites papers from Berkeley, Google Brain, DeepMind, and OpenAI from the past few years, because that work is most visible to me. I’m almost certainly missing stuff from older literature and other institutions, and for that I apologize - I’m just one guy, after all.

snakers4 (Alexander), February 16, 11:04

New datasets

(1) HDR Dataset from Google

3,640 bursts of full-resolution raw images, made up of 28,461 individual images, along with HDR+ intermediate and final results for comparison

(2) Huge Anime dataset - 2.9m+ images annotated with 77.5m+ tags -


Introducing the HDR+ Burst Photography Dataset

Posted by Sam Hasinoff, Software Engineer, Machine Perception Burst photography is the key idea underlying the HDR+ software on Google's...

snakers4 (Alexander), February 15, 09:50

Visualizing Large-scale and High-dimensional Data:

A paper behind an awesome library


Follows the success of T-SNE, but is MUCH faster

Typical visualization pipeline

Also works awesomely with

Convergence speed

(1) (on a machine with 512GB memory, 32 cores at 2.13GHz)

(2) On 3m data points * 100 dimensions, LargeVis is up to 30x faster at graph construction and 7x faster at graph visualization





T-SNE drawbacks

(1) K-nearest neighbor graph = computational bottleneck

(2) t-SNE constructs the graph using vantage-point trees, whose performance deteriorates significantly in high dimensions

(3) t-SNE parameters are very sensitive across different data sets

Algorithm itself

(1) Create a small number of projection trees (similar to a random forest). Then for each node of the graph, search the neighbors of its neighbors, which are also likely to be candidates for its nearest neighbors

(2) Use SGD (or asynchronous SGD) to minimize the graph loss

(3) Clever sampling - sample the edges with the probability proportional to their weights and then treat the sampled edges as binary edges. Also sample some negative (not observed) edges
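Step (3) above can be sketched in a few lines of numpy; the function name and toy graph below are illustrative, not the paper's actual implementation:

```python
import numpy as np

def sample_edges(edges, weights, n_samples, n_nodes, n_negative, rng):
    # sample positive edges with probability proportional to weight,
    # then treat the sampled edges as binary (unweighted)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(edges), size=n_samples, p=p)
    positives = [edges[i] for i in idx]
    # negative edges: pair each sampled source node with random
    # (most likely unobserved) nodes
    negatives = [(u, int(rng.integers(n_nodes)))
                 for (u, _) in positives
                 for _ in range(n_negative)]
    return positives, negatives

# toy graph: 4 nodes, 3 weighted edges
edges = [(0, 1), (1, 2), (2, 3)]
weights = [5.0, 1.0, 1.0]
rng = np.random.default_rng(0)
pos, neg = sample_edges(edges, weights, n_samples=100, n_nodes=4,
                        n_negative=5, rng=rng)
```

The heavily weighted edge (0, 1) dominates the positive samples, which is exactly the point - frequent edges get updated more often, but with a binary target.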




umap - Uniform Manifold Approximation and Projection

snakers4 (Alexander), February 15, 08:29

Pytorch 0.3.1


pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

snakers4 (Alexander), February 15, 02:12

Announcing Tensor Comprehensions

Today, Facebook AI Research (FAIR) is announcing the release of Tensor Comprehensions, a C++ library and mathematical language that helps bridge the gap between researchers, who communicate in terms of mathematical operations, and engineers focusing on the practical needs of running large-scale models on various hardware backends. The main differentiating feature of Tensor Comprehensions is…

snakers4 (Alexander), February 14, 16:28

Google's Text Reader AI: Almost Perfect | Two Minute Papers #228
The paper "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" is available here:

snakers4 (Alexander), February 14, 15:00

Ripple Explained
Can Ripple be considered a cryptocurrency? Is it centralized? Why is it getting adopted by so many banks so fast? Lots of questions, I'm going to answer them...

snakers4 (Alexander), February 14, 11:48

2017 DS/ML digest 4

Applied cool stuff

- How Dropbox built their OCR - via CTC loss -

Fun stuff

- CNN forward pass done in Google Sheets -

- New Boston Robotics robot - opens doors now -

- Cool but toothless list of jupyter notebooks with illustrations and models

- Best CNN filter visualization tool ever -

New directions / moonshots / papers

- IMPALA from Google - DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space



- Trade crypto via RL -

- SparseNets? -

- Use Apple watch data to predict diseases

- Google - Evolution in auto ML kicks in faster than RL -

- R-CNN for human pose estimation + dataset

-- Website + video

-- Paper

Google's Colaboratory gives free GPUs?

- Old GPUs

- 12 hours limit, but very cool in theory



Sick sad world

- China has police Google Glass with face recognition

- Why slack sucks -

-- Email + google docs is better for real communication


- Globally there are 22k ML developers

- One more AI chip moonshot -

- Google made their TPUs public in beta - US$6 per hour

- CNN performance comparable to human level in dermatology (R-CNN) -

- Deep learning is greedy, brittle, opaque, and shallow

- One more medical ML investment - US$25m for cancer -




snakers4 (Alexander), February 14, 10:29

Also this is related to modern OCR - how DropBox did it

snakers4 (Alexander), February 14, 04:54

Article on SpaceNet Challenge Three in Russian on habrhabr - please support us with your comments / upvotes


Also if you missed:

- The original article

- The original code release

... and Jeremy Howard from retweeted our solution, lol



But to give some idea of the pain the TopCoder platform inflicts on contestants, you can read

- Data Download guide

- Final testing guide

- Code release for their verification process




From satellite imagery to graphs (the SpaceNet Road Detector competition) - a top-10 finish and code (translation)

Hi, Habr! I present to you a translation of an article. This is Vegas with the provided labeling and test dataset, and the white squares are probably the held-out validation...

snakers4 (Alexander), February 13, 08:19

Internet digest

- Ben Evans -

- FB tried to buy Snapchat 2 times - for US$60m and US$3b -

- Allegedly some ML can achieve 85% diabetes prediction accuracy on apple watch sensor data -

- Cars may embrace 48 volts instead of 12 volts -

- Google reabsorbs Nest (read between the lines - it was successful) -

- Snap +70% revenue growth -

- 7 of 8 USA top grocers participate in Instacart -

- Siri APIs are fragmented lol -

- Uber agreed to provide Waymo, the self-driving car unit under Google’s parent company, Alphabet, with 0.34 percent of its stock -



snakers4 (Alexander), February 13, 07:55

Interesting hack from n01z3 from ODS

For getting that extra 1%

Snapshot Ensembles / Multi-checkpoint TTA:


- Train CNN with LR decay until convergence, use SGD or Adam

- Use cyclic LR starting to train the network from the best checkpoint, train for several epochs

- Collect checkpoints with the best loss and use them for ensembles / TTA
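The cyclic schedule in steps 2-3 can be sketched as a cosine-annealed learning rate that restarts every cycle; the function and hyperparameter values below are illustrative, not n01z3's exact recipe:

```python
import math

def cyclic_lr(epoch, base_lr=0.1, min_lr=1e-4, cycle_len=10):
    # cosine annealing from base_lr down to min_lr, restarting every
    # cycle_len epochs; checkpoints near the end of each cycle (where
    # the LR bottoms out) are the ensemble / TTA candidates
    t = (epoch % cycle_len) / cycle_len  # position within current cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Each restart kicks the network out of its current minimum, so the checkpoints collected at the low-LR points of different cycles are diverse enough to ensemble.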


Google TPUs are released in beta... US$200 per day?

No thank you! Also it looks like only TF is supported so far.

Combined with the rumours, sounds impractical.



Cloud TPU machine learning accelerators now available in beta

By John Barrus, Product Manager for Cloud TPUs, Google Cloud and Zak Stone, Product Manager for TensorFlow and Cloud TPUs, Google Brain Team...

snakers4 (Alexander), February 12, 04:18

Useful links about Datashader

- Home -

- Youtube presentation, practical presentations

-- OpenSky

-- 300M census data

-- NYC Taxi data

- Readme (md is broken)

- Datashader pipeline - what you need to understand to use it with examples -

Also see 2 images above)



Datashader — Datashader 0.6.5 documentation

Turns even the largest data into images, accurately.

snakers4 (Alexander), February 11, 18:28

A note on reusing my old hard drives from an mdadm raid10 array in a new raid0 array after buying more hard drives.

Ideally this command should remove the superblock from the old disks:

sudo mdadm --zero-superblock /dev/sdc

But in practice I faced a problem where the raid arrays only started properly on reboot after something of this sort:

dd bs=512 count=63 if=/dev/zero of=/dev/sda

This happened to both old raid10 disks and a disk that was used as plain storage. Magic.

Ofc you can shred and fill the whole disk with zeros, but it takes a lot of time...

sudo shred -v -n1 -z /dev/sda


How To Create RAID Arrays with mdadm on Ubuntu 16.04 | DigitalOcean

Linux's mdadm utility can be used to turn a group of underlying storage devices into different types of RAID arrays. This provides various advantages depending on which RAID level is used. This guide will cover how to set up devices in the most common

snakers4 (Alexander), February 11, 07:02

In a nutshell - Datashader is AWESOME

Instead of this

snakers4 (Alexander), February 11, 06:46

Datashader Revealing the Structure of Genuinely Big Data | SciPy 2016 | James A Bednar
Current plotting tools are inadequate for revealing the distributions of large, complex datasets, both because of technical limitations and because the resul...

snakers4 (Alexander), February 10, 17:31

So, I accidentally was able to talk to the Vice President of GameWorks in Nvidia in person =)

All of this should be taken with a grain of salt. I am not endorsing Nvidia.

- In the public part of the speech he spoke about public Nvidia research projects - most notable / fresh was Nvidia Holodeck - their VR environment

- Key insight - even though Rockstar forbade using GTA images for deep learning, he believes that artificial images used for annotation are the future of ML, because game engines and OSes are the most complicated software ever

Obviously, I asked interesting questions afterwards =) Most notably about the GPU market and market forces

- GameWorks = 200 people doing AR / VR / CNN research

- The biggest team in Nvidia is 2000 - drivers

- Ofc he refused to say when next-generation GPUs will be released, or whether the rumour that their current-generation GPUs are no longer being produced is true

- He says they are mostly software company focusing on drivers

- Each generation cycle takes 3 years, Nvidia has only one architecture per generation, all the CUDA / ML stuff was planned in 2012-2014

- A rumour about Google TPUs. Google has an internal quota - allegedly (!) they cannot buy more GPUs than TPUs, but the TPUs are 1% utilized, and allegedly they lure Nvidia people to optimize their GPUs to make sure they use this quota efficiently

- AMD's R&D spend on both CPU and GPU combined is less than Nvidia's spend on GPU alone

- He says that the newest AMD cards have 30-40% more FLOPs, but they are compared against previous-generation consumer GT cards on synthetic tests. AMD does not have a 2000-people driver team...

- He says that Intel has 3-5 new architectures in the works - which may be a problem


snakers4 (Alexander), February 10, 13:55

Forwarded from Linuxgram:

The 5 Coolest Things About VLC 3.0

VLC Chromecast support arrives in VLC 3.0, as do many other features! In this post we take a look at 5 changes that make this VLC release worth downloading.

snakers4 (Alexander), February 10, 10:47

Some idiomatic pandas for loading several dataframes at once quickly

import pandas as pd

def date_to_months(df, date_col, new_col):
    # parse the date column and derive a 'YYYY_MM' month key
    df[date_col] = pd.to_datetime(df[date_col])
    df[new_col] = df[date_col].apply(lambda x: str(x.year) + '_' + str(x.month).zfill(2))
    return df

def clean_hdfs_artifacts(df):
    # drop rows where the header row got duplicated into the data
    df = df[df[df.columns[0]] != df.columns[0]]
    return df

files = ['../data/photo_like_profile_2017_final.csv',
         '../data/photo_like_profile_2018_final.csv']

likes_pr = [(pd.read_csv(fp)
             .pipe(clean_hdfs_artifacts)
             .pipe(date_to_months, 'photo_like_timestamp', 'like_month')
             ) for fp in files]

likes_pr = pd.concat(likes_pr)

snakers4 (Alexander), February 10, 09:02

Took a first stab at playing with XGB on GPU and updated my Dockerfile

May not work, but the links / snippet below may help

# compile with GPU support
git clone --recursive &&
cd xgboost &&
mkdir build &&
cd build &&
cmake .. -DUSE_CUDA=ON &&
make -j &&
cd ../ &&
# install python package
cd python-package &&
python3 install &&
cd ../
# test all is ok
python3 tests/benchmark/

Also LightGBM depends on old drivers and does not work yet with nvidia-390 on Ubuntu.


Dockerfile update

snakers4 (Alexander), February 10, 07:31

(!) For new people on the channel:

- This channel is a practitioner's channel on the following topics: Internet, Data Science, math, deep learning, philosophy

- Focus is on data science and deep learning

- Don't get in a twist if your opinion differs. You are welcome to contact me via telegram @snakers41 and email -

- No bs and ads

- Every week or two (or three) I review some materials and do ML / Internet digests (I used to do digests of digests, but now I have no time for that)

Give us a rating:



- Buy me a coffee

- Direct donations - - 5011673505 (paste this agreement number)

- Yandex -

Key / major links:

Our website


Our chat


DS courses review

(if you are a beginner)



GAN papers review



- (RU) habr

- article

- code


- article

- code

- code for TopCoder

Telegram Channels Bot

Discover the best channels 📢 available on Telegram. Explore charts, rate ⭐️ and enjoy updates!

snakers4 (Alexander), February 10, 07:13

So we started publishing articles / code / solutions to the recent SpaceNet3 challenge. A Russian article on will also be published soon.

- The original article

- The original code release

... and Jeremy Howard from retweeted our solution, lol



But to give some idea of the pain the TopCoder platform inflicts on contestants, you can read

- Data Download guide

- Final testing guide

- Code release for their verification process




How we participated in SpaceNet three Road Detector challenge

This article tells about our SpaceNet Challenge participation, semantic segmentation in general and transforming masks into graphs Статьи автора - Блог -

snakers4 (Alexander), February 08, 09:13

 lesson 11 notes:

- Links

-- Video


- Semantic embeddings + imagenet can be powerful, but not deployable per se

- Training nets on smaller images usually works

- Comparing activation functions

- lr annealing

- linear learnable colour swap trick

- adding Batchnorm

- replacing max-pooling with avg_pooling

- lr vs batch-size

- dealing with noisy labels

- FC / max-pooling layer models are better for transfer-learning?

- size vs. flops vs. speed

- cyclical learning rate paper

- Some nice intuitions about mean shift clustering
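The "linear learnable colour swap trick" from the notes above is essentially a learnable 1x1 convolution over the three input channels; a minimal numpy sketch (shapes and names are illustrative):

```python
import numpy as np

def colour_swap(img, W, b):
    # img: (H, W, 3) image; W: (3, 3) learnable channel-mixing matrix;
    # b: (3,) bias. Equivalent to a 1x1 conv over the RGB channels,
    # letting the network learn its own colour-space transform.
    return img @ W.T + b

img = np.random.rand(4, 4, 3)
# identity weights leave the image unchanged
identity = colour_swap(img, np.eye(3), np.zeros(3))
# a permuted identity matrix swaps the R and B channels
swapped = colour_swap(img, np.eye(3)[[2, 1, 0]], np.zeros(3))
```

In a real network W and b would be parameters of a 1x1 conv layer prepended to the model and trained jointly with it.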





Lesson 11: Cutting Edge Deep Learning for Coders
We’ve covered a lot of different architectures, training algorithms, and all kinds of other CNN tricks during this course—so you might be wondering: what sho...

Meta research on the CNNs

(also this amazing post

An Analysis of Deep Neural Network Models for Practical Applications

Key findings:

(1) power consumption is independent of batch size and architecture;

(2) accuracy and inference time are in a hyperbolic relationship;

(3) energy constraint = upper bound on the maximum achievable accuracy and model complexity;

(4) the number of operations is a reliable estimate of the inference time


- Accuracy and param number -

- Param efficiency -

Also a summary of architectural patterns



Deep Learning Scaling is Predictable, Empirically




- various empirical learning curves show robust power-law region

- new architectures slightly shift learning curves downwards

- model architecture exploration should be feasible with small training data sets

- it can be difficult to ensure that training data is large enough to see the power-law learning curve region

- irreducible error region

- each new hardware generation with improved FLOP rate can provide a predictable step-function improvement in relative DL model accuracy



Neural Network Architectures

Deep neural networks and Deep Learning are powerful and popular algorithms. And a lot of their success lays in the careful design of the…

snakers4 (Alexander), February 08, 05:20

Looks useless...but so cool!

Maybe in 1-2 years Reinforcement Learning will become a thing



IMPALA - a new and efficient distributed architecture capable of solving many tasks at the same time in DeepMind Lab. - blog - paper - the new DMLab-30 environments @GitHub

snakers4 (Alexander), February 07, 18:46

DeepMind Control Suite | Two Minute Papers #226
The paper "DeepMind Control Suite" and its source code is available here: We wo...

snakers4 (Alexander), February 07, 14:09

Following our blog post, we also posted a Russian translation of the Jungle competition to habrhabr




The Pri-matrix Factorization competition on DrivenData with 1TB of data - how we took 3rd place (translation)

Hi, Habr! I present to you a translation of the article "Animal detection in the jungle — 1TB+ of data, 90%+ accuracy and 3rd place in the competition". Or...