Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1356 members, 1614 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
Our chat
DS courses review

snakers4 (Alexander), September 14, 05:32

Understanding the current SOTA NMT / NLP model - transformer

A list of articles that really help to do so:

- Understanding attention

- Annotated transformer

- Illustrated transformer

Playing with transformer in practice

This repo turned out to be really helpful

It features:

- A decent, well-encapsulated model and loss;

- Several heads for different tasks;

- It works;

- Ofc their data-loading scheme is crappy and over-engineered;

My impressions on actually training the transformer model for classification:

- It works;

- It is high capacity;

- Inference time is ~`5x` higher than char-level or plain RNNs;

- It serves as a classifier as well as an LM;

- Capacity is enough to tackle most challenging tasks;

- It can be deployed on CPU for small texts (!);

- On smaller tasks there is no clear difference between plain RNNs and Transformer;


Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

May 25th update: New graphics (RNN animation, word embedding graph), color coding, elaborated on the final attention example. Note: The animations below are videos. Touch or hover on them (if you’re using a mouse) to get play controls so you can pause if needed. Sequence-to-sequence models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning. Google Translate started using such a model in production in late 2016. These models are explained in the two pioneering papers (Sutskever et al., 2014, Cho et al., 2014). I found, however, that understanding the model well enough to implement it requires unraveling a series of concepts that build on top of each other. I thought that a bunch of these ideas would be more accessible if expressed visually. That’s what I aim to do in this post. You’ll need some previous understanding of deep learning to get through this post. I hope it can be a useful companion to reading the papers mentioned…

snakers4 (Alexander), September 11, 18:12

Gensim's fast-text subwords

Some monkey patching to get subwords from Gensim's fast-text

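import gensim
# private helpers; their location may differ between gensim versions
from gensim.models.utils_any2vec import _compute_ngrams, _ft_hash

def subword(self, word):
    # collect the character n-grams of `word` that the model has vectors for
    ngram_lst = []
    ngrams = _compute_ngrams(word, self.min_n, self.max_n)
    for ngram in ngrams:
        ngram_hash = _ft_hash(ngram) % self.bucket
        if ngram_hash in self.hash2index:
            ngram_lst.append(ngram)
    return ngram_lst

# monkey patch: attach the method to the fast-text keyed vectors class
gensim.models.keyedvectors.FastTextKeyedVectors.subword = subword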
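To make the idea behind the patch concrete, here is a minimal pure-Python sketch of how fastText-style character n-grams are formed: the word is wrapped in `<` and `>` boundary markers and every substring of length `min_n` to `max_n` is emitted. The function name `char_ngrams` and the default lengths are assumptions for illustration; the sketch does not reproduce Gensim's private helpers or the hashing step.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return fastText-style character n-grams of `word` (illustrative helper)."""
    # fastText marks word boundaries with angle brackets before slicing.
    extended = "<" + word + ">"
    ngrams = []
    # An n-gram cannot be longer than the wrapped word itself.
    for n in range(min_n, min(len(extended), max_n) + 1):
        for start in range(len(extended) - n + 1):
            ngrams.append(extended[start:start + n])
    return ngrams


# Trigrams of "where": ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("where", min_n=3, max_n=3))
```

The patched `subword` method then keeps only those n-grams whose hash bucket is actually present in the trained model.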

snakers4 (Alexander), September 11, 06:00

Useful Python / PyTorch bits

dot.notation access to dictionary attributes

class dotdict(dict):
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
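A short usage sketch of this class, including one caveat worth knowing. The `config` object and its keys are made-up examples.

```python
class dotdict(dict):
    """dict subclass with attribute-style access, as in the snippet above."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


config = dotdict({"lr": 1e-3, "epochs": 10})
config.batch_size = 32          # stored as config["batch_size"]
del config.epochs               # removes the "epochs" key

# Caveat: a missing attribute returns None (dict.get) instead of raising.
missing = config.not_there
```

Because missing keys silently become `None`, typos in attribute names will not fail loudly, so this is best kept for small config-like objects.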

PyTorch embedding layer - ignore padding

nn.Embedding takes a padding_idx argument: the embedding vector at that index receives no gradient updates, so the padding token embedding is not trained.



snakers4 (Alexander), September 06, 16:20

This AI Performs Super Resolution in Less Than a Second
The paper "A Fully Progressive Approach to Single-Image Super-Resolution" is available here: A-Man's Caustic scene: http:/...

snakers4 (Alexander), September 06, 06:18



- A 2017 Imagenet winner;

- Mostly ResNet-152 inspired network;

- Transfers well (ResNet);

- Squeeze and Excitation (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels;

- Intuitively it looks like convolution meets the attention mechanism;

- SE block:


- Reduction ratio r to be 16 in all experiments;

- Results:



snakers4 (Alexander), September 06, 05:57

Chainer - a predecessor of PyTorch

Looks like

- PyTorch was based not only on Torch: its autograd was also forked from Chainer;

- Chainer looks like PyTorch, but it is developed by an independent Japanese group rather than Facebook;

- A quick glance through the docs confirms that the PyTorch and Chainer APIs look 90% identical (both are numpy-inspired, but use different back-ends);

- 2nd place in Open Images was taken by people using Chainer with 512 GPUs;

- I have yet to confirm for myself that PyTorch can work with a cluster (but other people have done it);



ConvNet training using pytorch. Contribute to eladhoffer/convNet.pytorch development by creating an account on GitHub.

Also - thanks to all DO referral link supporters - now the hosting of my website is finally free (at least for the next ~6 months)!

Also today I published my 200th post. Ofc not all of these are proper long articles, but nevertheless it's cool.

snakers4 (Alexander), September 06, 05:48

DS/ML digest 23

The key topic of this one - this is insanity

- vid2vid

- unsupervised NMT

If you like our digests, you can support the channel via:

- Sharing / reposting;

- Giving an article a decent comment / a thumbs-up;

- Buying me a coffee (links on the digest);

Let's spread the right DS/ML ideas together.




2018 DS/ML digest 23


snakers4 (Alexander), September 05, 06:40

MySQL - replacing window functions

Older versions of MySQL (and maybe newer ones) do not have all the goodness you can find in PostgreSQL. Ofc you can do plain session matching in Python, but sometimes you just need to do it in plain SQL.

In Postgres you usually use window functions for this purpose if you need PLAIN SQL (ofc there are stored procedures / views / mat views etc).

In MySQL it can be elegantly solved like this:

SET @session_number = 0, @last_uid = '0', @current_id = '0', @dif = 0;

SELECT
    @last_uid := @current_uid,
    @dif := TIMESTAMPDIFF(MINUTE, t2.session_ts, t1.session_ts),
    IF(@last_uid = @current_uid, IF(@dif > 30, @session_number := @session_number + 1, @session_number), @session_number := 0) AS session
FROM
    table1 t1
    JOIN table2 t2 ON =
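For comparison, the same session-matching logic in plain Python might look like the sketch below: events are ordered per user, the session counter grows whenever the gap to the previous event exceeds 30 minutes, and it resets to 0 for a new user. The function name `assign_sessions`, the `(uid, timestamp)` tuple layout and the sample data are assumptions for illustration, not the schema behind the query.

```python
from datetime import datetime, timedelta


def assign_sessions(events, gap=timedelta(minutes=30)):
    """Label (uid, timestamp) events with a per-user session number.

    Hypothetical helper: numbering starts at 0 for every user and is
    incremented when two consecutive events are more than `gap` apart.
    """
    labelled = []
    last_uid, last_ts, session = None, None, 0
    for uid, ts in sorted(events):
        if uid != last_uid:
            session = 0                 # new user: reset the counter
        elif ts - last_ts > gap:
            session += 1                # long pause: start a new session
        labelled.append((uid, ts, session))
        last_uid, last_ts = uid, ts
    return labelled


events = [
    ("u1", datetime(2018, 9, 5, 10, 0)),
    ("u1", datetime(2018, 9, 5, 10, 10)),
    ("u1", datetime(2018, 9, 5, 11, 0)),
    ("u2", datetime(2018, 9, 5, 10, 5)),
]
sessions = [s for _, _, s in assign_sessions(events)]
```

Here the third `u1` event comes 50 minutes after the second one, so it opens session 1, while `u2` starts again from session 0.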


snakers4 (Alexander), September 03, 06:27

Training a MNASNET from scratch ... and failing

As a small side hobby we tried training Google's new mobile network from scratch and failed:



Maybe you know how to train it properly?

Also now you can upvote articles on spark in me! =)


Training your own MNASNET


snakers4 (Alexander), September 02, 19:49

Everybody Dance Now! - AI-Based Motion Transfer
Pick up cool perks on our Patreon page: The paper "Everybody Dance Now" is available here:

snakers4 (Alexander), September 02, 06:22

A small hack to spare PyTorch memory when resuming training

When you resume from a checkpoint, consider adding this to save GPU memory:

del checkpoint



snakers4 (Alexander), August 31, 13:59

DS/ML digest 22




2018 DS/ML digest 22


snakers4 (Alexander), August 31, 13:38

ADAMW to be integrated into upstream PyTorch?


Fixing Weight Decay Regularization in Adam by jingweiz · Pull Request #3740 · pytorch/pytorch

Hey, We added SGDW and AdamW in optim, accoridng to the new ICLR submission from Loshchilov and Hutter: Fixing Weight Decay Regularization in Adam. We also found some inconsistency of the current i...

snakers4 (Alexander), August 29, 08:16

Crowd-AI maps repo

Just opened my repo for crowd AI maps 2018.

Did not pursue this competition till the end, so it is not polished, .md is not updated. Use it at your own risk!



CrowdAI mapping challenge 2018 solution. Contribute to snakers4/crowdai-maps-2018 development by creating an account on GitHub.

snakers4 (Alexander), August 29, 05:45

Venn diagrams in python

Compare sets as easy as:

# Import the library
import matplotlib.pyplot as plt
from matplotlib_venn import venn3

# Make the diagram (s1, s2, s3 are Python sets)
venn3(subsets=(s1, s2, s3), set_labels=['synonyms', 'our_tree', 'add_syns'])

Very simple and useful.
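If you also need the numbers behind the picture, the seven region sizes of a three-set diagram can be computed with plain set operations. This is a sketch with made-up example sets; the helper name `venn3_region_sizes` is hypothetical and not part of `matplotlib_venn`.

```python
def venn3_region_sizes(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "a_and_b_and_c": len(a & b & c),
    }


s1 = {"cat", "dog", "cow", "fox"}
s2 = {"dog", "cow", "owl"}
s3 = {"cow", "fox", "owl", "bee"}
sizes = venn3_region_sizes(s1, s2, s3)
```

The region sizes always add up to the size of the union of the three sets, which is a handy sanity check.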


snakers4 (Alexander), August 28, 14:42

Garbage collection and encapsulation in python

Usually this is not an issue.

But when doing a batch-job / loop / something with 100k+ objects it suddenly becomes an issue.

It also does not help that the environment where you test your code and the environment where it actually runs are different.

The solution - simply encapsulate everything you can in functions. Once a function returns, its local objects go out of scope and the garbage is collected automatically.
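A minimal sketch of why encapsulation helps, assuming CPython's reference counting: an object that lives only inside a function is freed as soon as the function returns, whereas the same object created at module or notebook level stays alive for as long as its name exists. The `Chunk` class and `process_chunk` function are hypothetical stand-ins for one step of a batch job.

```python
import gc
import weakref


class Chunk:
    """Hypothetical stand-in for a large intermediate object."""

    def __init__(self, size):
        self.data = list(range(size))


probes = []  # weak references let us observe when a Chunk is freed


def process_chunk(size):
    chunk = Chunk(size)              # local: lives only inside this call
    probes.append(weakref.ref(chunk))
    return sum(chunk.data)           # only the small result escapes


totals = [process_chunk(1000) for _ in range(3)]
gc.collect()                         # not needed on CPython, harmless elsewhere
alive = [ref() is not None for ref in probes]
```

After the loop every weak reference is dead, i.e. none of the intermediate `Chunk` objects survived their function call.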


A case against Kaggle

If you thought that Kaggle is the home of Data Science - think again.

This is official - they do not know what the hell they are doing.

There have been several appalling cases already, but this takes the prize.

Following this thread, I wrote a small petition to Kaggle

I doubt that they will hear, but why not.


snakers4 (Alexander), August 26, 08:16

A small bug in PyTorch to numpy conversions

Well, maybe a feature =)

When you do something like this

a.permute(0, 2, 3, 1).numpy()


a.view((same shape here)).numpy()

you may expect behaviour similar to np.reshape.

But no. Looks like now all of these functions are implemented in C and they produce artifacts. To avoid artifacts call:

a.permute(0, 2, 3, 1).contiguous().numpy()

a.view((same shape here)).contiguous().numpy()

It is also interesting, because when you apply some layers in PyTorch, an error is raised if you do not use .contiguous().


snakers4 (Alexander), August 24, 07:04

Played with FAISS

Played with FAISS on the GPU a bit. Their docs also cover simplistic cases really well, but more sophisticated cases are better inferred from examples, because their Python docs are not really intended for heavy use.

Anyway I managed to build a KNN graph with FAISS on the GPU for 10m points in 2-3 hours.

It does the following:

- KNN graph;

- PCA, K-Means;

- Queries;

- VERY sophisticated indexing with many options;

It supports:

- GPU;

- Multi-GPU (I had to use env. variables to limit GPU list, because there is no option for python);

Also their docs are awesome for such a low-level project.




A library for efficient similarity search and clustering of dense vectors. - facebookresearch/faiss

Playing with Atom and Hydrogen

TLDR - it delivers on turning the Atom editor into something like an interactive notebook, but its auto-completion options are very scarce. Also you can connect to your running ipython kernel as well as to a docker container.

But it is no match for either a normal notebook or a python IDE.


snakers4 (Alexander), August 22, 16:35

NVIDIA's Image Restoration AI: Almost Perfect
The paper "Noise2Noise: Learning Image Restoration without Clean Data" and its source code are available here: 1. 2. https:/...

snakers4 (Alexander), August 22, 09:24

This is madness

Probably impossible to replicate

Also code is available for PyTorch =)

Forwarded from Just links:

Vid2Vid from NVIDIA and MIT



Video-to-Video Synthesis
Example results for the "Video-to-Video Synthesis" paper. Please find the code at our website:

snakers4 (Alexander), August 21, 13:31

2018 DS/ML digest 21




2018 DS/ML digest 21


snakers4 (Alexander), August 20, 10:57

PyTorch - a brief state of sparse operations

TLDR - they are not there yet


If you would like to see them implemented faster, write here



The state of sparse Tensors #9674

This note tries to summarize the current state of sparse tensor in pytorch. It describes important invariance and properties of sparse tensor, and various things need to be fixed (e.g. empty sparse tensor). It also shows some details of ...

snakers4 (Alexander), August 19, 13:20

Nice down-to-earth post about Titan V vs 1080 Ti

Also it seems that new generation nvidia GPUs will have 12 GB of RAM tops.

1080Ti is the best option for CNNs now.


Titan V vs 1080 Ti — Head-to-head battle of the best desktop GPUs on CNNs. Is Titan V worth it? 110 TFLOPS! no brainer, right?

NVIDIA’s Titan V is the latest “desktop” GPU built upon the Volta architecture boasting 110 “deep learning” TFLOPS in the spec sheet. That…

snakers4 (Alexander), August 17, 11:45

Found all Ipython's rich display capabilities in one place

Notebook on nbviewer

Check out this Jupyter notebook!

snakers4 (Alexander), August 16, 06:00

Google updates its transformer


Moving Beyond Translation with the Universal Transformer

Posted by Stephan Gouws, Research Scientist, Google Brain Team and Mostafa Dehghani, University of Amsterdam PhD student and Google Research...

snakers4 (Alexander), August 15, 02:18

NVIDIA's AI Makes Amazing Slow-Mo Videos
The paper "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation" is available here:

snakers4 (Alexander), August 13, 11:04

Yet another crowd GPU rent service?

Create Instance | Console

Search available instances, configure launch settings, create instances

snakers4 (Alexander), August 13, 05:24

Float16 / half training in PyTorch

Tried to do it in the most obvious way + some hacks from here

Did anybody do it successfully on real models?


Resnet18 throws exception on conversion to half floats

Hey, I have tried to launch the following code: from torchvision import models resnet = models.resnet18(pretrained=True).cpu() resnet.half() and have got an exception: libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: Unsupported tensor type Sounds like the halftensor type is not registered properly. But not sure why it’s the case. Using pytorch 0.2.0 py36_1cu75 soumith Any advice how to fix it?

snakers4 (Alexander), August 13, 04:34

Untar all the archives in the folder, deleting them

find . -name '*.tar' -execdir tar -xvf '{}' \; -execdir rm '{}' \;
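If you would rather do the same from Python (e.g. inside a data preparation script), a rough standard-library equivalent is sketched below. The function name `untar_and_delete` is hypothetical; like the one-liner, it extracts each archive into its own folder and removes the archive only after extraction succeeded.

```python
import tarfile
from pathlib import Path


def untar_and_delete(root):
    """Extract every *.tar under `root` next to itself, then delete it."""
    extracted = []
    # Newer Python versions accept an extraction filter; use it when present.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    # Collect the archive list first, so newly extracted files are not revisited.
    for archive in sorted(Path(root).rglob("*.tar")):
        with tarfile.open(archive) as tar:
            tar.extractall(path=archive.parent, **extra)
        archive.unlink()             # remove the archive only after success
        extracted.append(archive)
    return extracted
```

Calling `untar_and_delete(".")` mirrors running the `find` command in the current folder.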


snakers4 (Alexander), August 12, 11:15

2018 DS/ML digest 20




2018 DS/ML digest 20
