Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1818 members, 1744 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Our website
Our chat
DS courses review

snakers4 (Alexander), March 17, 15:40

New large dataset for your GAN or pix2pix pet project

500k fashion images + meta-data + landmarks



DeepFashion2 Dataset - switchablenorms/DeepFashion2

snakers4 (Alexander), March 17, 05:41

Cramer's rule, explained geometrically | Essence of linear algebra, chapter 12
This rule seems random to many students, but it has a beautiful reason for being true.

New video from 3B1B

Which is kind of relevant
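For reference, the rule itself (standard statement, not taken from the video):

```latex
% Cramer's rule: for A x = b with \det A \neq 0, each coordinate of the
% solution is a ratio of determinants, where A_i is A with its i-th
% column replaced by b.
x_i = \frac{\det A_i}{\det A}, \qquad i = 1, \dots, n
```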

snakers4 (Alexander), March 14, 03:58

GANPaint: An Extraordinary Image Editor AI
📝 The paper " GAN Dissection: Visualizing and Understanding Generative Adversarial Networks " and its web demo are available here: https://gandissect.csail.mi...

snakers4 (Alexander), March 12, 15:45

Our Transformer post was featured by Towards Data Science


Comparing complex NLP models for complex languages on a set of real tasks

Transformer is not yet really usable in practice for languages with rich morphology, but we take the first step in this direction

snakers4 (Alexander), March 12, 11:53

New tricks for training CNNs

Forwarded from Just links:

Bag of Tricks for Image Classification with Convolutional Neural Networks

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the...

Forwarded from Just links:

DropBlock: A regularization method for convolutional networks

snakers4 (Alexander), March 08, 14:38

Forwarded from Just links:

Calling Bullshit: Data Reasoning in a Digital World

The world is awash in bullshit. Politicians are unconstrained by facts. Science is conducted by press release. Higher education rewards bullshit over analytic thought. Startup culture elevates bullshit to high art. Advertisers wink conspiratorially and invite us to join them in seeing through all the bullshit — and take advantage of our lowered guard to bombard us with bullshit of the second order. The majority of administrative activity, whether in private business or the public sphere, seems to be little more than a sophisticated exercise in the combinatorial reassembly of bullshit.

snakers4 (Alexander), March 08, 12:56

snakers4 (Alexander), March 07, 15:42

Our experiments with Transformers, BERT and generative language pre-training


For morphologically rich languages pre-trained Transformers are not a silver bullet; from a layman's perspective they are not feasible unless someone invests huge computational resources both into sub-word tokenization methods that work well and into actually training these large networks.

On the other hand we have definitively shown that:

- Starting a transformer with an EmbeddingBag initialized via FastText works and is relatively feasible;

- On complicated tasks such a transformer significantly outperforms training from scratch (as well as naive models) and shows decent results compared to state-of-the-art specialized models;

- Pre-training worked, but it overfitted more than FastText initialization, and given the complexity required for such pre-training, it is not useful;

All in all this was a relatively large gamble, which did not pay off - the Transformer did not excel at the more down-to-earth task we hoped it would.


Complexity / generalization /computational cost in modern applied NLP for morphologically rich languages


An approach to ranking search results with no annotation

Just a small article with a novel idea:

- Instead of training a network with CE - just train it with BCE;

- Source additional structure from the inner structure of your domain (tags, matrix decomposition methods, heuristics, etc);

Works best if your ontology is relatively simple.
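A minimal sketch of the idea above (all names and shapes here are made up for illustration): score each document against a multi-hot target built from your domain structure, with binary cross-entropy instead of a softmax over classes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical shapes: 4 documents, 16 "structure" targets per document
# (tags, matrix decomposition factors, heuristics).
scores = torch.randn(4, 16)                      # model logits
targets = torch.randint(0, 2, (4, 16)).float()   # multi-hot targets
# BCE treats every (document, tag) pair as an independent binary task,
# unlike CE, which forces exactly one class per document.
loss = nn.BCEWithLogitsLoss()(scores, targets)
```

At ranking time you sort documents by their summed (or weighted) tag scores.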


Learning to rank search results without annotation

Solving search ranking problem

snakers4 (Alexander), March 07, 11:21

Inception v1 layers visualized on a map

A joint work by Google and OpenAI:


- Take 1M random images;

- Feed to a CNN, collect some spatial activation;

- Produce a corresponding idealized image that would result in such an activation;

- Plot in 2D (via UMAP), add grid, averaging, etc etc;
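The steps above in a toy sketch (random numbers stand in for CNN activations, and an SVD projection stands in for UMAP, which needs the separate umap-learn package):

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.random((1000, 512))          # "activations" for 1000 images
acts = acts - acts.mean(axis=0)         # center before projecting
# 2-D projection of the activations (stand-in for UMAP)
_, _, vt = np.linalg.svd(acts, full_matrices=False)
coords = acts @ vt[:2].T                # (1000, 2) atlas coordinates
# snap points to a coarse 10x10 grid; in the real atlas each cell then
# gets an averaged activation and an idealized ("inverted") image
span = np.ptp(coords, axis=0) + 1e-9
grid = np.floor((coords - coords.min(axis=0)) / span * 9).astype(int)
```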


Activation Atlas

By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned and what concepts it typically represents.

snakers4 (Alexander), March 07, 09:58

Russian STT datasets

Does anyone know more proper datasets?

I found this (60 hours), but I could not find the link to the dataset:

Anyway, here is the list I found:

- 20 hours of Bible;

- - does not say how many hours

- Ofc audio book datasets - + and some scraping scripts

- And some disappointment here


Download 274_Paper.pdf 0.31 MB

snakers4 (Alexander), March 07, 06:47

PyTorch internals


PyTorch under the hood

Presentation about PyTorch internals presented at the PyData Montreal in Feb 2019.

snakers4 (Alexander), March 06, 10:31

5th 2019 DS / ML digest

Highlights of the week

- New Adam version;

- POS tagging and semantic parsing in Russian;

- ML industrialization again;




2019 DS/ML digest 05


snakers4 (Alexander), March 05, 09:23

Anyone knows anyone from TopCoder?

As usual with competition platforms, the organization sometimes has its issues

Forwarded from Анна:


In case anyone doesn't know: besides the prizes for top places, SpaceNet had another cool feature - a student's prize for the _student_ with the highest score. It turned out to be far from straightforward: there was no separate student leaderboard. I spent a long time trying to reach the admins - wrote emails, posted on the forum - to learn more details. A month later an admin finally replied that I was the only candidate for the prize and that, supposedly, everything was fine, we'll sort it out, just send your student ID. And then he disappeared again. I periodically reminded them of my existence and asked how things were going and whether there was any progress - and got ignored. *There is still no answer.* This is my first time participating in a serious competition and I don't quite understand what can be done in such a situation. Wait for news? Write posts on Twitter? Is there any way to get through to the admins?

Also, I wrote a small article here about my solution.

How I got to Top 10 in Spacenet 4 Challenge

Spacenet 4 Challenge: Building Footprints

snakers4 (Alexander), March 04, 08:46

Tracking your hardware ... for data science

For a long time I thought that if you really want to track all your servers' metrics you need Zabbix (which is very complicated).

A friend recommended me an amazing tool


It installs and runs literally in minutes.

If you want to auto-start it properly, there are even a bit older Ubuntu packages and systemd examples


Dockerized metric exporters for GPUs by Nvidia


It also features extensive alerting, but alerting is difficult to get started with, since there is no minimal example




Monitoring Linux host metrics with the Node Exporter | Prometheus

An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
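If it helps, a minimal scrape config for the exporters above - a sketch, assuming default ports (node_exporter listens on 9100; your GPU exporter's port may differ, check its docs):

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']   # node_exporter default port
  - job_name: gpu
    static_configs:
      - targets: ['localhost:9400']   # replace with your GPU exporter's port
```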

snakers4 (Alexander), March 02, 04:49

Can You Recover Sound From Images?
Is it possible to reconstruct sound from high-speed video images?

snakers4 (Alexander), February 28, 07:16

LSTM vs TCN vs Trellis network

- Did not try the Trellis network - decided it was too complex;

- All the TCN properties from the digest hold - did not test for very long sequences;

- Looks like a really simple and reasonable alternative to RNNs for modeling and ensembling;

- On a sensible benchmark it performs mostly the same as an LSTM from a practical standpoint;


2018 DS/ML digest 31


snakers4 (Alexander), February 27, 13:02

Dependency parsing and POS tagging in Russian

Less popular set of NLP tasks.

Popular tools reviewed

Only morphology:

(0) Well known pymorphy2 package;

Only POS tags and morphology:

(0) (easy to use);

(1) (easy to use);

Full dependency parsing

(0) Russian spacy plugin:

- - installation

- - usage with examples

(1) Malt parser based solution (drawback - no examples)


(2) Google's syntaxnet



Exploring syntactic parsers for the Russian language

Hi! My name is Denis Kiryanov, I work at Sberbank on natural language processing (NLP) problems. At some point we needed to choose a syntactic...

snakers4 (Alexander), February 27, 12:39

We tried it

... yeah we tried it on a real task

plain Adam is a bit better

snakers4 (Alexander), February 27, 07:50

New variation of Adam?

- [Website](;

- [Code](;

- Eliminate the generalization gap between adaptive methods and SGD;

- TL;DR: A Faster And Better Optimizer with Highly Robust Performance;

- Dynamic bound on learning rates. Inspired by gradient clipping;

- Not very sensitive to the hyperparameters, especially compared with Sgd(M);

- Tested on MNIST, CIFAR, Penn Treebank - no serious datasets;
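The dynamic bound itself is simple enough to sketch in a few lines (the schedules below follow the paper's general shape; `final_lr` and `gamma` are hypothetical knobs, not the library's exact API):

```python
# Clip Adam's per-parameter step size into [lower(t), upper(t)];
# both bounds converge to final_lr over time, so the optimizer smoothly
# morphs from Adam-like behaviour into SGD-like behaviour.
def bounded_step_size(adam_step, final_lr, t, gamma=1e-3):
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    return min(max(adam_step, lower), upper)
```

Early in training the band is wide, so extreme adaptive steps pass through; late in training it pinches down to `final_lr`, which is where the claimed robustness to hyperparameters comes from.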


Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Abstract Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with Sgd or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue but they failed to achieve considerable improvement over existing methods.

snakers4 (Alexander), February 18, 09:24

4th 2019 DS / ML digest

Highlights of the week

- OpenAI controversy;

- BERT pre-training;

- Using transformer for conversational challenges;




2019 DS/ML digest 04


snakers4 (Alexander), February 17, 10:22

A bit of lazy Sunday admin stuff

Monitoring your CPU temperature with email notifications

- Change CPU temp to any metric you like

- Rolling log

- Sending email only one time, if the metric becomes critical (you can add an email when metric becomes non-critical again)
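The logic of the gist can be sketched like this (every name here is made up; the real gist reads sensor output and sends mail via your SMTP setup):

```python
import collections

readings = collections.deque(maxlen=1000)  # rolling log of recent readings
THRESHOLD = 85.0                           # degrees C - pick your own
alerted = False                            # have we already emailed?

def check(temp_c):
    """Log a reading; return 'send email' only on the first breach."""
    global alerted
    readings.append(temp_c)
    if temp_c >= THRESHOLD:
        if not alerted:
            alerted = True
            return 'send email'            # e.g. via smtplib here
        return 'already alerted'
    alerted = False                        # re-arm once the metric recovers
    return 'ok'
```

Swap `temp_c` for any metric you like; the re-arm branch is where you would also send the "back to normal" email.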

Setting up a GPU box on Ubuntu 18.04 from scratch



Plain temperature monitoring in Ubuntu 18.04


snakers4 (Alexander), February 17, 08:49

Pinned post

What is this channel about?


This channel is a practitioner's channel on the following topics: Internet, Data Science, Deep Learning, Python, NLP


Don't get in a twist if your opinion differs.

You are welcome to contact me via telegram @snakers41 and email - [email protected]


No BS and ads - I already rejected 3-4 crappy ad deals


DS ML digests - in the RSS or via URLs like this



Buy me a coffee 🤟

Give us a rating:


Our chat


More links


Our website


Our chat


DS courses review (RU) - very old


2017 - 2018 SpaceNet Challenge


DS Bowl 2018


Data Science tag on the website

(7) project


CFT 2018 competition


2018 retrospective

More amazing NLP-related articles incoming!

Maybe finally we will make podcasts?

2019 DS/ML digest 01


snakers4 (Alexander), February 14, 06:20

Which type of content do you / would you like most on the channel?

  • Weekly / bi-weekly digests; (34)
  • Full articles; (13)
  • Podcasts with actual ML practitioners; (12)
  • Practical bits on real applied NLP; (28)
  • Pre-trained BERT with Embedding Bags for Russian; (11)
  • Paper reviews; (21)
  • Jokes / memes / cats; (9)

128 votes

snakers4 (Alexander), February 13, 09:56


(2) is valid for models with complex forward pass and models with large embedding layers

snakers4 (Alexander), February 13, 09:02

PyTorch NLP best practices

Very simple ideas, actually.

(1) Multi GPU parallelization and FP16 training

Do not bother reinventing the wheel.

Just use NVIDIA's apex, DistributedDataParallel, or DataParallel.

Best examples [here](
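A minimal sketch of the zero-effort route (apex and DistributedDataParallel need a launcher and a GPU box, so this shows plain DataParallel, which also runs unchanged on CPU):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                    # stand-in for your real model
if torch.cuda.device_count() > 1:
    # splits each input batch across GPUs and gathers the outputs back
    model = nn.DataParallel(model)
out = model(torch.randn(4, 8))             # (4, 2) either way
```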

(2) Put as much as possible INSIDE of the model

Implement as much of your logic as possible inside of nn.Module.


So that you can seamlessly use all the abstractions from (1) with ease.

Also models are more abstract and reusable in general.

(3) Why have a separate train/val loop?

PyTorch 0.4 introduced context handlers.

You can simplify your train / val / test loops, and merge them into one simple function.

context = torch.no_grad() if loop_type == 'Val' else torch.enable_grad()

if loop_type == 'Train':
    model.train()
elif loop_type == 'Val':
    model.eval()

with context:
    for i, (some_tensor) in enumerate(tqdm(train_loader)):
        # do your stuff here

(4) EmbeddingBag

Use EmbeddingBag layer for morphologically rich languages. Seriously!
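A minimal sketch of why it fits rich morphology (the ids and offsets below are made up): one call pools the sub-word (e.g. FastText n-gram) vectors of each token, so morphological variants of a word share parameters.

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=16, mode='mean')
# two tokens given as one flat list of sub-word ids plus start offsets:
# token A = ids [1, 2, 3], token B = ids [10, 11]
ids = torch.tensor([1, 2, 3, 10, 11])
offsets = torch.tensor([0, 3])
vecs = bag(ids, offsets)          # (2, 16): one pooled vector per token
```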

(5) Writing trainers / training abstractions

This is a waste of time imho if you follow (1), (2) and (3).

(6) Nice bonus

If you follow most of these, you can train on as many GPUs and machines as you want, for any language.

(7) Using tensorboard for logging

This goes without saying.




📖The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer-XL. - huggingface/pytorch-pretrained-BERT

PyTorch DataLoader, GIL thrashing and CNNs

Well all of this seems a bit like magic to me, but hear me out.

I abused my GPU box for weeks running CNNs on 2-4 GPUs.

Nothing broke.

And then my GPU box started shutting down for no apparent reason.

No, this was not:

- CPU overheating (I have a massive cooler, I checked - it works);

- PSU;

- Overclocking;

- It also adds to the confusion that AMD has weird temperature readings;

To cut the story short: if you have a very fast Dataset class and use PyTorch's DataLoader with workers > 0, it can lead to system instability instead of a speedup.

It is obvious in retrospect, but it is not when you face this issue.
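A sketch of the takeaway (`FastDataset` is made up): when `__getitem__` is this cheap, `num_workers=0` keeps loading in the main process and skips the worker machinery entirely.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FastDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, i):
        return torch.tensor(i)    # trivially cheap - no I/O, no decoding

# num_workers=0: no worker processes, no shared-memory traffic
loader = DataLoader(FastDataset(), batch_size=4, num_workers=0)
batches = list(loader)
```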



snakers4 (Alexander), February 12, 05:13

Russian thesaurus that really works

It knows so many peculiar / old-fashioned and cheeky synonyms for obscene words!


Russian Distributional Thesaurus

Russian Distributional Thesaurus (RDT for short) is a project to build an open distributional thesaurus of the Russian language. At the moment the resource contains several components: word vectors (word embeddings), a word-similarity graph (the distributional thesaurus itself), a set of hypernyms, and a word-sense inventory. All resources were built automatically from a corpus of Russian-language books (12.9 billion word occurrences). Future versions plan to add word-sense vectors for Russian, derived from the same corpus. The project is developed by people from UrFU, Lomonosov Moscow State University, and the University of Hamburg. Past contributors include researchers from South Ural State University, Darmstadt University of Technology, the University of Wolverhampton, and the University of Trento.

snakers4 (Alexander), February 11, 06:29

Forwarded from Sava Kalbachou:

These are the Easiest Data Augmentation Techniques in Natural Language Processing you can think of — and they work.

Data augmentation is commonly used in computer vision. In vision, you can almost certainly flip, rotate, or mirror an image without risk…

snakers4 (Alexander), February 11, 06:22

Old news ... but Attention works

Funnily enough, in the past my models:

- Either did not need attention;

- Attention was implemented by @thinline72 ;

- The domain was so complicated (NMT) that I had to resort to boilerplate with key-value attention;

It was the first time I / we tried manually building a model with plain self-attention from scratch.

And you know - it really adds 5-10% to all of the tracked metrics.

Best plain attention layer in PyTorch - simple, well documented ... and it works in real life applications:



SelfAttention implementation in PyTorch

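A minimal plain self-attention layer in the same spirit (a sketch, not the gist's exact code): score every timestep, softmax the scores, take the weighted sum.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)    # one score per timestep

    def forward(self, x):                  # x: (batch, seq, dim)
        weights = F.softmax(self.scorer(x).squeeze(-1), dim=1)
        return (weights.unsqueeze(-1) * x).sum(dim=1)   # (batch, dim)

pooled = SelfAttention(16)(torch.randn(2, 5, 16))
```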

snakers4 (Alexander), February 08, 16:20

DeepMind’s AlphaStar Beats Humans 10-0 (or 1)
DeepMind's #AlphaStar blog post: Full event:

snakers4 (Alexander), February 08, 10:11

Third 2019 DS / ML digest

Highlights of the week

- quaternions;

- ODEs;




2019 DS/ML digest 03


older first