Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1317 members, 1587 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.


Posts by tag «deep_learning»:

snakers4 (Alexander), September 20, 16:06

DS/ML digest 24

Key topics of this one:

- New method to calculate phrase/n-gram/sentence embeddings for rare and OOV words;

- So many releases from Google;

If you like our digests, you can support the channel via:

- Sharing / reposting;

- Giving an article a decent comment / a thumbs-up;

- Buying me a coffee (links on the digest);




2018 DS/ML digest 24


snakers4 (Alexander), September 06, 06:18



Squeeze-and-Excitation networks (SENet)

- The 2017 Imagenet winner;

- Mostly ResNet-152 inspired network;

- Transfers well (ResNet);

- A Squeeze-and-Excitation (SE) block adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels;

- Intuitively, it looks like convolutions meeting the attention mechanism;

- SE block:


- Reduction ratio r is set to 16 in all experiments;

- Results:
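The SE block described above fits in a few lines of PyTorch. This is a minimal sketch based on the description - the class name and channel count are illustrative:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block (a sketch).

    Squeeze: global average pooling collapses each channel to a scalar.
    Excitation: a two-layer bottleneck (reduction ratio r) outputs
    per-channel weights in (0, 1) that rescale the feature map.
    """
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))              # squeeze: (B, C, H, W) -> (B, C)
        w = self.fc(s).view(b, c, 1, 1)     # excitation: per-channel weights
        return x * w                        # recalibrate the channels

x = torch.randn(2, 64, 8, 8)
y = SEBlock(64, r=16)(x)                    # shape is preserved
```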



snakers4 (Alexander), September 06, 05:57

Chainer - a predecessor of PyTorch

Looks like

- PyTorch was based not only on Torch - its autograd was also forked from Chainer;

- Chainer looks like PyTorch, but it is developed not by Facebook but by an independent Japanese group;

- A quick glance through the docs confirms that the PyTorch and Chainer APIs look 90% identical (both numpy-inspired, but using different back-ends);

- Open Images 2nd place was taken by people using Chainer with 512 GPUs;

- I have yet to confirm myself that PyTorch can work with a cluster (but other people have done it);



ConvNet training using PyTorch: eladhoffer/convNet.pytorch on GitHub.

Also - thanks to all the DO referral link supporters - hosting of my website is finally free (at least for the next ~6 months)!

Also today I published my 200th post. Ofc not all of these are proper long articles, but nevertheless it's cool.

snakers4 (Alexander), September 06, 05:48

DS/ML digest 23

The key topic of this one - insanity:

- vid2vid

- unsupervised NMT

If you like our digests, you can support the channel via:

- Sharing / reposting;

- Giving an article a decent comment / a thumbs-up;

- Buying me a coffee (links on the digest);

Let's spread the right DS/ML ideas together.




2018 DS/ML digest 23


snakers4 (Alexander), September 03, 06:27

Training a MNASNET from scratch ... and failing

As a small side hobby we tried training Google's new mobile network from scratch and failed:



Maybe you know how to train it properly?

Also now you can upvote articles on spark in me! =)


Training your own MNASNET


snakers4 (Alexander), September 02, 06:22

A small hack to spare PyTorch memory when resuming training

When you resume from a checkpoint, consider adding this to save GPU memory:

del checkpoint
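A fuller sketch of the resume pattern (the checkpoint keys here are illustrative): the checkpoint dict keeps every loaded tensor alive, so dropping the reference lets that memory be freed.

```python
import torch

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# save a checkpoint (stand-in for one written during training)
torch.save({'state_dict': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'epoch': 3}, 'checkpoint.pth')

# resume
checkpoint = torch.load('checkpoint.pth', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch']

# the hack: the checkpoint dict holds references to every loaded tensor;
# deleting it allows that memory to be reclaimed
del checkpoint
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # release cached blocks back to the driver
```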



snakers4 (Alexander), August 31, 13:59

DS/ML digest 22




2018 DS/ML digest 22


snakers4 (Alexander), August 31, 13:38

ADAMW to be integrated into upstream PyTorch?


Fixing Weight Decay Regularization in Adam by jingweiz · Pull Request #3740 · pytorch/pytorch

Hey, we added SGDW and AdamW in optim, according to the new ICLR submission from Loshchilov and Hutter: Fixing Weight Decay Regularization in Adam. We also found some inconsistency of the current i...
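The decoupled weight decay idea can be sketched as a single hand-rolled update step (my illustration, not the PR's actual code; the function name is made up). Vanilla Adam adds weight_decay * p to the gradient before the adaptive rescaling; AdamW instead shrinks the weights directly, so the decay is not distorted by the per-parameter learning rates.

```python
import torch

def adamw_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One decoupled-weight-decay Adam step on parameter tensor p, in place."""
    beta1, beta2 = betas
    state['step'] += 1
    # standard Adam moment estimates
    state['m'].mul_(beta1).add_(grad, alpha=1 - beta1)
    state['v'].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = state['m'] / (1 - beta1 ** state['step'])
    v_hat = state['v'] / (1 - beta2 ** state['step'])
    # decoupled decay: applied to the weights, not the gradient
    p.mul_(1 - lr * weight_decay)
    p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)

p = torch.ones(3)
state = {'step': 0, 'm': torch.zeros(3), 'v': torch.zeros(3)}
adamw_step(p, torch.ones(3), state)
```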

snakers4 (Alexander), August 29, 08:16

Crowd-AI maps repo

Just opened my repo for crowd AI maps 2018.

Did not pursue this competition till the end, so it is not polished and the .md is not updated. Use it at your own risk!



CrowdAI mapping challenge 2018 solution: snakers4/crowdai-maps-2018 on GitHub.

snakers4 (Alexander), August 26, 08:16

A small bug in PyTorch to numpy conversions

Well, maybe a feature =)

When you do something like this:

a.permute(0, 2, 3, 1).numpy()

or

a.view((same shape here)).numpy()

you may expect behaviour similar to np.reshape.

But no. Looks like all of these functions are now implemented in C and they can produce artifacts. To avoid the artifacts, call .contiguous():

a.permute(0, 2, 3, 1).contiguous().numpy()

a.contiguous().view((same shape here)).numpy()

It is also interesting because, when you apply some layers in PyTorch, an error is raised if you do not use .contiguous().
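A minimal sketch of where .contiguous() matters (behaviour differs between PyTorch versions; this shows the call pattern rather than the exact artifact):

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)    # contiguous tensor
b = a.permute(1, 2, 0)                    # same storage, permuted strides

# permute never copies: the result is non-contiguous
assert not b.is_contiguous()

# .view() needs compatible strides and raises on this layout
try:
    b.view(24)
    raised = False
except RuntimeError:
    raised = True
assert raised

# .contiguous() copies the data into a fresh dense layout, after which
# .view() and .numpy() behave like np.reshape on a regular array
c = b.contiguous()
flat = c.view(24)
arr = c.numpy()
```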


snakers4 (Alexander), August 21, 13:31

2018 DS/ML digest 21




2018 DS/ML digest 21


snakers4 (Alexander), August 20, 10:57

PyTorch - a brief state of sparse operations

TLDR - they are not there yet


If you would like to see them implemented faster, write here



The state of sparse Tensors #9674

This note tries to summarize the current state of sparse tensors in pytorch. It describes important invariants and properties of sparse tensors, and various things that need to be fixed (e.g. empty sparse tensors). It also shows some details of ...
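For reference, the basic COO sparse API looks like this (a minimal sketch; the sparse API was experimental at the time of the post and has evolved since):

```python
import torch

# a 2 x 3 sparse COO tensor holding two non-zero values
indices = torch.tensor([[0, 1],    # row of each non-zero
                        [2, 0]])   # column of each non-zero
values = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(indices, values, (2, 3))

# materialize for inspection: [[0, 0, 3], [4, 0, 0]]
dense = s.to_dense()
```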

snakers4 (Alexander), August 19, 13:20

Nice down-to-earth post about Titan V vs 1080 Ti

Also it seems that the new generation of nvidia GPUs will have 12 GB of RAM tops.

1080Ti is the best option for CNNs now.


Titan V vs 1080 Ti — Head-to-head battle of the best desktop GPUs on CNNs. Is Titan V worth it? 110 TFLOPS! no brainer, right?

NVIDIA’s Titan V is the latest “desktop” GPU built upon the Volta architecture boasting 110 “deep learning” TFLOPS in the spec sheet. That…

snakers4 (Alexander), August 13, 05:24

Float16 / half training in PyTorch

Tried to do it in the most obvious way + some hacks from here

Did anybody do it successfully on real models?


Resnet18 throws exception on conversion to half floats

Hey, I have tried to launch the following code:

from torchvision import models
resnet = models.resnet18(pretrained=True).cpu()
resnet.half()

and have got an exception:

libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: Unsupported tensor type

Sounds like the half tensor type is not registered properly, but not sure why that is the case. Using pytorch 0.2.0 py36_1cu75 soumith. Any advice on how to fix it?
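One of the usual hacks from early fp16 recipes is to convert the model to half but keep BatchNorm layers in fp32, since their running statistics underflow easily in half precision. A sketch (the nn.modules.batchnorm._BatchNorm check relies on an internal class and may change between versions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# convert everything to fp16 ...
model.half()

# ... then put BatchNorm layers back to fp32
for m in model.modules():
    if isinstance(m, nn.modules.batchnorm._BatchNorm):
        m.float()
```

Inputs have to be converted to .half() as well before the forward pass.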

snakers4 (Alexander), August 12, 11:15

2018 DS/ML digest 20




2018 DS/ML digest 20


snakers4 (Alexander), August 03, 02:23

snakers4 (Alexander), July 31, 18:33

Autofocus for semseg?

I have not seen people for whom DeepLab worked...and in my tests dilated convolutions were the same...though some claim they help with high-res images with small objects...


(0) Autofocus layer, a novel module that enhances the multi-scale processing of CNNs by learning to select the ‘appropriate’ scale for identifying different objects in an image

(1) Layer description

(2) Implementation

I believe this will work best for 3D images


snakers4 (Alexander), July 31, 05:47

2018 DS/ML digest 19

Market / data / libraries

(0) 32k lesions image dataset open-sourced



(1) A new Distill article about Differentiable Image Parameterizations

- Usually images are parametrized as RGB values (normalized)

- Idea - use different (learnable) parametrization


- Parametrizing the resulting image with a Fourier transform enables using different architectures with style transfer

- Working with transparent images

(2) Lip reading with 40% Word Error Rate

(3) Joint auto architecture + hyper-param search (*)


(5) New CNN architectures from ICML (*)

(6) Jupyter notebook widget for text annotation

(7) A bit more debunking of auto-ml

(8) A small intro to Bayes methods

(9) Criminal face recognition with 20% false positives

(10) Denoising images w/o noiseless ground-truth


(0) Autoencoders for text - no clear conclusion?

(1) RNN use cases overview

(2) ACL 2018 notes


(0) Edge-embeddable TPU devices?

(1) GeForce 11* finally coming soon? Prices for 1080Ti are falling now...



NIH Clinical Center releases dataset of 32,000 CT images

Lesion data may make it easier for scientific community to identify tumor growth or new disease

snakers4 (Alexander), July 31, 04:42

Airbus ship detection challenge

On the surface this looks like a challenging and interesting competition:


- Train / test sets - 14G / 12G

- Downside - Kaggle and very fragile metric

- Upside - a separate significant prize for fast algorithms!

- 768x768 images seem reasonable



Airbus Ship Detection Challenge

Find ships on satellite images as quickly as possible

snakers4 (Alexander), July 30, 05:48

The reality of human face recognition

There is a lot of hype related to the surveillance state / 1984 / Chinese offline cameras.

Cannot help but feature this amazing article from Russian engineers (RU):


Правда и ложь систем распознавания лиц (The truth and lies of face recognition systems)

Probably no other technology today is surrounded by so many myths, lies and incompetence. The journalists covering the technology lie, the...

snakers4 (Alexander), July 28, 05:13

New Keras version

No real major changes...



Deep Learning for humans: keras-team/keras on GitHub.

snakers4 (Alexander), July 27, 03:17

The truth about ML courses


snakers4 (Alexander), July 23, 06:26

My post on open images stage 1

For posterity

Please comment



Solving class imbalance on Google open images

In this article I propose an approach to solve a severe class imbalance on Google open images.

snakers4 (Alexander), July 23, 05:15

2018 DS/ML digest 18

Highlights of the week

(0) RL flaws

(1) An intro to AUTO-ML

(2) Overview of advances in ML in last 12 months

Market / applied stuff / papers

(0) New Nvidia Jetson released

(1) Medical CV project in Russia - 90% is data gathering

(2) Differentiable architecture search

-- 1800 GPU days of reinforcement learning (RL) (Zoph et al., 2017)

-- 3150 GPU days of evolution (Real et al., 2018)

-- 4 GPU days to achieve SOTA on CIFAR => transferable to Imagenet with 26.9% top-1 error

(3) Some basic thoughts about hyper-param tuning

(4) FB extending fact checking to mark similar articles

(5) Architecture behind Alexa choosing skills

- Char-level RNN + Word-level RNN

- Shared encoder, but attention is personalized

(6) An overview of contemporary NLP techniques

(7) RNNs in particle physics?

(8) Google cloud provides PyTorch images


(0) Use embeddings for positions - no brainer

(1) Chatbots were a hype train - lol

The vast majority of bots are built using decision-tree logic, where the bot's canned response relies on spotting specific keywords in the user input.

Interesting links

(0) Reasons to use OpenStreetMap

(1) Google deploys its internet balloons

(2) Amazing problem solving

(3) Nice flame thread about whether CS / ML is science or just engineering, etc.




RL’s foundational flaw

RL as classically formulated has lately accomplished many things - but that formulation is unlikely to tackle problems beyond games. Read on to see why!

snakers4 (Alexander), July 22, 08:55

Playing with open-images

Did a benchmark of multi-class classification models and approaches that are useful in general with multi-tier classifiers.

The basic idea is - follow the graph structure of class dependencies - train a good multi-class classifier => train coarse semseg models for each big cluster.

What worked

- Using SOTA classifiers from imagenet

- Pre-training with a frozen encoder (otherwise the model performs worse)

- Best performing architecture so far - ResNet152 (a couple of others to try as well)

- Different resolutions => binarise them => divide into 3 major clusters (2:1, 1:2, 1:1)

- Using adaptive pooling for different aspect ratio clusters
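The adaptive pooling trick can be sketched like this (the channel count and cluster resolutions are illustrative): AdaptiveAvgPool2d emits a fixed output size for any input resolution, so one classifier head serves all aspect-ratio clusters.

```python
import torch
import torch.nn as nn

# fixed (1, 1) output regardless of the input feature-map size
pool = nn.AdaptiveAvgPool2d((1, 1))

# 2:1, 1:2 and 1:1 feature maps, e.g. from a ResNet encoder
for h, w in [(16, 8), (8, 16), (12, 12)]:
    features = torch.randn(2, 512, h, w)
    pooled = pool(features).flatten(1)    # -> (2, 512) in every case
    assert pooled.shape == (2, 512)
```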

What did not work or did not significantly improve results

- Oversampling

- Using modest or minor augs (10% or 25% of images augmented)

What did not work

- Using 1xN + Nx1 convolutions instead of pooling - too heavy

- Using some minimal avg. pooling (like 16x16), then using different 1xN + Nx1 convolutions for different clusters - performed mostly worse than just adaptive pooling

Yet to try

- Focal loss

- Oversampling + augs


snakers4 (Alexander), July 21, 11:02

Found an amazing explanation of Python's super here

Understanding Python super() with __init__() methods

I'm trying to understand the use of super(). From the looks of it, both child classes can be created, just fine. I'm curious to know about the actual difference between the following 2 child clas...
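The gist of that answer in code (a toy sketch):

```python
class Base:
    def __init__(self):
        self.trace = ['Base']

class ChildExplicit(Base):
    def __init__(self):
        # works, but hard-codes the parent class name
        Base.__init__(self)
        self.trace.append('ChildExplicit')

class ChildSuper(Base):
    def __init__(self):
        # follows the MRO, so it stays correct under multiple inheritance
        super().__init__()
        self.trace.append('ChildSuper')
```

Both variants behave the same in the single-inheritance case; the difference only shows up with cooperative multiple inheritance, where super() walks the MRO instead of calling a fixed class.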

Playing with focal loss for multi-class classification

Playing with this Loss

If anyone has a better option - please PM me / or comment in the gist.
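The loss itself fits in a few lines. This is my sketch of the multi-class variant, not necessarily the gist's exact code:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss: down-weights well-classified examples
    by (1 - p_t) ** gamma, so training focuses on hard examples;
    gamma = 0 recovers plain cross-entropy."""
    log_probs = F.log_softmax(logits, dim=1)
    # log-probability of the true class for each sample
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()

logits = torch.randn(4, 10)
targets = torch.tensor([1, 3, 0, 7])
loss = focal_loss(logits, targets)
```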



Multi class classification focal loss


snakers4 (Alexander), July 21, 07:51

Yet another kaggle competition with high prizes and an easy challenge


TGS Salt Identification Challenge

Segment salt deposits beneath the Earth's surface

snakers4 (Alexander), July 18, 05:39

Lazy failsafe in PyTorch Data Loader

Sometimes you train a model and testing all the combinations of augmentations / keys / params in your dataloader is too difficult. Or the dataset is too large, so it would take some time to check it properly.

In such cases I usually used some kind of failsafe try/catch.

But it looks like an even simpler approach works:

if img is None:
    # do not return anything
    return None

return img
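To make batches survive the missing samples, this can be paired with a collate_fn that drops the None entries. A toy sketch with a hypothetical dataset (it assumes at least one valid sample per batch):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ImageDataset(Dataset):
    """Toy dataset where some samples fail to load."""
    def __init__(self, n=10, broken=(3, 7)):
        self.n, self.broken = n, broken

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx in self.broken:        # e.g. cv2.imread returned None
            return None
        return torch.randn(3, 8, 8)

def skip_broken_collate(batch):
    # drop samples the dataset failed to load, stack the rest
    batch = [item for item in batch if item is not None]
    return torch.stack(batch)

loader = DataLoader(ImageDataset(), batch_size=5,
                    collate_fn=skip_broken_collate)
sizes = [b.shape[0] for b in loader]   # batches shrink instead of crashing
```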



snakers4 (Alexander), July 17, 08:51

Colab SeedBank

- TF is everywhere (naturally) - but at least they use keras

- On the other hand - all of the files are (at least now) downloadable via .ipynb or .py

- So - it may be a good place to look for boilerplate code

Also some interesting facts that are not mentioned openly:

- Looks like they use Tesla K80s, which in practice are 2.5-3x slower than a 1080Ti


- Full screen notebook format is clearly inspired by Jupyter plugins

- Ofc there is a time limit for GPU scripts and GPU availability is not guaranteed (reported by people who used it)

- Personally - it looks a bit like the slow instances from FloydHub - time limitations / slow GPUs / etc.

In a nutshell - a perfect source of boilerplate code + a playground for new people.


Benchmarking Tensorflow Performance and Cost Across Different GPU Options

Machine learning practitioners— from students to professionals — understand the value of moving their work to GPUs . Without one, certain…

snakers4 (Alexander), July 17, 08:32