Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1227 members, 1357 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
Our chat
DS courses review

snakers4 (Alexander), February 27, 09:21

The company of Savva (my jungle teammate) made a brief post about the competition

ObjectStyler Savva Kolbachev wins 3rd prize in Computer Vision contest by Chimp&See -

Congrats to ObjectStyler Savva Kolbachev and his team for winning the 3rd prize in Computer Vision contest organized by Chimp&See!

snakers4 (Alexander), February 25, 14:22

The more general uncertainty principle, beyond quantum
The Heisenberg uncertainty principle is just one specific example of a much more general, relatable, non-quantum phenomenon. Apply to work at one of my favor...

snakers4 (Alexander), February 24, 14:01

Just found a book on practical Python programming patterns


Looks good


snakers4 (Alexander), February 24, 05:56

2017 DS/ML digest 5

Fun stuff

(1) Hardcore metal + CNNs + style transfer -

SpaceNet challenge

(1) Post by Nvidia

(2) Some links to sota semseg articles

(3) Useful tools for CV - floodfill and grabcut, but guys from Nvidia did not notice ... that road width was in geojson data...

(4) Looks like they replicated the results just for PR, but their masks do not look appealing

Research / papers / libraries

(1) Neural Voice Cloning with a Few Samples - (demos

(2) A library for CRFs in Python -

(3) 1000x faster CNN architecture search - still on CIFAR - (PyTorch

(4) URLs + CNN - malicious link detection -


(1) 3m anime image dataset -

(2) Google HDR dataset -


(1) Idea - AMT + blockchain -

(2) ARM to make processors for CNNs? -

(3) Google TPU in beta - - very expensive. + Note the rumours that Google's own people do not use their TPU quota

(4) One guy managed to deploy a PyTorch model using ONNX -




Hardcore Anal Hydrogen "Jean-Pierre" (2018, Apathia Records)

Order "Hypercut" : Bandcamp : « A gigantic piece of art here to mess with wh...

snakers4 (Alexander), February 23, 15:30

Forwarded from Just links:

Adversarial Examples that Fool both Human and Computer Vision

snakers4 (Alexander), February 23, 06:30

Was looking for a CLAHE abstraction for my image pre-processing pipeline and found one on the Internet

import cv2

class CLAHE:
    """Contrast-limited adaptive histogram equalization for BGR images."""

    def __init__(self, clipLimit=2.0, tileGridSize=(8, 8)):
        self.clipLimit = clipLimit
        self.tileGridSize = tileGridSize

    def __call__(self, im):
        # Equalize only the luma (Y) channel so that colors are not distorted
        img_yuv = cv2.cvtColor(im, cv2.COLOR_BGR2YUV)
        clahe = cv2.createCLAHE(clipLimit=self.clipLimit, tileGridSize=self.tileGridSize)
        img_yuv[:, :, 0] = clahe.apply(img_yuv[:, :, 0])
        img_output = cv2.cvtColor(img_yuv, cv2.COLOR_YUV2BGR)
        return img_output


snakers4 (Alexander), February 22, 19:11

This AI Sings | Two Minute Papers #230
The paper "A Neural Parametric Singing Synthesizer" is available here: Our Patreon page with the details: www....

snakers4 (Alexander), February 22, 09:41

So ofc I tried the new Jupyter lab.

And it is really cool that something so simple / cool / useful is completely free / no strings attached (yet). But I will not use it professionally.

Use my Dockerfile if you want to check it out with my DL environment:


But in a nutshell, it worked with the usual jupyter notebook (jpn) params inside the container

CMD jupyter lab --port=8888 --ip= --no-browser

And installation is as easy as

conda install -c conda-forge jupyterlab

Docs are a bit sparse for now


But here is a list of reasons why you might consider sticking to ssh pass-through for auto-complete / terminal and jupyter notebook with extensions:

(0) It is still in beta, so unless your professional path is connected with node-js / web, you'd better pass for now

(1) The existence of amazing extensions for Jupyter notebook that do 95% of what you might need -

(2) Built-in terminal is much better than before, but it pales in comparison with Putty or even standard linux shell (autocomplete?)

(3) Some of built-in extensions like image viewer are really useful, but overall the product is a bit beta (which they openly say it is)

And here is why turning Jupyter notebook into a real environment is really cool:

(1) Building everything based on extensions IS REALLY COOL - and in the long run will encourage people to port jupyter extensions and build a really powerful tool. Also this implies diversity and freedom unlike shitty tools like Zeppelin

(2) After some effort, it may really replace terminal, IDE, desktop environment and notebooks for data-oriented people (I guess 6-12 months)

(3) Structuring extensions as npm packages lures the fastest-developing community (web developers) to support the project and provides transparency and clarity


Dockerfile update

snakers4 (Alexander), February 20, 16:27


JupyterLab is Ready for Users

We are proud to announce the beta release series of JupyterLab, the next-generation web-based interface for Project Jupyter.

snakers4 (Alexander), February 20, 04:40

Internet Digest

- Ben Evans -

- Flipboard (orly) launches ads -

- Google sold 3.9 million Pixel phones in 2017 -

- Looks like smartbuses may be cool. App => bus route information => route gap => launch cosy bus with music and social features - (I doubt this is a business though)

- About the importance of decentralization - next Internet will be a set of cryptonetwork protocols -

- How London is responding to technological

(1) Connected and autonomous vehicles (CAVs), or driverless cars, won't be on the road until the 2030s at least and could add to congestion

(2) Dockless cycle schemes need to be able to operate across London to be effective

(3) There is no control system in place for drones and droids

(4) TfL is monitoring technological developments but this needs to be embedded across the whole organisation

- Nice infographics about city dwellers' daily routes on pages 7-10 -



snakers4 (Alexander), February 20, 04:16

So a couple of things -

- Movidius USB stick is enough to launch real-time object detection which is interesting to know

- It has shitty driver and library support (Caffe was mentioned)

- Installing everything is FAR from trivial (no idea why virtual box was used, but whatever)

- This guide uses Virtual box instead of Docker which says much

Also PyImageSearch is a sellout - he most likely has advertiser-friendly featured content in this post, looks like the Movidius stick topcoder event did not gain enough traction...

So - use Nvidia Jetsons for embedded solutions and do not bother with this. But it's good that new products emerge.


Real-time object detection on the Raspberry Pi with the Movidius NCS - PyImageSearch

In this tutorial I'll demonstrate how you can achieve real-time object detection on the Raspberry Pi using deep learning and Intel's Movidius NCS.

snakers4 (Alexander), February 20, 01:49

Pruning Makes Faster and Smaller Neural Networks | Two Minute Papers #229
The paper "Learning to Prune Filters in Convolutional Neural Networks" is available here: We would like to thank our gen...

snakers4 (Alexander), February 19, 10:25

One more article about the usual suspects when your CNN fails to train



37 Reasons why your Neural Network is not working

The network had been training for the last 12 hours. It all looked good: the gradients were flowing and the loss was decreasing. But then…

snakers4 (Alexander), February 18, 08:05

Even though I am preparing a large release on applying GANs to a real example, I just could not help sharing these 2 links.

They are just absolute perfection for GANs on PyTorch



Also this is the most idiomatic PyTorch code (Imagenet finetuning) I have ever seen


So if you are new to PyTorch, then these links will be very useful)





Contribute to WassersteinGAN development by creating an account on GitHub.

Which of the latest projects did you like the most?

Still waiting for GANs – 17

👍👍👍👍👍👍👍 49%

Satellites! – 9

👍👍👍👍 26%

Nothing / not interested / missed them – 5

👍👍 14%

Jungle! – 3

👍 9%

PM me for other options – 1

▫️ 3%

👥 35 people voted so far.

snakers4 (Alexander), February 16, 12:30

Yandex is cool - they demonstrate self-driving technology of ~2012 - 15 km/h in a very controlled setting.

If you watch Andrew Ng's original course - NASA did the same with humvees in 2010-2012 on isolated tracks.


Yandex tested a driverless taxi on Moscow streets

The ride was fully autonomous

snakers4 (Alexander), February 16, 11:14

The Real state of Deep RL



Deep Reinforcement Learning Doesn't Work Yet

This mostly cites papers from Berkeley, Google Brain, DeepMind, and OpenAI from the past few years, because that work is most visible to me. I’m almost certainly missing stuff from older literature and other institutions, and for that I apologize - I’m just one guy, after all.

snakers4 (Alexander), February 16, 11:04

New datasets

(1) HDR Dataset from Google

3,640 bursts of full-resolution raw images, made up of 28,461 individual images, along with HDR+ intermediate and final results for comparison

(2) Huge Anime dataset - 2.9m+ images annotated with 77.5m+ tags -


Introducing the HDR+ Burst Photography Dataset

Posted by Sam Hasinoff, Software Engineer, Machine Perception Burst photography is the key idea underlying the HDR+ software on Google's...

snakers4 (Alexander), February 15, 09:50

Visualizing Large-scale and High-dimensional Data:

A paper behind an awesome library


Follows the success of T-SNE, but is MUCH faster

Typical visualization pipeline

Also works awesomely with

Convergence speed

(1) (on a machine with 512GB memory, 32 cores at 2.13GHz)

(2) 3m data points * 100 dimensions, LargeVis is up to 30x faster at graph construction and 7x at graph visualization





T-SNE drawbacks

(1) K-nearest neighbor graph = computational bottleneck

(2) T-SNE constructs the graph using the technique of vantage-point trees, the performance of which significantly deteriorates for high dimensions

(3) The parameters of t-SNE are very sensitive to the data set, so they need re-tuning for each one

Algorithm itself

(1) Create a small number of projection trees (similar to random forest). Then for each node of the graph search the neighbors of its neighbors, which are also likely to be candidates of its nearest neighbors

(2) Use SGD (or asynchronous SGD) to minimize graph loss

(3) Clever sampling - sample the edges with the probability proportional to their weights and then treat the sampled edges as binary edges. Also sample some negative (not observed) edges
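
The sampling trick from (3) can be sketched in plain Python. This is only an illustration, not the LargeVis code: the toy graph, the function name `sample_training_triples` and the uniform negative sampler are my assumptions (the paper uses its own noise distribution for negatives). The point is that heavy edges get drawn more often, and once drawn every edge is treated as a plain binary edge.

```python
import random

def sample_training_triples(edges, n_nodes, n_samples, n_negative=2, seed=0):
    """Draw (source, positive, negatives) triples from a weighted graph.

    edges: list of (i, j, weight) tuples for observed edges.
    """
    rng = random.Random(seed)
    weights = [w for _, _, w in edges]
    # Observed pairs in both directions, used to reject fake "negatives"
    observed = {(i, j) for i, j, _ in edges} | {(j, i) for i, j, _ in edges}
    triples = []
    # Edges are drawn with probability proportional to their weight,
    # after which the weight is dropped (the edge becomes binary)
    for i, j, _ in rng.choices(edges, weights=weights, k=n_samples):
        negatives = []
        while len(negatives) < n_negative:
            k = rng.randrange(n_nodes)  # uniform sampler: a simplification
            if k != i and (i, k) not in observed:
                negatives.append(k)
        triples.append((i, j, negatives))
    return triples

# Hypothetical toy graph with 6 nodes and 4 weighted edges
toy_edges = [(0, 1, 5.0), (1, 2, 1.0), (2, 3, 0.5), (3, 4, 2.0)]
triples = sample_training_triples(toy_edges, n_nodes=6, n_samples=200)
```

Each triple can then feed one SGD step: pull the positive pair together, push the negatives apart.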




umap - Uniform Manifold Approximation and Projection

snakers4 (Alexander), February 15, 08:29

Pytorch 0.3.1


pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

snakers4 (Alexander), February 15, 02:12

Announcing Tensor Comprehensions

Today, Facebook AI Research (FAIR) is announcing the release of Tensor Comprehensions, a C++ library and mathematical language that helps bridge the gap between researchers, who communicate in terms of mathematical operations, and engineers focusing on the practical needs of running large-scale models on various hardware backends. The main differentiating feature of Tensor Comprehensions is…

snakers4 (Alexander), February 14, 16:28

Google's Text Reader AI: Almost Perfect | Two Minute Papers #228
The paper "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" is available here:

snakers4 (Alexander), February 14, 15:00

Ripple Explained
Can Ripple be considered a cryptocurrency? Is it centralized? Why is it getting adopted by so many banks so fast? Lots of questions, I'm going to answer them...

snakers4 (Alexander), February 14, 11:48

2017 DS/ML digest 4

Applied cool stuff

- How Dropbox built their OCR - via CTC loss -

Fun stuff

- CNN forward pass done in Google Sheets -

- New Boston Dynamics robot - opens doors now -

- Cool but toothless list of jupyter notebooks with illustrations and models

- Best CNN filter visualization tool ever -

New directions / moonshots / papers

- IMPALA from Google - DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space



- Trade crypto via RL -

- SparseNets? -

- Use Apple watch data to predict diseases

- Google - Evolution in auto ML kicks in faster than RL -

- R-CNN for human pose estimation + dataset

-- Website + video

-- Paper

Google's Colaboratory gives free GPUs?

- Old GPUs

- 12 hours limit, but very cool in theory



Sick sad world

- China has police Google Glass with face recognition

- Why slack sucks -

-- Email + google docs is better for real communication


- Globally there are 22k ML developers

- One more AI chip moonshot -

- Google made their TPUs public in beta - US$6 per hour

- CNN performance comparable to human level in dermatology (R-CNN) -

- Deep learning is greedy, brittle, opaque, and shallow

- One more medical ML investment - US$25m for cancer -




snakers4 (Alexander), February 14, 10:29

Also this is related to modern OCR - how DropBox did it

snakers4 (Alexander), February 14, 04:54

Article on SpaceNet Challenge Three in Russian on habrahabr - please support us with your comments / upvotes


Also if you missed:

- The original article

- The original code release

... and Jeremy Howard from retweeted our solution, lol



But to get some idea of how much pain the TopCoder platform inflicts on the contestants, you can read

- Data Download guide

- Final testing guide

- Code release for their verification process




From satellite images to graphs (SpaceNet Road Detector competition): a top-10 finish and the code (translation)

Hello, Habr! Here is a translation of the article. This is Vegas with the provided annotation and the test dataset, and the white squares are probably the hold-out validation...

snakers4 (Alexander), February 13, 08:19

Internet digest

- Ben Evans -

- FB tried to buy Snapchat 2 times - for US$60m and US$3b -

- Allegedly some ML can achieve 85% diabetes prediction accuracy on apple watch sensor data -

- Cars may embrace 48 volts instead of 12 volts -

- Google reabsorbs Nest (read between the lines - it was successful) -

- Snap +70% revenue growth -

- 7 of 8 USA top grocers participate in Instacart -

- Siri APIs are fragmented lol -

- Uber agreed to provide Waymo, the self-driving car unit under Google’s parent company, Alphabet, with 0.34 percent of its stock -



snakers4 (Alexander), February 13, 07:55

Interesting hack from n01z3 from ODS

For getting that extra 1%

Snapshot Ensembles / Multi-checkpoint TTA:


- Train CNN with LR decay until convergence, use SGD or Adam

- Use cyclic LR starting to train the network from the best checkpoint, train for several epochs

- Collect checkpoints with the best loss and use them for ensembles / TTA
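
A rough sketch of the recipe above, stdlib only. Everything here is an assumption for illustration: the cosine shape of the cycle (the hack only says "cyclic LR"), the names `cyclic_lr`, `BestCheckpoints` and `average_predictions`, and the default learning rates. Plug your own training loop and model saving around it.

```python
import heapq
import math

def cyclic_lr(step, cycle_len, lr_max=0.01, lr_min=0.0001):
    """Cosine-annealed LR that jumps back to lr_max every cycle_len steps."""
    pos = (step % cycle_len) / cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos))

class BestCheckpoints:
    """Keep the names of the k checkpoints with the lowest validation loss."""

    def __init__(self, k=3):
        self.k = k
        self._heap = []  # (-loss, name): the worst kept checkpoint pops first

    def update(self, loss, name):
        heapq.heappush(self._heap, (-loss, name))
        if len(self._heap) > self.k:
            heapq.heappop(self._heap)  # drop the highest-loss checkpoint

    def names(self):
        # Lowest loss first
        return [name for _, name in sorted(self._heap, reverse=True)]

def average_predictions(per_checkpoint_probs):
    """Average the class probabilities predicted by several checkpoints."""
    n = len(per_checkpoint_probs)
    return [sum(col) / n for col in zip(*per_checkpoint_probs)]

# Hypothetical validation losses collected over four cycles
best = BestCheckpoints(k=2)
for loss, name in [(0.5, "a"), (0.3, "b"), (0.4, "c"), (0.2, "d")]:
    best.update(loss, name)
print(best.names())  # ['d', 'b']
```

The same averaging works for TTA: treat each augmented view as one more row of probabilities.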


Google TPUs are released in beta... US$200 per day?

No thank you! Also looks like only TF is supported so far.

Combined with rumours, sounds impractical.



Cloud TPU machine learning accelerators now available in beta

By John Barrus, Product Manager for Cloud TPUs, Google Cloud and Zak Stone, Product Manager for TensorFlow and Cloud TPUs, Google Brain Team...

snakers4 (Alexander), February 12, 04:18

Useful links about Datashader

- Home -

- Youtube presentation, practical presentations

-- OpenSky

-- 300M census data

-- NYC Taxi data

- Readme (md is broken)

- Datashader pipeline - what you need to understand to use it with examples -

Also see 2 images above)



Datashader — Datashader 0.6.5 documentation

Turns even the largest data into images, accurately.
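
To give a feel for why this scales: the core idea, as I understand it, is to aggregate points into a fixed-size grid first and only then turn the counts into pixel intensities. Below is a toy pure-Python sketch of that idea. It is not the Datashader API; the function names `aggregate_counts` and `shade_log` are made up, and log scaling is just one possible way to shade.

```python
import math

def aggregate_counts(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # point lies outside the canvas
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

def shade_log(grid):
    """Map counts to 0..255 on a log scale so that sparse cells stay visible."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    scale = 255 / math.log1p(peak)
    return [[round(math.log1p(c) * scale) for c in row] for row in grid]

# 1000 points piled in one corner plus a single outlier in the other
points = [(0.1, 0.1)] * 1000 + [(0.9, 0.9)]
image = shade_log(aggregate_counts(points, (0, 1), (0, 1), width=2, height=2))
```

The image size is fixed no matter how many points go in, and with log shading the lone outlier still gets a non-zero intensity instead of vanishing next to the dense cell.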

snakers4 (Alexander), February 11, 18:28

A note on reusing my old hard drives from an mdadm raid10 array in a new raid0 array after buying more hard drives

Ideally this command should remove the superblock from old disks

sudo mdadm --zero-superblock /dev/sdc

But in practice I faced a problem: the raid arrays started properly on reboot only after something of this sort

dd bs=512 count=63 if=/dev/zero of=/dev/sda

This happened to both old raid10 disks and a disk that was used as plain storage. Magic.

Ofc you can shred and fill the whole disk with zeros, but it takes a lot of time...

sudo shred -v -n1 -z /dev/sda
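
For the curious, here is what the dd line above boils down to: 63 sectors of 512 bytes, i.e. the first 32256 bytes of the disk get overwritten with zeros. A small sketch that does the same thing to a scratch file instead of a real device (the helper names `zero_leading_sectors` and `make_scratch_image` are made up for this example; do not point it at a disk you care about):

```python
import os
import tempfile

SECTOR = 512

def zero_leading_sectors(path, count=63, sector=SECTOR):
    """Overwrite the first `count` sectors of `path` with zeros, in place."""
    with open(path, "r+b") as f:  # r+b: update without truncating the file
        f.write(b"\x00" * (count * sector))
        f.flush()
        os.fsync(f.fileno())
    return count * sector

def make_scratch_image(size=64 * 1024):
    """Create a throwaway file filled with 0xFF bytes to stand in for a disk."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff" * size)
    return path

path = make_scratch_image()
print(zero_leading_sectors(path))  # 32256
os.remove(path)
```

Compare that with shred, which has to touch every byte of the disk, hence the wait.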


How To Create RAID Arrays with mdadm on Ubuntu 16.04 | DigitalOcean

Linux's mdadm utility can be used to turn a group of underlying storage devices into different types of RAID arrays. This provides various advantages depending on which RAID level is used. This guide will cover how to set up devices in the most common

snakers4 (Alexander), February 11, 07:02

In a nutshell - Datashader is AWESOME

Instead of this