Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1252 members, 1404 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

snakers4 (Alexander), April 19, 15:36

Given the current situation ... which post / guide would you like next?

DS / ML related (back log of hobby projects)! – 32

👍👍👍👍👍👍👍 65%

OpenVPN + Docker – 9

👍👍 18%

Dante proxy + Arubacloud + DigitalOcean + Vultr + Docker – 8

👍👍 16%

👥 49 people voted so far.

snakers4 (Alexander), April 18, 13:37

Nice ideas about unit testing ML code

medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765

#deep_learning

How to unit test machine learning code.

Note: The popularity of this post has inspired me to write a machine learning test library. Go check it out!


snakers4 (Alexander), April 17, 19:14

Andrew Ng released the first 4 chapters of his new book

So far it does not look very technical

- gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/704291d2-365e-45bf-a9f5-719959dfe415/Ng_MLY01.pdf

#data_science


snakers4 (Alexander), April 17, 08:50

DS Bowl 2018 top solution

www.kaggle.com/c/data-science-bowl-2018/discussion/54741

#data_science

This is really interesting...their approach to separation is cool

snakers4 (Alexander), April 17, 07:39

Nice realistic article about bias in embeddings by Google

developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html

#google

#nlp

Text Embedding Models Contain Bias. Here's Why That Matters.

Human data encodes human biases by default. Being aware of this is a good start, and the conversation around how to handle it is ongoing. At Google, we are actively researching unintended bias analysis and mitigation strategies because we are committed to making products that work well for everyone. In this post, we'll examine a few text embedding models, suggest some tools for evaluating certain forms of bias, and discuss how these issues matter when building applications.


snakers4 (Alexander), April 17, 06:30

Also what is interesting, despite the fact that geektimes blocked my SOCKS proxy post and the fact that marketing based web-sites stole it (in Russian), I received the following feedback:

- 3 people thanked me in the ODS channel

- 3 people thanked me via email

- 2 people thanked me in geektimes PM

It is also interesting that my referral link was hit 165 times and ~50 people registered =)

- prntscr.com/j69w85

So if you missed the fun

- Post spark-in.me/post/vds-socks5-proxy-server

- Referral link m.do.co/c/6f8e77dddc23

- Note that the final config is in the comments and here (thanks to t.me/bykvaadm and its admin)

sudo apt update && sudo apt upgrade

wget launchpad.net/ubuntu/+archive/primary/+files/dante-server_1.4.2+dfsg-2build1_amd64.deb

sudo dpkg -i dante-server_1.4.2+dfsg-2build1_amd64.deb

echo '

logoutput: syslog /var/log/danted.log

internal: eth0 port = 1080

external: eth0

socksmethod: username

user.privileged: root

user.unprivileged: nobody

client pass {

from: 0.0.0.0/0 to: 0.0.0.0/0

log: error

}

socks pass {

from: 0.0.0.0/0 to: 0.0.0.0/0

command: connect

log: error

socksmethod: username

}' > /etc/danted.conf

# basic ufw installation

sudo apt-get install ufw

sudo ufw status

# wiki.dieg.info/socks

sudo ufw allow ssh

sudo ufw allow proto tcp from any to any port 1080

sudo ufw status numbered

sudo ufw enable

sudo systemctl enable danted

sudo useradd --shell /usr/sbin/nologin av_socks && sudo passwd av_socks
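To check that the proxy actually accepts the new credentials, you can speak the SOCKS5 handshake by hand. Below is a minimal sketch using only the Python standard library. It assumes the config above (port 1080, username/password auth); the function names and the host address in the usage note are made up for illustration.

```python
import socket

SOCKS_VERSION = 0x05
AUTH_USERNAME_PASSWORD = 0x02


def build_greeting() -> bytes:
    # RFC 1928 greeting: VER, NMETHODS, METHODS.
    # We offer a single method: username/password.
    return bytes([SOCKS_VERSION, 1, AUTH_USERNAME_PASSWORD])


def build_auth_request(username: str, password: str) -> bytes:
    # RFC 1929 sub-negotiation: VER=1, ULEN, UNAME, PLEN, PASSWD.
    user, pwd = username.encode(), password.encode()
    if not (0 < len(user) < 256 and 0 < len(pwd) < 256):
        raise ValueError("username and password must be 1..255 bytes")
    return bytes([0x01, len(user)]) + user + bytes([len(pwd)]) + pwd


def check_proxy(host: str, username: str, password: str,
                port: int = 1080, timeout: float = 5.0) -> bool:
    """Return True if the proxy accepts the credentials (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_greeting())
        # The server answers with VER and the method it selected.
        if sock.recv(2) != bytes([SOCKS_VERSION, AUTH_USERNAME_PASSWORD]):
            return False
        sock.sendall(build_auth_request(username, password))
        reply = sock.recv(2)
        # Second byte is STATUS, 0x00 means success.
        return len(reply) == 2 and reply[1] == 0x00
```

Usage would look like check_proxy("203.0.113.10", "av_socks", "your-password"), where the IP is a placeholder for your droplet's address.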

So, thanks to bykvaadm for his feedback and support and to everybody else.

#linux



Also someone just bought us a coffee

- www.buymeacoffee.com/8oneCIN

Please consider supporting us for more quality content

A post usually takes several hours to write (up to a month if it is about a competition) and does not pay well

And when people steal your content to put their refcodes in it, it's painful (

Buy Alexander Veysov a Coffee - BuyMeACoffee.com

A practitioner in the field of Data Science / Deep Learning


snakers4 (Alexander), April 16, 10:17

A draft of the article about DS Bowl 2018 on Kaggle.

This time it was a lottery.

Good that I did not really spend much time, but this time I learned a lot about watershed and some other instance segmentation methods!

The article is accompanied by a dockerized PyTorch code release on GitHub:

- spark-in.me/post/playing-with-dwt-and-ds-bowl-2018

- github.com/snakers4/ds_bowl_2018

This is a beta, you are welcome to comment and respond.

Kudos!

#data_science

#deep_learning

#instance_se

Applying Deep Watershed Transform to Kaggle Data Science Bowl 2018 (dockerized solution)

In this article I will describe my solution to the DS Bowl 2018 and why it was a lottery, and post a link to my dockerized solution. Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), April 15, 09:49

A DISCIPLINED APPROACH TO NEURAL NETWORK HYPER-PARAMETERS: PART 1 – LEARNING RATE, BATCH SIZE, MOMENTUM, AND WEIGHT DECAY

(0) arxiv.org/abs/1803.09820, Leslie N. Smith US Naval Research Laboratory

(1) Will serve as a good intuition starter if you have little experience (!)

(2) Some nice ideas:

- The test/validation loss is a good indicator of the network’s convergence - especially in early epochs

- The amount of regularization must be balanced for each dataset and architecture

- The practitioner’s goal is obtaining the highest performance while minimizing the needed computational time

(smaller batch - less stability and faster convergence)

- Optimal momentum value(s) will improve network training

(3) The author does not study the difference between SGD and Adam in depth =( Adam kind of solves many of his pains

(4) In my practice the following approach works best:

- Aggressive training with Adam to find the optimal LR

- Apply various LR decay regimes to determine the optimal one

- Use low LR or CLR in the end to converge to a lower value (possible overfitting)

- Test on test / delayed test end-to-end

- In my experience - a strong model with good params will start with test/val set loss much lower / target metric much higher than on the train set

- In some applications, if your CNN is memory intensive, you just opt for the largest batch possible (usually >6-8 works)

- Also there is no mention of augmentations - they usually help reduce overfitting much better than hyper parameters
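For reference, the CLR mentioned above can be sketched in a few lines. This is the triangular policy from Leslie Smith's earlier cyclical learning rates paper, not code from this post. The default values for base_lr, max_lr and step_size are arbitrary assumptions.

```python
import math


def triangular_clr(iteration: int, base_lr: float = 1e-4,
                   max_lr: float = 1e-3, step_size: int = 2000) -> float:
    """Triangular cyclical learning rate for a given training iteration.

    The LR climbs linearly from base_lr to max_lr over step_size iterations,
    then falls back to base_lr over the next step_size iterations, and repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x is the normalised distance from the peak of the current cycle (0 at the peak).
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

In practice you would call this once per batch and write the result into the optimizer's param groups. For the "low LR in the end" phase, just shrink both base_lr and max_lr.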

#deep_learning

Nice read about systemctl

www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units

#linux

How To Use Systemctl to Manage Systemd Services and Units | DigitalOcean

Systemd is an init system and system manager that is widely becoming the new standard for Linux machines. While there is considerable controversy as to whether systemd is an improvement over the init systems it is replacing, the majority of distributi


snakers4 (Alexander), April 15, 08:06

2018 DS/ML digest 8

As usual my short bi-weekly (or less) digest of everything that passed my BS detector

Market / blog posts

(0) Fast.ai about the importance of accessibility in ML - www.fast.ai/2018/04/10/stanford-salon/

(1) Some interesting news about market, mostly self-driving cars (the rest is crap) - goo.gl/VKLf48

(2) US$600m investment into Chinese face recognition - goo.gl/U4k2Mg

Libraries / frameworks / tools

(0) New 5 point face detector in Dlib for face alignment task - goo.gl/T73nHV

(1) Finally a more proper comparison of XGB / LightGBM / CatBoost - goo.gl/AcszWZ (also see my thoughts here snakers41.spark-in.me/1840)

(3) CNNs on FPGAs by ZFTurbo

-- www.youtube.com/watch?v=Lhnf596o0cc

-- github.com/ZFTurbo/Verilog-Generator-of-Neural-Net-Digit-Detector-for-FPGA

(4) Data version control - looks cool

-- dataversioncontrol.com

-- goo.gl/kx6Qdf

-- but I will not use it - because proper logging and treating data as immutable solves the issue

-- looks like over-engineering for the sake of over-engineering (unless you create 100500 datasets per day)

Visualizations

(0) TF Playground to see how the simplest CNNs work - goo.gl/cu7zTm

Applications

(0) Looks like GAN + ResNet + Unet + content loss - can easily solve simpler tasks like deblurring goo.gl/aviuNm

(1) You can apply dilated convolutions to NLP tasks - habrahabr.ru/company/ods/blog/353060/

(2) High level overview of face detection in ok.ru - goo.gl/fDUXa2

(3) Alternatives to DWT and Mask-RCNN / RetinaNet? medium.com/@barvinograd1/instance-embedding-instance-segmentation-without-proposals-31946a7c53e1

- Has anybody tried anything here?

Papers

(0) A more disciplined approach to training CNNs - arxiv.org/abs/1803.09820 (LR regime, hyper param fitting etc)

(1) GANs for image compression - arxiv.org/pdf/1804.02958.pdf

(2) Paper reviews from ODS - mostly moonshots, but some are interesting

-- habrahabr.ru/company/ods/blog/352508/

-- habrahabr.ru/company/ods/blog/352518/

(3) SqueezeNext - the new SqueezeNet - arxiv.org/abs/1803.10615

#digest

#data_science

#deep_learning

snakers4 (Alexander), April 15, 07:36

So, I used to use Chromium based Opera.

Now I switched to the new Firefox, which is fast, boasts a lot of security extensions and also looks clean and nice. Their mobile apps are a bit unpolished, but also good.

Looks like they rewrote rendering from scratch - because a year ago it was slow.

snakers4 (Alexander), April 14, 12:25

Found an applied channel (RU) about security and admin stuff

Looks professional

- t.me/bykvaadm

- also the channel's admin posted some useful remarks here

-- geektimes.ru/post/299971/

#linux

Админим с Буквой (Admin-ing with Bukva)

A channel about system administration, DevOps and a bit of infosec. For any questions contact @bykva. https://t.me/joinchat/CwI6k0hYW_Bn4TYvxvDiSQ - flood and discussion. Discussion and flood.


snakers4 (Alexander), April 14, 09:05

Our post is live on the Russian reddit - geektimes

- geektimes.ru/post/299971/

Please support if you have a valid account!

#internet

Simple step-by-step setup of a SOCKS5 proxy server on Ubuntu 16 in 10-15 minutes

Simple step-by-step setup of a SOCKS5 proxy server on Ubuntu 16. This article is a translation of the article from here. The author's style and manner of speech have been smoothed out, but...


snakers4 (Alexander), April 13, 23:08

So, I just found out that the Firefox rendering engine was rewritten. Now it boasts the fastest speeds and support for ... socks5 proxies, both on mobile and desktop.

- github.com/FelisCatus/SwitchyOmega/

- hacks.mozilla.org/2017/08/inside-a-super-fast-css-engine-quantum-css-aka-stylo/

Also projects like orbot+orfox help in more extreme cases.

#internet

FelisCatus/SwitchyOmega

SwitchyOmega - Manage and switch between multiple proxies quickly & easily.


snakers4 (Alexander), April 13, 15:44

What proxy will you use?

Other VPN-like solutions – 17

👍👍👍👍👍👍👍 27%

Luckily, it does not apply to me – 15

👍👍👍👍👍👍 23%

A free / public / provided by special channels – 15

👍👍👍👍👍👍 23%

I will try your guide – 14

👍👍👍👍👍👍 22%

I will stop using Telegram – 2

👍 3%

Let's wait, maybe they will reconsider – 1

▫️ 2%

👥 64 people voted so far.

snakers4 (Alexander), April 13, 15:30

So, usually I try to stay away from such controversial topics, but I have to address an elephant in the room. You all know that I am originally from Russia and that I have quite liberal world views.

Seeing that many people have started to ride the hype and advertise expensive "solutions", today I decided to do a post about creating your own SOCKS5 proxy server via a droplet on Digital Ocean:

- Post - spark-in.me/post/vds-socks5-proxy-server - note that unlike my other posts - this one is a step-by-step explanation;

- It explains how to create your own SOCKS5 proxy server using Ubuntu and Digital Ocean with dante;

- The cheapest digital ocean droplet is US$5 per month (you can find such droplets for as low as US$2-3 with inferior service);

- If you use my referral link - you will get US$10 for free - m.do.co/c/6f8e77dddc23

- Also you can create credentials for your friends and family;

Also note that, foreseeing this s**t, I created aliases for our telegram channel

- In twitter twitter.com/AlexanderVeysov

- In the web snakers41.spark-in.me

- RSS snakers41.spark-in.me/rss/

UX is not so great, but it works more or less. Please tell me what you think. I know that the majority of readers are Russians and we have quite a negative mentality, but this is one of the cases when you have to share this message and my post as much as possible. We will be doing an adapted post on habrahabr.ru as well.

And I know that there are free proxy lists. But if you create a simple service today - tomorrow you can add layers to it (see some hints in the article) and not rely on other people.

If you like what I shared - please support our channel (see a pinned message)

- buymeacoff.ee/8oneCIN

#internet

#digital_freedom

Playing with a simple SOCKS5 proxy server on Digital Ocean and Ubuntu 16

This article tells you how to start your SOCKS5 proxy with zero to little experience. Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), April 13, 07:21

So I briefly dug into running a containerized GPU accelerated GUI app (I want to be able to run some apps I do not really want on my host).

Docker kind of works for this purpose, but I found working guides for nvidia-docker, not nvidia-docker2.

Looks like if you want to run a Linux host with a Linux container - then LXD is a good option. It is high level and seems to have an easy API to use. I will report whether it works for me.

- Guide blog.simos.info/how-to-run-graphics-accelerated-gui-apps-in-lxd-containers-on-your-ubuntu-desktop/

- LXD vs Docker unix.stackexchange.com/questions/254956/what-is-the-difference-between-docker-lxd-and-lxc/254982

- Extensive LXD tutorial stgraber.org/2016/03/11/lxd-2-0-introduction-to-lxd-112/

#linux

How to run graphics-accelerated GUI apps in LXD containers on your Ubuntu desktop

In How to run Wine (graphics-accelerated) in an LXD container on Ubuntu we had a quick look into how to run GUI programs in an LXD (Lex-Dee) container, and have the output appear on the local X11 s…


Soviet arcade video games museum (pics)

- aminux.wordpress.com/2018/04/12/moscow-arcade-machine-museum/comment-page-1/#comment-1750

Another curious museum

Continuing the museum theme, this time we will take a look at Moscow. Only we will head not to the Kremlin, but to a place that is more interesting to me.…


snakers4 (Alexander), April 12, 15:32

youtu.be/ni6P5KU3SDU

Evolving Generative Adversarial Networks | Two Minute Papers #242
The paper "Evolutionary Generative Adversarial Networks" is available here: arxiv.org/abs/1803.00657 Our Patreon page: www.patreon.com/TwoMin...

snakers4 (Alexander), April 12, 08:08

DS Bowl 2018 stage 2 data was released.

It has completely different distribution from stage 1 data.

How do you like them apples?

Looks like Kaggle admins really have no idea about dataset curation, or all of this is meant to misguide manual annotators.

Anyway - looks like random bs.

#data_science

#deep_learning

snakers4 (Alexander), April 10, 09:42

Yolov3 - best paper.

Not in terms of scientific contribution, but as a rebuttal of DS community BS.

Very funny read.

- pjreddie.com/media/files/papers/YOLOv3.pdf

If you want a proper comparison of object detection algorithms - use this paper arxiv.org/abs/1611.10012

Looks like SSD and YOLO are reasonably good and fast, and RCNN can be properly tuned to be 3-5x slower (not 100x) and more accurate.

#data_science

#computer_vision


snakers4 (Alexander), April 10, 09:01

A bit more on semantic segmentation, now 3D

{V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation}

--> Link / authors arxiv.org/abs/1606.04797, Fausto Milletari / Nassir Navab / Seyed-Ahmad Ahmadi

--> Essence:

(0) Essentially applies UNet to 3D with a custom DICE based loss

(1) Architecture - goo.gl/Yn2BGb - basically UNet with 3D convolutions. Upsampling / downsampling - goo.gl/VtXrXy

(2) PReLu (no ablation test)

(3) Receptive fields of layers - goo.gl/FGwDCF

(4) 3D DICE loss - goo.gl/SqrK93 (wo BCE?)

--> The paper does not use all the juice possible - hacky transfer learning (obvious idea - just stacking Imagenet filters), CLR, LinkNet architectures, etc

--> Looks like a good baseline / reference
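To make the DICE loss from (4) concrete, here is a minimal NumPy sketch of the soft Dice coefficient in the V-Net form (squared sums in the denominator). It is not the authors' code. The eps smoothing term and the function names are my assumptions, and a real training loss would use your framework's tensors instead of NumPy.

```python
import numpy as np


def soft_dice(pred, target, eps: float = 1e-7) -> float:
    """Soft Dice between a predicted probability volume and a binary mask.

    V-Net form: D = 2 * sum(p * g) / (sum(p^2) + sum(g^2)).
    eps is an added smoothing term so empty volumes do not divide by zero.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    denominator = np.sum(pred ** 2) + np.sum(target ** 2)
    return float((2.0 * intersection + eps) / (denominator + eps))


def dice_loss(pred, target) -> float:
    # Minimising 1 - D is equivalent to maximising the overlap.
    return 1.0 - soft_dice(pred, target)
```

The arrays can have any shape (e.g. depth x height x width), since everything is summed over all voxels.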

{An application of cascaded 3D fully convolutional networks for medical image segmentation}

--> arxiv.org/abs/1803.05431, a group of Japanese researchers

--> Essence:

(0) 2 stage 3D UNet, ablation test against 2D FCNs

(1) Loss - 3D cross-entropy

(2) Transfer learning - it works for other datasets, gives a mild boost (1-3 %)

(3) 80-90% DICE, varies by organ

(4) weights downloadable github.com/holgerroth/3Dunet_abdomen_cascade (Caffe...)

--> Essentially a 2 stage process is dictated by memory considerations:

(0) Pipeline goo.gl/wZwF3X

In the long run transfer learning may rule, but here legal limitations may slow down this process.

#deep_learning

#medical_imaging

snakers4 (Alexander), April 09, 09:57

Forwarded from Petr Ivanov:

www.jetbrains.com/research/python-developers-survey-2017/

Python Developers Survey 2017 - Results

At the very end of 2017, the Python Software Foundation together with JetBrains conducted an official Python Developers Survey. We set out to identify the latest trends and gather insight into how the Python development world looks today. Over 9,500 developers from almost 150 different countries participated to help us map out an accurate landscape of the Python community.


snakers4 (Alexander), April 08, 13:53

For new (!) people on the channel:

- This channel is a practitioner's channel on the following topics: internet, data science, math, deep learning, philosophy

- Focus is on data science

- Don't get your opinion in a twist if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - [email protected]

- No bs and ads

- Once a week or once every several weeks we publish some ML related digests

Give us a rating:

- telegram.me/tchannelsbot?start=snakers4

Donations

- Buy me a coffee buymeacoff.ee/8oneCIN

- Direct donations - goo.gl/kvsovi - 5011673505 (paste this agreement number)

- Yandex - goo.gl/zveIOr

Other channel aliases (in case you are afraid Telegram gets blocked in Russia)

- Twitter feed - twitter.com/AlexanderVeysov

- Web feed snakers41.spark-in.me

Our website

- spark-in.me

Our chat

- goo.gl/IS6Kzz

DS courses review

- goo.gl/5VGU5A

- spark-in.me/post/learn-data-science

Our best article so far:

- spark-in.me/post/spacenet-three-challenge

Telegram Channels Bot

Discover the best channels 📢 available on Telegram. Explore charts, rate ⭐️ and enjoy updates! TChannels.me


snakers4 (Alexander), April 08, 13:48

As you may know (for newer people on the channel), sometimes we publish small articles on the website.

This time it covers a recent Power Laws challenge on DrivenData, which at first seemed legit and cool, but in the end turned back into a pumpkin.

Here is an article:

- spark-in.me/post/playing-with-electricity

#data_science

#time_series

#deep_learning

Playing with electricity - forecasting 5000 time series

In this article I share our experience participating in a recent time series challenge on Drivendata and my personal ideas about ML competitions. Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), April 07, 11:52

Internet digest

- Ben Evans - mailchi.mp/ben-evans/benedicts-newsletter-no-450525?e=b7fff6bc1c

- About autonomous cars - www.ben-evans.com/benedictevans/2018/3/26/steps-to-autonomy - autonomy will vary based on the route / conditions / situation / use case

- FB delays its speaker - www.bloomberg.com/technology

- Foxconn buys Belkin - goo.gl/Xf6g9A

- Amazon music > 10m subs - goo.gl/C8Qhdm

- The Economist about ML in business - goo.gl/fTCHE9

- Apple to make its own chips - goo.gl/ZkkEVc

#internet

#digest

snakers4 (Alexander), April 07, 11:14

The trend for smaller / inadequate prizes and weird datasets continues:

- This year's DS Bowl on Kaggle features a small public train dataset (650 images) vs. a much larger delayed validation dataset (3000 images). Cheap annotation, anyone? =) Also, remarkably, for a somewhat difficult task (instance segmentation) the prize is much lower than last year;

- The new autonomous driving contest on Kaggle, as well as other CVPR competitions, features extremely large datasets, extremely low prizes (US$1-2k), and no travel costs to CVPR covered. Ofc you can win and fly there, but this will not even cover your GPU costs;

- The recent xnView challenge I really wanted to participate in requires a US Tax number to be eligible for prizes. Of course they do not know about double taxation treaties and WEP-8 tax exemptions;

Alas =(

#deep_learning

snakers4 (Alexander), April 05, 20:04

youtu.be/AbxPbfODGcs

This Fools Your Vision | Two Minute Papers #241
The paper "Adversarial Examples that Fool both Human and Computer Vision" is available here: arxiv.org/abs/1802.08195 Our Patreon page with the detai...

snakers4 (Alexander), April 05, 06:19

For some reason, when you use PyTorch multi-thread data loaders, they stall if you use OpenCV and do not set

cv2.setNumThreads(0)

Nice to know this.
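One way to wire this in is via the DataLoader's worker_init_fn, so every worker process disables OpenCV threading on start. A minimal sketch: the helper name is made up, and the import guard is there only so the snippet also runs on a machine without OpenCV.

```python
import importlib


def disable_opencv_threads(worker_id: int = 0) -> bool:
    """Turn off OpenCV's internal threading in the current process.

    Meant to be passed as worker_init_fn, which PyTorch calls with the
    worker id in every data loading worker. Returns False if OpenCV is
    not installed (nothing to disable then).
    """
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        return False
    cv2.setNumThreads(0)
    return True


# Call it once in the main process as well.
disable_opencv_threads()

# Usage with PyTorch (assumes you already have a `dataset` object):
# loader = torch.utils.data.DataLoader(dataset, batch_size=16, num_workers=4,
#                                      worker_init_fn=disable_opencv_threads)
```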

#deep_learning

snakers4 (Alexander), April 03, 07:53

Handy opencv snippet to transform grayscale mask with labels (i.e. 0 for background, 1,2,3 etc) into a colourful map

y_pred_coloured = cv2.applyColorMap((y_pred / y_pred.max() * 255).astype('uint8'), cv2.COLORMAP_JET)
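Note that the one-liner divides by y_pred.max(), so an all-background mask (max = 0) produces NaNs. A slightly more defensive sketch with the same idea is below. The helper names are made up, and cv2 is imported lazily so the normalisation part works without OpenCV.

```python
import numpy as np


def labels_to_uint8(mask) -> np.ndarray:
    """Scale a label mask (0 = background, 1, 2, 3...) to the 0..255 range."""
    mask = np.asarray(mask, dtype=np.float64)
    peak = mask.max()
    if peak == 0:
        # Only background: return zeros instead of dividing by zero.
        return np.zeros(mask.shape, dtype=np.uint8)
    return (mask / peak * 255).astype(np.uint8)


def colourise(mask) -> np.ndarray:
    """Return a BGR colour map of the label mask."""
    import cv2  # lazy import, only needed for the colour map itself
    return cv2.applyColorMap(labels_to_uint8(mask), cv2.COLORMAP_JET)
```

With many instances, neighbouring labels get very similar colours, so this is mostly useful for a quick visual check.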

#deep_learning

snakers4 (Alexander), April 03, 05:27

snakers4 (Alexander), April 01, 07:57

Novel topic modelling techniques

- bigartm.readthedocs.io/en/stable/intro.html

- github.com/bigartm/bigartm

Looks interesting.

If anyone knows about this - please ping in PM.

#nlp