Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1322 members, 1482 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

snakers4 (Alexander), May 15, 06:18

Internet / tech

- Google I/O news goo.gl/1FszFA

- MS to give custom voice option to its apps - goo.gl/5e2oMw

- Katzenberg (former Disney executive) raises US$800m to make YouTube-like short series - goo.gl/wLpTGi => Internet + Video is a commodity now?

- _Reportedly_ Lyft has 35% market share in the USA goo.gl/tvzDTu

- Google becoming evil and doing military contracts - goo.gl/3HjYDg - wtf?

- Apple's autonomous driving fleet is _reportedly_ now at 55 goo.gl/94tfkk

#internet

The 10 biggest announcements from Google I/O 2018

Here’s the most important news to know.


snakers4 (Alexander), May 15, 05:04

A great presentation about the current state of particle tracking + ML

Also Kaggle failed to share this for some reason

indico.cern.ch/event/702054/attachments/1606643/2561698/tr180307_davidRousseau_CERN_trackML-FINAL.pdf

Key problem - the current algorithm (Kalman filter) faces time constraints

#data_science

snakers4 (Alexander), May 13, 12:25

How to show links?

anonymous poll

Full if short, hide if long – 28

👍👍👍👍👍👍👍 57%

Hide them behind markup – 11

👍👍👍 22%

Post full links – 9

👍👍 18%

Shorten them – 1

▫️ 2%

👥 49 people voted so far.

snakers4 (spark_comment_bot), May 13, 11:25

2018 DS/ML digest 10

Market

(0) Some moonshots by Google in working with electronic health records

(1) Google duplex - a narrow domain bot that makes calls for you

(2) Nature wants to make its ML journal ... paid

(3) Stanford DAWNBench - training Imagenet encoders as quickly and cheaply as possible

(4) Facebook achieves 85% on Imagenet by training on 1bn images on 336 GPUs in a week

(5) Learning the models of the surrounding world based on a DOOM like game

Practice / libraries / code

(0) A smarter and new way to ensemble CNNs

- Traditional approach - ensemble CNNs with different architectures - and just vote / average / apply linear regression on top

- Newer approach - use a cyclic learning rate (CLR)

- Even newer approach - model snapshot ensembling

- Stochastic Weight Averaging

-- store running average of the models

-- train one model with CLR

-- at the end of each lr update (or epoch) - do a running average of the models with some weights

-- the gist of the method is located on this line

-- I do understand why they update the batch-norm params, but I do not understand why it cannot be done by just running 1 train epoch

- Papers on CNN ensembling 1 2 3

(1) (RU) Few technical details, but face-detection + face hashing works in retail (+human operator) given an HD camera

(2) (RU) Pose estimation

(3) Numpy autograd
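The Stochastic Weight Averaging steps from item (0) above can be sketched without any framework. This is only a minimal illustration, not the code from the linked repository: it assumes an equal-weight running average, model weights stored as plain dicts of float lists, and the names `swa_update` and `snapshots` are hypothetical.

```python
def swa_update(swa_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into an equal-weight running average.

    swa_weights: current average (dict name -> list of floats), or None
    new_weights: snapshot taken at the end of an lr cycle / epoch
    n_averaged:  how many snapshots are already inside swa_weights
    """
    if swa_weights is None:
        # The first snapshot simply becomes the average.
        return {name: list(values) for name, values in new_weights.items()}
    return {
        name: [
            (avg * n_averaged + new) / (n_averaged + 1)
            for avg, new in zip(swa_weights[name], new_weights[name])
        ]
        for name in swa_weights
    }


# Toy snapshots standing in for one model trained with CLR (assumed values).
snapshots = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]

swa = None
for n, snapshot in enumerate(snapshots):
    swa = swa_update(swa, snapshot, n)

print(swa)
```

With a real network the batch-norm statistics would still have to be recomputed for the averaged weights, which is the step questioned in the post above.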

"New" papers worth mentioning

(0) SqueezeNext

- Module comparison

- Key changes

(i) more aggressive channel reduction by incorporating a two-stage squeeze module

(ii) separable 3 × 3 convolutions

(iii) element-wise addition skip connection similar to ResNet

- Performance

(1) GANs to generate full-body anime characters in different poses

Visualizations:

(0) (does not work in Firefox) Visualizing encoder-decoder networks for translation

#data-science

#deep-learning

#digest

Like this post or have something to say => tell us more in the comments or donate!

Deep Learning for Electronic Health Records

Posted by Alvin Rajkomar MD, Research Scientist and Eyal Oren PhD, Product Manager, Google AI When patients get admitted to a hospital, th...


snakers4 (Alexander), May 13, 07:26

Presentation from a winning solution in DS Bowl 2018

goo.gl/LGNWL5

bowl.pdf


snakers4 (spark_comment_bot), May 12, 15:40

Google Duplex

In a nutshell - a combination of ML (RNN + speech recognition + WaveNet + Tacotron) that can call a human and pretend to be a human. Only works for narrow specific domains (call to a restaurant).

Links:

(0) Blog post

(1) MKBHD video

Usually I include such links into digests, but this time it looks like insanity:

(0) It looks heavily doctored, but believable

(1) All the components are kind of known to be at least 90-95% as good as presented

(2) Once again - it is very domain focused

One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively.

Duplex can only carry out natural conversations after being deeply trained in such domains.

It cannot carry out general conversations.

Like this post or have something to say => tell us more in the comments or donate!

Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone

Posted by Yaniv Leviathan, Principal Engineer and Yossi Matias, Vice President, Engineering, Google A long-standing goal of human-comput...


snakers4 (spark_comment_bot), May 12, 07:25

Running mean programming pattern in PyTorch

Sometimes you just need to apply exponential weighting:

(0) When tracking some metric

(1) When weighting a loss

(2) When applying something inspired by Adam

I used to do it in quite an ugly way:

(0) Feed a list => calculate averages

(1) Do the same, but using a separate class

Found out from my colleague that it can be done in PyTorch using torch.nn.Module.register_buffer

Very cool
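The exponential weighting itself is small enough to show in plain Python. The sketch below is framework-free on purpose; the `RunningMean` class and its `momentum` parameter are hypothetical names chosen for illustration. In PyTorch the `value` state could instead live in a tensor registered with `register_buffer` on an `nn.Module`, so that it follows `.to(device)` and is saved in `state_dict()`.

```python
class RunningMean:
    """Exponentially weighted running mean of a scalar metric."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # weight given to the previous value
        self.value = None         # no observations yet

    def update(self, x):
        if self.value is None:
            # Initialise with the first observation.
            self.value = float(x)
        else:
            self.value = self.momentum * self.value + (1.0 - self.momentum) * x
        return self.value


tracker = RunningMean(momentum=0.5)
for loss in [4.0, 2.0, 1.0]:
    smoothed = tracker.update(loss)

print(smoothed)  # 4.0 -> 3.0 -> 2.0
```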

Like this post or have something to say => tell us more in the comments or donate!

snakers4 (spark_comment_bot), May 11, 16:36

Testing comments on Spark-in.me

Some time ago we saw how some channels (like HackerNews) format their posts in Telegram:

(0) The post itself contains a button with a link to a comment section of their website

(1) The comment counter is updated mostly in line with real comment count on the website

(2) Sometimes there are additional buttons

This can be easily done using third-party bots, but in our case, given our custom-built blog, there was only one way:

(0) Create a post on the blog with an embedded Telegram snippet, hide it from all of the feeds

(1) Create a simple bot for posting messages (like this one)

(2) Create a simple bot connected to disqus API and updating the comment counter

(3) Profit!

For some topics the community opinion / knowledge / experience may be worth much more than a single post (like GANs or proxy setup).

So, if you like this feature - please tell us what you think in the comments!

Exploring the limits of unsupervised Machine Learning in Computer Vision

In this article I share my experience with GANs, progressive growing of GANs, image clustering and unsupervised learning Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), May 10, 18:43

youtu.be/DglrYx9F3UU

This AI Reproduces Human Perception | Two Minute Papers #248
Our Patreon page: www.patreon.com/TwoMinutePapers One-time payment links are available below. Thank you very much for your generous support! PayPal: ...

snakers4 (Alexander), May 10, 13:41

Forwarded from Hacker News:

Ubuntu 18.04: Unity is gone, Gnome is back, and Ubuntu has never been better (Score: 100+)

Link: j.mp/2It0yNf

Comments: j.mp/2IqKFqu

Ubuntu 18.04: Unity is gone, GNOME is back—and Ubuntu has never been better

Server users will really like 18.04, but the newest Ubuntu works great for all Linux fans.


snakers4 (Alexander), May 10, 12:03

Forwarded from Roem.ru:

Yandex has "fixed" the low resolution of the "internet versions" of 7 classic films about the Great Patriotic War. Formally, the films could simply be digitized again at high quality, but the search company ran a different experiment: it used a neural network to add detail to the existing mediocre digitizations and to remove their technical defects.

roem.ru/10-05-2018/270806/yandex-correcting-old-movies-18/

Yandex's neural network improved the quality of old films about the Great Patriotic War

Artificial intelligence helped make the frames more detailed and visually sharper, and also compensated for some technological shortcomings of the digitization


snakers4 (Alexander), May 10, 05:39

Internet

Interesting links about Internet

(0) Ben Evans goo.gl/gvNBhS

Russia / CIS

(0) Telegram has a new proxy setting in alpha, though no proper stand-alone solutions are published

t.me/dvachannel/21784

(1) Western media now cover Telegram

goo.gl/nPJ4Sm

Global / tech

(0) Xiaomi to file for an IPO - US$10 - US$100bn

(1) Yet another drag and drop ML that will (m?) fail - lobe.ai/ - this is so American

(2) Now all "major" apps heavily feature "stories" as main mobile format - goo.gl/wbnHYD

Yet another reason to quit all social media and just use professional apps / messaging

Add up all this bs => this is the reason normal people do not use social media for real now

(3) Tesla most shorted tech company now - goo.gl/11yndY xD

Figures

(0) YouTube - 1.8bn users with 1+ login goo.gl/kyXFDH

(1) WhatsApp m70bn messages per day (vs. 20bn max with SMS) goo.gl/67DdVn

#internet

#digest

snakers4 (Alexander), May 09, 06:55

A couple of articles about the harsh reality of DS / ML jobs

In a nutshell:

- politics

- unjustified decisions

- same as everywhere

- www.kdnuggets.com/2018/04/why-data-scientists-leaving-jobs.html

- www.rdisorder.eu/2017/09/13/most-difficult-thing-data-science-politics/

True story, especially about political decisions, fight for power, useless dashboards and data monkeys.

#data_science

The most difficult thing in data science: politics

Deep learning looks difficult to you? Come back after you get to know company politics, it will feel like a breeze...


snakers4 (Alexander), May 07, 18:09

Fast.ai is publishing a 2018 version of its cutting-edge course

www.fast.ai/2018/05/07/part2-launch/

Their materials are cool, but their library is questionable

#deep_learning

snakers4 (Alexander), May 06, 16:45

www.youtube.com/watch?v=gvjCu7zszbQ

press release

worldmodels.github.io/

This AI Learns From Its Dreams | Two Minute Papers #247
The paper "World Models" is available here: arxiv.org/abs/1803.10122 worldmodels.github.io/ Support the series and pick up cool perks on our ...

snakers4 (Alexander), May 05, 13:13

Andrew Ng book

It is being released on a chapter-by-chapter basis

- mailchi.mp/4e8328c430be/machine-learning-yearning-chapters-1-1258557?e=fd8b9f9fc6

This book is not really technical, though - it's more or less a collection of advice on how to build ML models as a business process

Interesting idea - splitting your dev set into black box and eyeball dev set ... but this can be replaced by properly using Tensorboard when training...

#data_science

snakers4 (Alexander), May 04, 07:45

A decent explanation of decorators in Python

book.pythontips.com/en/latest/decorators.html

#python
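For a quick taste of what the linked chapter covers, here is a minimal decorator sketch. The `count_calls` decorator and the `add` function are hypothetical examples written for this note, not taken from the linked guide.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times `func` has been called."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)
print(add.calls, add.__name__)
```

The `@count_calls` line is just shorthand for `add = count_calls(add)`.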

snakers4 (Alexander), May 04, 06:59

The current state of ML

goo.gl/rzKUiQ

(1) Do not call it AI

(2) Distinguish ML from Intelligent Infrastructure and Intelligence Augmentation

(3) Human-imitative AI is not tractable now

(4) Developments which are now being called "AI" arose mostly in the engineering fields associated with low-level pattern recognition and movement control

#deep_learning

Artificial Intelligence — The Revolution Hasn’t Happened Yet

Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and…


snakers4 (Alexander), May 04, 05:47

Add comment button below major posts?

anonymous poll

Yes, definitely! – 26

👍👍👍👍👍👍👍 54%

Meh... – 14

👍👍👍👍 29%

No, why? – 7

👍👍 15%

Your option (PM / chat) – 1

▫️ 2%

👥 48 people voted so far.

snakers4 (Alexander), May 03, 02:38

youtu.be/kbOsDFtvYZk

How Computers Find Naked People in Photos
Why isn't the internet just covered in naked people? Algorithms! However, designing them to distinguish between pornography and people in skin tone clothing ...

snakers4 (Alexander), May 01, 16:52

2018 DS/ML digest 9

Market / libraries

(0) Tensorflow + Swift - wtf - goo.gl/FDvLM4

(1) Geektimes / Habrhabr.ru going international - goo.gl/dbGNwD

(2) A service for renting GPUs ... from people

- Reddit goo.gl/HxQ54x

- Link vectordash.com/hosting/

- Looks LXC based (afaik - the only user friendly alternative to Docker)

- Cool in theory, no idea how secure this is - we can assume it is as secure as providing a docker container to a stranger

- They did not reply to me within a week

(3) A friend sent me a list of ... yet more new PyTorch NLP libraries

- goo.gl/kasRfZ, goo.gl/XXnbJy (AllenNLP is the biggest library like this)

- I believe that such libraries are more or less useless for real tasks, but cool to know they exist

(4) New SpaceNet 4? goo.gl/CsSS6P

(5) A new super cool competition on Kaggle about particle physics? www.kaggle.com/c/trackml-particle-identification

Tutorials / basics

(0) Bias vs. Variance (RU) goo.gl/4Y7tH7

(1) Yet another magic Jupyter guideline collection - goo.gl/AFWMuq

Real world ML applications

(0) Resnet + object detection (RU) - people w/o helmets, 90% accuracy - goo.gl/7xpQnE

(1) Fast.ai about using embeddings with Tabular data - www.fast.ai/2018/04/29/categorical-embeddings/

Very similar to our approach on electricity

I personally do not recommend using their library by any means

(2) Comparing Google TPU vs. V100 with ResNet50 - goo.gl/s6dhsy

- speed - goo.gl/Pww2sm

- pricing - goo.gl/Rtkp8Q

- but ... buying GPUs is much cheaper

(3) Other blog posts about embeddings + tabular data

- Sales prediction blog.kaggle.com/2016/01/22/rossmann-store-sales-winners-interview-3rd-place-cheng-gui/

- Taxi drive prediction blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/

MLP + classification + embeddings - goo.gl/AMNGNG / arxiv.org/pdf/1508.00021.pdf

(4) Albu's solution to SpaceNet - augmentations github.com/SpaceNetChallenge/RoadDetector/tree/master/albu-solution/src/augmentations

CNN overview

Neural network part:

Split data into 4 folds randomly, but with the same number of tiles from each city in every fold

Use resnet34 as encoder and unet-like decoder (conv-relu-upsample-conv-relu) with skip connection from every layer of network. Loss function: 0.8*binary_cross_entropy + 0.2*(1 – dice_coeff). Optimizer – Adam with default params.

Train on image crops 512*512 with batch size 11 for 30 epochs (8 times more images in one epoch)

Train 20 epochs with lr 1e-4

Train 5 epochs with lr 2e-5

Train 5 epochs with lr 4e-6

Predict on full image with padding 22 on borders (1344*1344).

Merge folds by mean
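The loss and learning-rate schedule quoted above can be sketched in plain Python. This is an illustration only, not Albu's actual code: it works on flat lists of probabilities instead of image tensors, the `eps` smoothing terms are an assumption, and the function names are hypothetical.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def dice_coeff(pred, target, eps=1e-7):
    """Soft Dice coefficient: 2 * overlap / (sum of both masks)."""
    overlap = sum(p * t for p, t in zip(pred, target))
    return (2.0 * overlap + eps) / (sum(pred) + sum(target) + eps)


def combined_loss(pred, target):
    """0.8 * binary_cross_entropy + 0.2 * (1 - dice_coeff), as in the post."""
    return 0.8 * bce(pred, target) + 0.2 * (1.0 - dice_coeff(pred, target))


def lr_for_epoch(epoch):
    """0-based epoch -> lr: 20 epochs at 1e-4, then 5 at 2e-5, then 5 at 4e-6."""
    if epoch < 20:
        return 1e-4
    if epoch < 25:
        return 2e-5
    return 4e-6


good = combined_loss([0.9, 0.1], [1, 0])
bad = combined_loss([0.1, 0.9], [1, 0])
print(good < bad, lr_for_epoch(0), lr_for_epoch(22), lr_for_epoch(29))
```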

Jobs / job market

(0) Developers by country by scraping GitHub - goo.gl/n8gnLi

- developers count vs. GDP prntscr.com/j9v80e R^2 = 84%

- developers count vs. population - R^2 = 50%

Visualization

(0) Interactive tool for visualizing convolutions - ezyang.github.io/convolution-visualizer/

Datasets

(0) Open Images v4 open-sourced

- research.googleblog.com/2018/04/announcing-open-images-v4-and-eccv-2018.html

- the dataset itself storage.googleapis.com/openimages/web/download.html

- categories storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy_visualizer/circle.html

#data_science

#deep_learning

#digest

tensorflow/swift

swift - Swift for TensorFlow documentation repository.


snakers4 (Alexander), May 01, 07:53

Exploring GANs and unsupervised learning

Here are my findings from my hobby project about using GANs and unsupervised methods to build some decent semantic search on a large dataset of images without annotation:

(0) spark-in.me/post/unsupervised-learning-limits

Lots of cool images.

TLDR

(0) Features from pre-trained Imagenet encoder => PCA => UMAP => HDBSCAN work really well for image clustering;

(1) Any siamese network / hard negative mining inspired methods just did not work - the annotation data is too coarse;

(2) GANs kind of work, but I could not achieve the boasted photo-realistic levels;

#deep_learning

Exploring the limits of unsupervised Machine Learning in Computer Vision

In this article I share my experience with GANs, progressive growing of GANs, image clustering and unsupervised learning Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), May 01, 06:58

Showing more images in Tensorboard

TB is super cool (especially together with this script gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514), but it shows only ~10 images in its image preview.

This can be fixed.

(0) Find your TB folder

import tensorboard

tensorboard.__file__

In my case it shows '/opt/conda/lib/python3.6/site-packages/tensorboard/__init__.py'

(1) cd there and open backend/application.py

(2) Change this line

image_metadata.PLUGIN_NAME: 400,

(3) Profit - now it shows ~400 images on each view tab

#deep_learning

Logging to tensorboard without tensorflow operations. Uses manually generated summaries instead of summary ops


snakers4 (Alexander), May 01, 06:51

Pinned post

What is this channel about?

(0) This channel is a practitioner's channel on the following topics: Internet, Data Science, Deep Learning, Python

(1) Don't get your opinion in a twist if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - [email protected]

(2) No BS and ads

Donations

(0) Become a patron 🤟 - www.patreon.com/bePatron?u=6159641

(1) Buy me a coffee 🤟 buymeacoff.ee/8oneCIN

Give us a rating:

(0) telegram.me/tchannelsbot?start=snakers4

Our chat

(0) t.me/joinchat/Bv9tjkH9JHaAEL-FVtw9Tw

More links

(0) Our website spark-in.me

(1) Our chat goo.gl/IS6Kzz

(2) DS courses review

goo.gl/5VGU5A

spark-in.me/post/learn-data-science

(3) GAN papers review

spark-in.me/post/gan-paper-review

(4) SpaceNet Challenge

spark-in.me/post/spacenet-three-challenge

(5) DS Bowl 2018

spark-in.me/post/playing-with-dwt-and-ds-bowl-2018

(6) Data Science tag on the website

spark-in.me/tag/data-science

Patron Checkout | Patreon

Patreon is empowering a new generation of creators. Support and engage with artists and creators as they live out their passions!


snakers4 (Alexander), May 01, 05:37

Playing with unsupervised learning in genetics

A small blog post on this topic

spark-in.me/post/playing-with-genetics

The first thing that springs to mind is RNN but what if there is no annotation and it is not known if the data consists of valid sequences?)

#data_science

Playing with genetic markers, clustering and visualization

Mesmerizing structures found in data: encoding, dimension reduction, clustering and visualization of a dataset with genetic markers Author's articles - http://spark-in.me/author/yara_tchk Blog - http://spark-in.me


snakers4 (Alexander), April 30, 05:37

A small saga about OpenVPN

TLDR:

(0) Purchase a cheap VDS from a noname provider with decent bandwidth => install OpenVPN => forget about problems => share with friends and family;

(1) This guide just works goo.gl/K2xjby (do not be afraid of its length - it is just verbose);

(2) I tested it with DigitalOcean and hostus.us;

From a financial standpoint US$1-5 per month per 3-5 users without any 3rd party services seems to be a bargain.

Hosting options:

(0) With DO it just works (just follow the guide step by step). But the cheapest VDS (which is overkill for this) costs US$5 per month. If you use my link - m.do.co/c/6f8e77dddc23 - you will get US$10 for free;

(1) Tested it with hostus.us. Follow my link, if you would like to support us - my.hostus.us/aff.php?aff=2169. A decent VPS can be found in Amsterdam for as cheap as US$5-8 for 3 months. Be careful - their UX is a bit misleading at times - (!!!) the country choice does not seem to flow from one menu to another (!!!). This seems to be more than enough - goo.gl/GyPZ6u;

(2) If you want to search yourself - go here - lowendstock.com/ - the best 2 options seem to be VirMach and hostus, but the former is sold out;

hostus.us caveats:

(0) If you would like to follow the DO guide but use hostus, then for the cheapest options do not forget to enable this in the admin goo.gl/DRx3UX;

(1) VPS provisioning time there is 0-8 hours. In my case it was ~40 mins;

(2) I also faced this bug - goo.gl/BTqeTX;

What if I have a problem with ssh keys on Windows?

(0) This will give you some basic info about managing Linux servers goo.gl/TgL61G;

(1) Here we explain how to use Putty and ssh keys on Windows goo.gl/xxvGBb (also just google it);

Why OpenVPN:

(0) Seems to be the most well-known open-source VPN software with easily accessible clients for all major platforms;

(1) I know people who used it;

Alternatives:

(0) github.com/trailofbits/algo - seems to be newer and cooler, but I do not know living people who reported actually using it;

#linux

#digital_freedom

How To Set Up an OpenVPN Server on Ubuntu 16.04 | DigitalOcean

Want safe and secure Internet access from your smartphone or laptop when connected to an untrusted network such as hotel or cafe WiFi? A Virtual Private Network (VPN) allows...


snakers4 (Alexander), April 29, 18:32

Downgrading PyTorch from 0.4 to 0.3

Newest PyTorch has some issues with regards to multi-GPU operation.

If you want to install the previous version, the downgrade docs are a bit outdated, but you can simply:

conda install pytorch=0.3.0 cuda90 -c pytorch

#deep_learning

snakers4 (Alexander), April 29, 08:38

Forwarded from Админим с Буквой:

Docker pull via proxy

# systemctl edit docker.service

add the following lines:

[Service]

Environment=ALL_PROXY=socks5://user:[email protected]:port

reload systemd && restart docker

# systemctl daemon-reload

# systemctl restart docker.service

#proxy #docker

snakers4 (Alexander), April 28, 08:45

Using Mendeley to read papers

Looks like when you migrate to a new PC it also can migrate your literature library.

Nice.

#data_science

snakers4 (Alexander), April 27, 09:58

A handy snippet for `IOU` calculation

stackoverflow.com/questions/25349178/calculating-percentage-of-bounding-box-overlap-for-image-detector-evaluation

#deep_learning

Calculating percentage of Bounding box overlap, for image detector evaluation

In testing an object detection algorithm in large images, we check our detected bounding boxes against the coordinates given for the ground truth rectangles. According to the Pascal VOC challenges,
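For reference, the IoU computation for two axis-aligned boxes fits in a few lines. This is a minimal sketch written for this note rather than a copy of the linked answer; it assumes boxes are given as `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`, and the function name `iou` is an arbitrary choice.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width / height are clipped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```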


Widen Jupyter editor to 100% wide screen

Just apply this CSS

#texteditor-container {

width: 95%

}

#data_science