Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1182 members, 1226 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/IS6Kzz
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

snakers4 (Alexander), January 18, 13:19

Following our tweet-sender I had an idea.

Both Twitter and Telegram have APIs and python bindings.

So why not stream our telegram channel to Twitter? If you want to help us write a class for a $$ reward - please contact me.

snakers4 (Alexander), January 17, 09:25

Nice presentation to learn about Semantic Segmentation

slides.com/vladimiriglovikov/title-texttitle-text/fullscreen#/0/5

www.youtube.com/watch?v=MYp3OwkiJAs

#data_science

#deep_learning

V. Iglovikov - on segmentation, Kaggle and life in general

snakers4 (Alexander), January 17, 03:35

TF speech competition ended.

- www.kaggle.com/c/tensorflow-speech-recognition-challenge/leaderboard

In my opinion it was a very interesting domain, but on day one it was apparent that there was a public repo with 87% accuracy. So I guess 90% is a decent improvement, but judging by team sizes - it is just stacking. Also in such competitions there is no chance of winning money. Also also - this was just blatant TF marketing.

New competitions

- www.kaggle.com/c/data-science-bowl-2018

-- This year it sucks - small prizes, small data, will be just stacking 100 Unets =( Last year I was too inexperienced to participate =(

- goo.gl/qXPUoG - Intel Movidius competition. It also sucks - because you have to use only limited types of hardware and software. Basically this is a marketing campaign

#data_science

#competitions

TensorFlow Speech Recognition Challenge

Can you build an algorithm that understands simple speech commands?


snakers4 (Alexander), January 16, 09:13

Fast.ai about SF culture

- www.fast.ai/2018/01/08/startups/

Nice article about remote working

- hackernoon.com/the-stress-of-remote-working-38be5bdcf4da

The Stress of Remote Working

In software engineering, remote working makes a lot of sense since, most of the time, you only need a computer and an internet connection…


snakers4 (Alexander), January 16, 05:01

Next hobby project?

GANs for specific domain search problem – 19

👍👍👍👍👍👍👍 42%

Satellite imaging => roads => road graphs – 15

👍👍👍👍👍👍 33%

Something more community related - like making a pack of ML-themed stickers – 11

👍👍👍👍 24%

Your idea (message me)

▫️ 0%

👥 45 people voted so far.

snakers4 (Alexander), January 13, 19:17

youtu.be/YuIIjLr6vUA

Numberphile v. Math: the truth about 1+2+3+...=-1/12
Confused 1+2+3+…=-1/12 comments originating from that infamous Numberphile video keep flooding the comment sections of my and other math YouTubers videos. An...

snakers4 (Alexander), January 11, 10:56

A nice post of ML predictions for 2018

- blog.goodaudience.com/ai-in-2018-for-researchers-8955df0caaf9

#data_science

AI in 2018 for researchers

Ciao everyone! 2017 was one of the most productive and full of cool ideas and developments year in machine learning world. I think you…


snakers4 (Alexander), January 11, 09:30

A small hack for using multi-line python CLI commands via bash.

Just paste your long python command into script.sh

python3 train_satellites.py \
    --arch linknet34 --batch-size 16 \
    --imsize 320 --preset mul_urban --augs True \
    --workers 6 --epochs 30 --start-epoch 0 \
    --seed 42 --print-freq 50 \
    --lr 1e-4 --optimizer adam \
    --tensorboard True --tensorboard_images True --lognumber test

Then just:

sh script.sh
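
A side note on flags like `--augs True`: if a script parses them with argparse, `type=bool` treats any non-empty string (including "False") as true. Below is a minimal sketch of a parser that handles such values explicitly. It is not the real train_satellites.py; the helper `str2bool`, the function `build_parser` and all defaults are assumptions for illustration.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'True' / 'False' style CLI values (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Only a subset of the flags from script.sh; defaults are assumptions.
    parser = argparse.ArgumentParser(description="hypothetical training CLI")
    parser.add_argument("--arch", default="linknet34")
    parser.add_argument("--batch-size", type=int, default=16)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--augs", type=str2bool, default=False)
    parser.add_argument("--tensorboard", type=str2bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch testable.
args = build_parser().parse_args(
    ["--arch", "linknet34", "--batch-size", "16", "--lr", "1e-4", "--augs", "True"]
)
```

With this, `--augs False` really turns augmentations off instead of silently enabling them.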

#data_science

snakers4 (Alexander), January 11, 05:49

Trick for image preprocessing - histogram equalization

- scikit-image.org/docs/dev/auto_examples/color_exposure/plot_equalize.html
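
To show what the trick does, here is a minimal pure-Python sketch of textbook histogram equalization for a grayscale image stored as a list of rows. It is not scikit-image's implementation (the linked page uses the library's own exposure functions); the function name `equalize_histogram` and the toy image are assumptions for illustration.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Classic mapping: stretch the CDF to the full [0, levels - 1] range.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


# Toy low-contrast image: all values sit in a narrow 100..104 band.
low_contrast = [
    [100, 101, 102],
    [101, 102, 103],
    [102, 103, 104],
]
equalized = equalize_histogram(low_contrast)
```

After equalization the narrow band of intensities is spread over the whole 0..255 range, which is why it helps with washed-out images.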

#cv

snakers4 (Alexander), January 11, 02:30

youtu.be/_BPJFFkxSbw

Deep Image Prior | Two Minute Papers #219
The paper "Deep Image Prior" and its source code is available here: dmitryulyanov.github.io/deep_image_prior We would like to thank our generous Patr...

snakers4 (Alexander), January 10, 03:20

A 70% full GAN / style paper review:

- review spark-in.me/post/gan-paper-review

- TLDR - author.spark-in.me/gan-list.html

Did not crack math in Wasserstein GAN though.

Also a friend focused on GANs for ~6 months. Below is the gist of his work:

- GANs are known to be notoriously difficult and tricky to train even with Wasserstein loss

- The most photo-realistic papers use custom regularization techniques and very sophisticated training regimes

- Seemingly photo-realistic GANs (with progressive growing)

-- are tricky to train

-- require 2-3x time to train the GAN itself and additional 3-6x to use growing

- end result may be completely unpredictable despite all the efforts

- most GANs are not viable in production / mobile applications

- visually in practice they perform much WORSE than style transfer

Training TLDR trick

- Use DCGAN just for training latent space variables w/o any domain

- Use CycleGAN + Wasserstein loss for domain transfer

- Use growing for photo-realism

As for using them for latent space algebra - I will do this project this year.

#deep_learning

#data_science

GAN paper list and review

In this I list useful / influential GAN papers and papers related to sparse unsupervised data CNN training / latent space operations Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


US$1 million prize US-citizen exclusive Kaggle challenge ... for just stacking Resnets?

- www.kaggle.com/c/passenger-screening-algorithm-challenge/discussion/45805

America is fucked up bad...

Also notice the shake-up and top scores

- Public goo.gl/2utoDC

- Private goo.gl/GXpnWe

#data_science

#sick_sad_worlds

Passenger Screening Algorithm Challenge

Improve the accuracy of the Department of Homeland Security's threat recognition algorithms


snakers4 (Alexander), January 09, 06:08

When I started doing CV - this page was quite scarce.

Now it's full and amazing!

I recommend this page as your go-to reference for already implemented non CNN based (classic) CV. It is just amazing. Simple and illustrative examples with code.

This totally eliminates the need for the OpenCV abomination =)

scikit-image.org/docs/dev/auto_examples/index.html

Best libraries for images I have seen so far

- pillow (pillow simd)

- skimage

- imageio

- scikit video

- moviepy

#data_science

#computer_vision

snakers4 (Alexander), January 09, 03:47

Research debt

- distill.pub/2017/research-debt/

A smart way of saying that 90% of everything is noise / bs =)

Also nice quote, i.e. making things easy is not popular

An aspiring research distiller lacks many things that are easy to take for granted: a career path, places to learn, examples and role models

#data_science

Research Debt

Science is a human activity. When we fail to distill and explain research, we accumulate a kind of debt...


snakers4 (Alexander), January 08, 18:13

ORLY?

Forwarded from Savva Kolbachev:

techcrunch.com/2018/01/08/telegram-open-network/

Telegram plans multi-billion dollar ICO for chat cryptocurrency

Encrypted messaging startup Telegram plans to launch its own blockchain platform and native cryptocurrency, powering payments on its chat app and beyond. According to multiple sources which have…


snakers4 (Alexander), January 08, 06:57

Some anti-hype predictions about dates of some achievable new applications of technology

- rodneybrooks.com/my-dated-predictions/

#internet

snakers4 (Alexander), January 08, 06:47

A 2017 ML/DS year in review by some venerable / random authors:

- Proper year review by WildML (!!!) - www.wildml.com/2017/12/ai-and-deep-learning-in-2017-a-year-in-review/

-- Includes a lot of links and proper materials

-- AlphaGo

-- Attention

-- RL and genetic algorithm renaissance

-- Pytorch - elephant in the room, TF and others

-- ONNX

-- Medicine

-- GANs

If I had to summarize 2017 in one sentence, it would be the year of frameworks. Facebook made a big splash with PyTorch. Due to its dynamic graph construction similar to what Chainer offers, PyTorch received much love from researchers in Natural Language Processing, who regularly have to deal with dynamic and recurrent structures that hard to declare in a static graph frameworks such as Tensorflow.

Tensorflow had quite a run in 2017. Tensorflow 1.0 with a stable and backwards-compatible API was released in February. Currently, Tensorflow is at version 1.4.1. In addition to the main framework, several Tensorflow companion libraries were released, including Tensorflow Fold for dynamic computation graphs, Tensorflow Transform for data input pipelines, and DeepMind’s higher-level Sonnet library. The Tensorflow team also announced a new eager execution mode which works similar to PyTorch’s dynamic computation graphs.

In addition to Google and Facebook, many other companies jumped on the Machine Learning framework bandwagon:

- Apple announced its CoreML mobile machine learning library.

- A team at Uber released Pyro, a Deep Probabilistic Programming Language.

- Amazon announced Gluon, a higher-level API available in MXNet.

- Uber released details about its internal Michelangelo Machine Learning infrastructure platform.

- And because the number of framework is getting out of hand, Facebook and Microsoft announced the ONNX open format to share deep learning models across frameworks. For example, you may train your model in one framework, but then serve it in production in another one.

- In Russian - goo.gl/z1nLzq - kind of meh review (source - goo.gl/NUQ18C)

- Amazing 2017 article about global AI trends - srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/

- Uber engineering highlights - goo.gl/jBo91k

#digest

#deep_learning

#data_science

AI and Deep Learning in 2017 – A Year in Review

The year is coming to an end. I did not write nearly as much as I had planned to. But I’m hoping to change that next year, with more tutorials around Reinforcement Learning, Evolution, and Ba…


snakers4 (Alexander), January 08, 03:49

For new (!) people on the channel:

- This channel is a practitioners' channel on the following topics: internet, data science, math, deep learning, philosophy

- Focus is on data science

- Don't get your opinion in a twist if your opinion differs. You are welcome to contact me via telegram @snakers41 and email - aveysov@gmail.com

- No bs and ads

If you like our channel, please share it and give us a rating:

- telegram.me/tchannelsbot?start=snakers4

Buy us a coffee

- Direct donations - goo.gl/kvsovi - 5011673505 (paste this agreement number)

- Yandex - goo.gl/zveIOr

Our website

- spark-in.me

Our chat

- goo.gl/IS6Kzz

DS courses review

- goo.gl/5VGU5A

- spark-in.me/post/learn-data-science

GAN papers review

- spark-in.me/post/gan-paper-review

Telegram Channels Bot

Discover the best channels 📢 available on Telegram. Explore charts, rate ⭐️ and enjoy updates! TChannels.me


snakers4 (Alexander), January 07, 05:32

Some points about modern misconceptions about ML

pbs.twimg.com/media/DSz_TWKVwAI2V6s.jpg

#data_science

snakers4 (Alexander), January 07, 04:48

youtu.be/zjaz2mC1KhM

Distilling Neural Networks | Two Minute Papers #218
The paper "Distilling a Neural Network Into a Soft Decision Tree" is available here: arxiv.org/pdf/1711.09784.pdf Decision Trees and Boosting, XGBoos...

snakers4 (Alexander), January 06, 05:40

It's funny to find out about new Black Mirror episodes from Andrej Karpathy

t.co/PhJJLpBHkQ

Andrej Karpathy

Black Mirror Season 4: 1. "Hang the DJ" - fun twist 2. "USS Callister" - entertaining but unrealistic 3. "Arkangel" - good "well-intentioned tech gone wrong" story 4. "Black Museum" - trying a bit too hard 5. "Crocodile" - well that escalated quickly 6. "Metalhead" - yeeeeahno


snakers4 (Alexander), January 06, 05:11

snakers4 (Alexander), January 05, 05:13

Forwarded from Arseniy's channel:

gist.githubusercontent.com/voyeg3r/955636/raw/5ae6d373878b595568c4c2adedc4650828aa8d15/keylog2.pl

A working example of a Linux keylogger that can run without superuser privileges.

Forwarded from Arseniy's channel:

So, to compromise a Linux system and steal wallet.dat, it is enough to run some shady software that pulls in a perl/python script to search for the needed files + starts a keylogger in the background as a separate process + adds it to autostart; in 90% of cases the user will not notice anything. "There are no viruses and there cannot be any", they said.

A reason to think seriously about the security of your data and about a sandbox for untrusted software 😱

snakers4 (Alexander), January 05, 04:51

Really cool articles about

- modern OCR

- how car tickets are issued (this is a whole industry!) - old algorithms gave 80%, newer ones give 95%

- also guys from Recognitor told me that CTC modelling works really well (!)

Articles

- habrahabr.ru/company/recognitor/blog/343512/

- distill.pub/2017/ctc/

- hackernoon.com/latest-deep-learning-ocr-with-keras-and-supervisely-in-15-minutes-34aecd630ed8 (pure gold)

- www.youtube.com/watch?time_continue=131&v=uVbOckyUemo

If you need to do modern OCR

- these articles are a great starter

- you can detect something with YOLO, then rotate the image with affine transformations, then use CTC
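
For the last step of that pipeline, here is a minimal sketch of greedy (best-path) CTC decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. The alphabet, the blank index and the per-frame scores are toy assumptions; a real model would produce the scores, and this is not code from the linked articles.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # groupby merges runs of identical consecutive indices into one entry.
    collapsed = [index for index, _ in groupby(best_path)]
    return "".join(alphabet[index] for index in collapsed if index != BLANK)


# Toy alphabet for a plate-like string; index 0 is the blank.
alphabet = ["-", "A", "7"]
frame_scores = [
    [0.1, 0.8, 0.1],    # A
    [0.1, 0.7, 0.2],    # A again (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # 7
    [0.6, 0.1, 0.3],    # blank
    [0.1, 0.2, 0.7],    # 7 (kept, because a blank separates it)
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
```

The blank symbol is what lets the network output genuinely doubled characters: two "7" frames separated by a blank stay as two characters, while back-to-back repeats are merged.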

#data_science

#deep_learning

Can license plate recognition be crammed into any tamagotchi?

We have been writing about license plate recognition on Habr for a long time. Hopefully it is even interesting. It seems the time has come to describe how it is applied, why it is even...


snakers4 (Alexander), January 05, 04:35

Decided to invest some time into Andrew Ng's new courses in view-only mode to search for high-level ideas (doing NNs from scratch became boring when we did it in Octave)

- First two or three courses are just plain old Octave course but in Python, which is great for beginners

- 4th course is about ML strategy. Watched it, it's very short

-- www.coursera.org/learn/machine-learning-projects/

-- goo.gl/zvt6RW - videos and presentations

-- key ideas

--- use human

--- treat time as the most precious commodity

--- divide metric into optimizing metrics and satisficing (i.e. good enough) metrics

--- always compare to plain humans / board of experts as a baseline to see if you need a bigger model / or less overfitting

--- always have train / validation (dev) / and delayed test sets

--- always think about practical implications

- 5th course is about modern ML (YOLO, sequence modelling) - worth checking out if you are not familiar. Not sure about assignments though
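
The optimizing / satisficing split from the key ideas above can be sketched as follows: filter candidates by the "good enough" constraint first, then maximize the single optimizing metric. The function `pick_model`, the latency cap and all numbers are made up for illustration.

```python
def pick_model(candidates, max_latency_ms):
    """Best accuracy (optimizing) among models meeting the latency cap (satisficing)."""
    feasible = [m for m in candidates if m["latency_ms"] <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m["accuracy"])


# Made-up numbers purely for illustration.
candidates = [
    {"name": "A", "accuracy": 0.90, "latency_ms": 80},
    {"name": "B", "accuracy": 0.92, "latency_ms": 95},
    {"name": "C", "accuracy": 0.95, "latency_ms": 1500},
]
chosen = pick_model(candidates, max_latency_ms=100)
```

Model C has the best accuracy but fails the latency constraint, so it is never considered; among the rest only accuracy matters.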

#data_science

#education

Structuring Machine Learning Projects | Coursera

Structuring Machine Learning Projects from ...


Ablation analysis is a must - fchollet

- prntscr.com/hw9wvc

- goo.gl/PKXbpn

Screenshot

Captured with Lightshot


snakers4 (Alexander), January 04, 03:11

Twitter service works - it lets me read the important links while disregarding the junk / conversations / bs.

In my case all the emails are in one gmail folder and I can just unread / delete them all.

snakers4 (Alexander), January 04, 01:27

Starting my GAN paper review series ~40% in

- spark-in.me/post/gan-paper-review

Please comment / share / provide feedback.

#data_science

#deep_learning

GAN paper list and review

In this I list useful / influential GAN papers and papers related to sparse unsupervised data CNN training / latent space operations Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), January 04, 00:50

www.wildml.com/2017/12/ai-and-deep-learning-in-2017-a-year-in-review/

AI and Deep Learning in 2017 – A Year in Review

The year is coming to an end. I did not write nearly as much as I had planned to. But I’m hoping to change that next year, with more tutorials around Reinforcement Learning, Evolution, and Ba…


snakers4 (Alexander), January 03, 16:41

During the last competition my teammate found a nice paper in Jeremy Howard's tweet.

DS/ML/CV specialists in the USA like Twitter for some reason. In Russia / CIS Twitter is not used (at first vk.com was better and now telegram is better) and I have always considered it to be a service like Snapchat (i.e. useless hype generator) but with roots in SMS era (their stock and dwindling user base agree).

But this post - goo.gl/y3DXWH - changed my mind (Twitter accounts of the brightest minds from NIPS).

So I decided to monitor their tweets ... and I guess twitter does not send you emails on every new tweet so that you would use their app. Notifications about new tweets are limited either to API or push notifications or SMS - which is hell (+1 garbage app on the phone - no thank you).

So today we decided to write and share a small python class that would use Twitter API to send you emails

- code github.com/nurtdinovadf/tweetsender

- how it looks in Gmail - prntscr.com/hvldt7

Please feel free to use it, share it, star it and comment. Many thanks.
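
For readers curious about the email half of the idea, here is a minimal standard-library sketch that bundles already fetched tweets into one message. It is not the actual tweetsender code; the tweet dictionaries, the function `tweets_to_email` and the addresses are hypothetical, and fetching from the Twitter API is left out entirely.

```python
from email.message import EmailMessage


def tweets_to_email(tweets, sender, recipient):
    """Bundle a list of tweets into one plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"{len(tweets)} new tweet(s)"
    lines = [f"@{t['user']}: {t['text']}\n{t['url']}" for t in tweets]
    message.set_content("\n\n".join(lines))
    return message


# Hypothetical tweet records; a real script would fetch these via the Twitter API.
tweets = [
    {"user": "example_user", "text": "A nice paper", "url": "https://example.com/1"},
]
digest = tweets_to_email(tweets, "bot@example.com", "me@example.com")
# Sending is left out; smtplib's SMTP.send_message(digest) is the usual route.
```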

#data_science

nurtdinovadf/tweetsender

tweetsender - Sending tweets of particular users to your email without following them


snakers4 (Alexander), January 03, 12:20

While reading GAN papers stumbled upon even creepier pix2pix cats

- raw.githubusercontent.com/junyanz/pytorch-CycleGAN-and-pix2pix/master/imgs/edges2cats.jpg

#deep_learning

And these are my best cats

Forwarded from Alexander:


snakers4 (Alexander), January 03, 09:38

I just realized the raw theoretic power of GANs - they enable you to create latent space features (just like word2vec) for any domain w/o annotation (!).

What a great time to be alive.