Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1326 members, 1561 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

snakers4 (Alexander), August 17, 11:45

Found all of IPython's rich display capabilities in one place

nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Part%205%20-%20Rich%20Display%20System.ipynb#Links-to-local-files
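
A minimal sketch of how the rich display system picks up custom objects: IPython / Jupyter calls `_repr_html_` on an object if it defines one. The `Metrics` class and its contents are hypothetical, just for illustration.

```python
import html


class Metrics:
    """Hypothetical container that renders itself as an HTML table in a notebook."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        # IPython / Jupyter looks for this method when displaying an object
        body = "".join(
            f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
            for k, v in self.rows.items()
        )
        return f"<table>{body}</table>"
```

In a notebook, leaving `Metrics({"acc": 0.9})` as the last expression of a cell should render a table instead of the default repr.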

Notebook on nbviewer

Check out this Jupyter notebook!


snakers4 (Alexander), August 16, 06:00

Google updates its transformer

ai.googleblog.com/2018/08/moving-beyond-translation-with.html

#nlp

Moving Beyond Translation with the Universal Transformer

Posted by Stephan Gouws, Research Scientist, Google Brain Team and Mostafa Dehghani, University of Amsterdam PhD student and Google Research...


snakers4 (Alexander), August 15, 02:18

youtu.be/LBezOcnNJ68

NVIDIA's AI Makes Amazing Slow-Mo Videos
The paper "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation" is available here: people.cs.umass.edu/~hzji...

snakers4 (Alexander), August 13, 11:04

Yet another crowd GPU rent service?

vast.ai/console/create/

Create Instance | Vast.ai Console

Search available instances, configure launch settings, create instances


snakers4 (Alexander), August 13, 05:24

Float16 / half training in PyTorch

Tried to do it in the most obvious way + some hacks from here

discuss.pytorch.org/t/resnet18-throws-exception-on-conversion-to-half-floats/6696

Did anybody do it successfully on real models?

#deep_learning

Resnet18 throws exception on conversion to half floats

Hey, I have tried to launch the following code: from torchvision import models resnet = models.resnet18(pretrained=True).cpu() resnet.half() and have got an exception: libc++abi.dylib: terminating with uncaught exception of type std::invalid_argument: Unsupported tensor type Sounds like the halftensor type is not registered properly. But not sure why it’s the case. Using pytorch 0.2.0 py36_1cu75 soumith Any advice how to fix it?


snakers4 (Alexander), August 13, 04:34

Untar all the archives in the folder, deleting them

find . -name '*.tar' -execdir tar -xvf '{}' \; -execdir rm '{}' \;
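
A rough Python equivalent of the one-liner above, for cases where the shell is not an option. This is a sketch using only the standard library; the function name `untar_and_delete` is made up. Like the `find` version, it extracts each archive into its own folder and deletes the archive only if extraction did not fail.

```python
import tarfile
from pathlib import Path


def untar_and_delete(root="."):
    """Extract every *.tar under root next to the archive, then delete the archive."""
    done = []
    # sorted() materializes the list first, so deleting files does not disturb the walk
    for archive in sorted(Path(root).rglob("*.tar")):
        with tarfile.open(archive) as tar:
            tar.extractall(path=archive.parent)
        # reached only if extraction did not raise
        archive.unlink()
        done.append(archive)
    return done
```

Only run this on archives you trust, since `extractall` writes whatever paths the archive contains.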

#linux

snakers4 (Alexander), August 12, 11:15

2018 DS/ML digest 20

spark-in.me/post/2018_ds_ml_digest_20

#deep_learning

#digest

#data_science

2018 DS/ML digest 20

2018 DS/ML digest 20 Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), August 12, 06:32

A small post on faster distance calculation

- Post spark-in.me/post/cosine-distance-hundred-times-faster

- Medium medium.com/@aveysov/speeding-up-word-distance-calculation-100x-96ee4f5007cd

Please support, if you like the posts!

Speeding up word distance calculation 100x

My plain approach to faster and more practical cosine distance calculation Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), August 12, 05:21

New publication format

Decided to try a new, more streamlined, fast and automated approach to publishing somewhat longer posts.

(0) Write a note in md format

(1) Transform to HTML automatically => post on spark-in.me

(2) Repost to medium via automated import

(3) Repost to Reddit / Habr.com (if they start accepting English articles) via md

...

(4) Profit - 4 publications at the cost and time of one)

Decided to start by porting the 2 latest articles to Medium

- medium.com/@aveysov/playing-with-crowd-ai-mapping-challenge-or-how-to-improve-your-cnn-performance-with-109684f95dcd

- medium.com/@aveysov/solving-class-imbalance-on-google-open-images-cf9e890bb146

Please tell me what you think in the comments!

Also, md can be converted to almost any format using pandoc)
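
A minimal sketch of the md => HTML step with pandoc, assuming pandoc is installed and on PATH. The helper names are hypothetical (not the actual script used for the blog), and the output extension is simply taken from the format name, which is fine for html / docx / rst but not for every pandoc format.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src, target_format="html"):
    """Build the pandoc command line that converts a markdown note."""
    src = Path(src)
    out = src.with_suffix("." + target_format)
    return ["pandoc", str(src), "-f", "markdown", "-t", target_format, "-o", str(out)]


def convert(src, target_format="html"):
    """Run pandoc and return the path of the converted file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    cmd = pandoc_command(src, target_format)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Usage would be something like `convert("post.md")` for the site and `convert("post.md", "docx")` for anything else that needs a different format.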

#data_science

Playing with Crowd-AI mapping challenge — or how to improve your CNN performance with self-supervised techniques

Originally published at spark-in.me on July 15, 2018.


snakers4 (Alexander), August 12, 05:01

Pre-trained ShuffleNet on PyTorch, if anybody needs it

Forwarded from Just links:

github.com/Randl/ShuffleNetV2-pytorch

Randl/ShuffleNetV2-pytorch

ShuffleNetV2-pytorch - Implementation of ShuffleNetV2 for pytorch


snakers4 (Alexander), August 10, 11:39

Using numba

Looks like ... it just works when it works.

For example, this cosine distance calculation function works ca. 10x faster.

import numba
import numpy as np

@numba.jit(target='cpu', nopython=True)
def fast_cosine(u, v):
    m = u.shape[0]
    udotv = 0.0
    u_norm = 0.0
    v_norm = 0.0
    for i in range(m):
        # skip positions where either vector has a NaN
        if (np.isnan(u[i])) or (np.isnan(v[i])):
            continue
        udotv += u[i] * v[i]
        u_norm += u[i] * u[i]
        v_norm += v[i] * v[i]
    u_norm = np.sqrt(u_norm)
    v_norm = np.sqrt(v_norm)
    # a zero vector is treated as distance 0
    if (u_norm == 0) or (v_norm == 0):
        ratio = 1.0
    else:
        ratio = udotv / (u_norm * v_norm)
    return 1 - ratio

Also, it looks like they recently got support from NumFOCUS

numfocus.org/sponsored-projects
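
Since numba "just works when it works", it helps to have a slow pure-Python reference to compare the jitted cosine distance function against. A sketch with the same conventions (NaN positions are skipped, a zero vector gives distance 0); the name `cosine_reference` is made up.

```python
import math


def cosine_reference(u, v):
    """Pure-Python cosine distance, meant only for checking a faster version."""
    udotv = u_norm = v_norm = 0.0
    for a, b in zip(u, v):
        # skip positions where either vector has a NaN
        if math.isnan(a) or math.isnan(b):
            continue
        udotv += a * b
        u_norm += a * a
        v_norm += b * b
    u_norm = math.sqrt(u_norm)
    v_norm = math.sqrt(v_norm)
    # same convention as the jitted version: a zero vector means distance 0
    if u_norm == 0 or v_norm == 0:
        return 0.0
    return 1 - udotv / (u_norm * v_norm)
```

Comparing both functions on a few random vectors (with some NaNs thrown in) is a cheap sanity check before trusting the speed-up.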

#nlp

Sponsored Projects | pandas, NumPy, Matplotlib, Jupyter, + more - NumFOCUS

Explore NumFOCUS Sponsored Projects, including: pandas, NumPy, Matplotlib, Jupyter, rOpenSci, Julia, Bokeh, PyMC3, Stan, nteract, SymPy, FEniCS, PyTables...


snakers4 (Alexander), August 09, 15:10

Thanks to everybody who used our DO / host.us affiliate links!

They finally started to vest)

Affiliate links:

m.do.co/c/6f8e77dddc23

my.hostus.us/aff.php?aff=2169

There were a couple of simplistic guides:

- Socks5 spark-in.me/post/vds-socks5-proxy-server

- OpenVPN snakers41.spark-in.me/1945

#linux

DigitalOcean: Cloud Computing, Simplicity at Scale

Providing developers and businesses a reliable, easy-to-use cloud computing platform of virtual servers (Droplets), object storage ( Spaces), and more.


Amazing image examples from open images dataset

Forwarded from Just links:

snakers4 (Alexander), August 08, 02:57

New android

android-developers.googleblog.com/2018/08/introducing-android-9-pie.html?m=1

Neural Networks API 1.1

Android 9 adds an updated version of the Neural networks API, to extend Android's support for accelerated on-device machine learning. Neural Networks 1.1 adds support for nine new ops -- Pad, BatchToSpaceND, SpaceToBatchND, Transpose, Strided Slice, Mean, Div, Sub, and Squeeze. A typical way to take advantage of the APIs is through TensorFlow Lite.

Introducing Android 9 Pie

After more than a year of development and months of testing by early adopters, we're ready to launch Android 9 Pie, the latest release of Android, to the world. Android 9 harnesses the power of machine learning to make your phone smarter, simpler, and tailored to you. Read all about the new consumer features here. For developers, Android 9 includes many new ways to enhance your apps and build new experiences to drive engagement.


snakers4 (Alexander), August 07, 05:51

Proper commits / using git

If you follow this guide and do proper commits

chris.beams.io/posts/git-commit/

Then you will get this on GitHub as a reward

pics.spark-in.me/upload/53024bedeba936d61dd8e682707d881d.jpg

[Insert comment here about Microsoft's acquisition of Github]
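
The mechanical part of that guide (blank line after the subject, subject up to 50 characters, capitalized, no trailing period, body wrapped at 72 characters) is easy to check automatically. A minimal sketch; the function name is hypothetical, and the imperative mood / "explain what and why" rules are left to the human.

```python
def check_commit_message(message):
    """Return a list of rule violations; an empty list means the message passes."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not subject:
        return ["subject line is empty"]
    problems = []
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if not subject[0].isupper():
        problems.append("subject not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    # the second line, if present, must be blank
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body line longer than 72 characters")
    return problems
```

Something like this could be called from a `commit-msg` hook, but that wiring is not shown here.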

#coding

snakers4 (Alexander), August 07, 04:04

UMAP

github.com/lmcinnes/umap

Wrote a couple of posts about UMAP before.

Since last time, they extended their docs and published a paper:

- How it works umap-learn.readthedocs.io/en/latest/how_umap_works.html (topology) - I kind of understand 50% of this

- Paper arxiv.org/abs/1802.03426 (have not read yet)

What I really like about the UMAP author - he answers questions on the forums / invested a lot of time into explaining how UMAP and HDBSCAN work / built stellar docs and is overall a nice guy.

What I really like in practice - this combination works really well:

- PCA => UMAP => HDBSCAN

#data_science

lmcinnes/umap

umap - Uniform Manifold Approximation and Projection


snakers4 (Alexander), August 07, 01:27

youtu.be/MvFABFWPBrw

DeepMind Has A Superhuman Level Quake 3 AI Team
Pick up cool perks on our Patreon page: www.patreon.com/TwoMinutePapers The paper "Human-level performance in first-person multiplayer games with pop...

snakers4 (Alexander), August 06, 10:44

NLP - naive preprocessing

A friend has sent me a couple of gists

- gist.github.com/thinline72/e35e1aaa09bd5519b7f07663152778e7

- gist.github.com/thinline72/29d3976e434572ef3ee68ab7a473b400

Useful boilerplate

#nlp

quora_vecs_l2_test.ipynb

GitHub is where people build software. More than 28 million people use GitHub to discover, fork, and contribute to over 85 million projects.


snakers4 (Alexander), August 06, 05:16

snakers4 (Alexander), August 03, 02:23

snakers4 (Alexander), August 01, 18:05

Finally found a decent python module building guide

chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html

#python

The Definitive Guide to Python import Statements | Chris Yeh

Stanford University, Class of 2018


snakers4 (Alexander), July 31, 18:33

Autofocus for semseg?

arxiv.org/abs/1805.08403

I have not seen people for whom DeepLab worked...and in my tests dilated convolutions were the same...though some claim they help with high-res images with small objects...

Ideas:

(0) Autofocus layer, a novel module that enhances the multi-scale processing of CNNs by learning to select the ‘appropriate’ scale for identifying different objects in an image

(1) Layer description

pics.spark-in.me/upload/2f562fb9d12d76c36fa8777713de9716.jpg

(2) Implementation github.com/yaq007/Autofocus-Layer/blob/master/models.py

I believe this will work best for 3D images

#deep_learning

snakers4 (Alexander), July 31, 05:53

Yet another python tricks book

dbader.org/

www.getdrip.com/deliveries/xugaymstfzmizbyposdk?__s=ejdgfo9tsdhpgcrcscs3

vk.com/doc7608079_466151365

#python

Python Training by Dan Bader – dbader.org

Dan Bader helps Python developers become more awesome. His tutorials, videos, and trainings have reached over half a million developers around the world.


snakers4 (Alexander), July 31, 05:47

2018 DS/ML digest 19

Market / data / libraries

(0) 32k lesions image dataset open-sourced

- goo.gl/CUQwnv

- nihcc.app.box.com/v/DeepLesion

(1) A new Distill article about Differentiable Image Parameterizations

- Usually images are parametrized as RGB values (normalized)

- Idea - use different (learnable) parametrization

- distill.pub/2018/differentiable-parameterizations/

- Parametrizing the resulting image with a Fourier transform makes it possible to use different architectures for style transfer distill.pub/2018/differentiable-parameterizations/#figure-style-transfer-diagram

- Working with transparent images

(2) Lip reading with 40% Word Error Rate arxiv.org/pdf/1807.05162.pdf

(3) Joint auto architecture + hyper param search arxiv.org/pdf/1807.06906.pdf (*)

(4) rl-navigation.github.io/deployable/

(5) New CNN architectures from ICML www.facebook.com/icml.imls/videos/429607650887089/%20 (*)

(6) Jupyter notebook widget for text annotation github.com/natasha/ipyannotate

(7) A bit more debunking of auto-ml by fast.ai www.fast.ai/2018/07/23/auto-ml-3/

(8) A small intro to Bayes methods alexanderdyakonov.wordpress.com/2018/07/30/%d0%b1%d0%b0%d0%b9%d0%b5%d1%81%d0%be%d0%b2%d1%81%d0%ba%d0%b8%d0%b9-%d0%bf%d0%be%d0%b4%d1%85%d0%be%d0%b4/

(9) Criminal face recognition 20% false positives - www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html?

(10) Denoising images without noiseless ground-truth news.developer.nvidia.com/ai-can-now-fix-your-grainy-photos-by-only-looking-at-grainy-photos/?ncid=--45511
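
For context on the Word Error Rate in item (2): WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. A minimal sketch of that standard definition, not the evaluation code from the paper; the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

With this definition, a 5-word reference transcribed with one wrong word and one missing word gives a WER of 0.4, i.e. the 40% ballpark mentioned above.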

NLP

(0) Autoencoders for text habr.com/company/antiplagiat/blog/418173/ - no clear conclusion?

(1) RNN use cases overview indico.cern.ch/event/722319/contributions/3001310/attachments/1661268/2661638/IML-Sequence.pdf

(2) ACL 2018 notes ruder.io/acl-2018-highlights/

Hardware

(0) Edge embeddable TPU devices aiyprojects.withgoogle.com/edge-tpu ?

(1) GeForce 11* finally coming soon? Prices for 1080Ti are falling now...

#digest

#deep_learning

NIH Clinical Center releases dataset of 32,000 CT images

Lesion data may make it easier for scientific community to identify tumor growth or new disease


snakers4 (Alexander), July 31, 05:18

Some interesting NLP related ideas from ACL 2018

ruder.io/acl-2018-highlights/

Overall

- bag-of-embeddings is surprisingly good at capturing sentence-level properties, among other results

- language models are bad at modelling numerals; the authors propose several strategies to improve them

- current state-of-the-art models fail to capture many simple inferences

- LSTM representations, even though they have been trained on one task, are not task-specific. They are often predictive of unintended aspects such as demographics in the data

- Word embedding-based methods exhibit competitive or even superior performance
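
To make the bag-of-embeddings bullet concrete: the sentence vector is just the average of its word vectors, so word order is ignored. A minimal sketch; the 2-dimensional `toy` vectors are invented for illustration and are not real embeddings.

```python
def bag_of_embeddings(tokens, embeddings):
    """Average the vectors of known tokens; returns None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# hypothetical toy vectors, not taken from any trained model
toy = {"good": [1.0, 0.0], "movie": [0.0, 1.0], "bad": [-1.0, 0.0]}
```

`bag_of_embeddings(["good", "movie"], toy)` and `bag_of_embeddings(["movie", "good"], toy)` give the same vector, which is exactly why it is surprising that this baseline captures sentence-level properties so well.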

Four common ways to introduce linguistic information into models:

- Via a pipeline-based approach, where linguistic categories are used as features;

- Via data augmentation, where the data is augmented with linguistic categories;

- Via multi-task learning;

#nlp

ACL 2018 Highlights: Understanding Representations

This post reviews two themes of ACL 2018: 1) gaining a better understanding what models capture and 2) to expose them to more challenging settings.


snakers4 (Alexander), July 31, 04:42

Airbus ship detection challenge

On the surface, this looks like a challenging and interesting competition:

- www.kaggle.com/c/airbus-ship-detection

- Train / test sets - 14G / 12G

- Downside - Kaggle and a very fragile metric

- Upside - a separate significant prize for fast algorithms!

- 768x768 images seem reasonable

#deep_learning

#data_science

Airbus Ship Detection Challenge

Find ships on satellite images as quickly as possible


snakers4 (Alexander), July 30, 05:48

The reality of human face recognition

There is a lot of hype related to the surveillance state / 1984 / Chinese offline cameras.

Cannot help but feature this amazing article from Russian engineers (RU):

habr.com/company/recognitor/blog/418127/

#deep_learning

The truth and lies of face recognition systems

There is perhaps no other technology today surrounded by so many myths, lies and so much incompetence. The journalists who write about the technology lie, and so do...


snakers4 (Alexander), July 29, 16:50

youtu.be/8GUYAVXmhsI

DeepMind's AI Learns The Piano From The Masters of The Past
The paper "The challenge of realistic music generation: modelling raw audio at scale" is available here: arxiv.org/abs/1806.10474 drive.googl...

snakers4 (Alexander), July 28, 05:13

New Keras version

github.com/keras-team/keras/releases/tag/2.2.1

No real major changes...

#deep_learning

keras-team/keras

keras - Deep Learning for humans


snakers4 (Alexander), July 27, 03:17

The truth about ML courses

cv-blog.ru/?p=238

#deep_learning

snakers4 (Alexander), July 25, 15:13