Spark in me - Internet, data science, math, deep learning, philosophy

snakers4 @ telegram, 1317 members, 1587 posts since 2016

All this - lost like tears in rain.

Data science, deep learning, sometimes a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

Posts by tag «data_science»:

snakers4 (Alexander), September 24, 12:53

(RU) most popular ML algorithms explained in simple terms

vas3k.ru/blog/machine_learning/

#data_science

Machine learning for people

Explained in simple words


snakers4 (Alexander), September 20, 16:06

DS/ML digest 24

Key topics of this one:

- New method to calculate phrase/n-gram/sentence embeddings for rare and OOV words;

- So many releases from Google;

spark-in.me/post/2018_ds_ml_digest_24
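The digest itself has the details of the embedding method; purely as an illustration, a common fastText-style baseline for rare / OOV words builds a word vector by averaging character n-gram vectors (the hashing trick and toy vectors below are my own stand-ins, not the method from the digest):

```python
import hashlib

DIM = 8          # toy embedding size
BUCKETS = 1000   # hashing trick: n-grams share a fixed-size table

def ngram_vector(ngram):
    # deterministic pseudo-embedding per hash bucket (stand-in for a trained table)
    h = int(hashlib.md5(ngram.encode()).hexdigest(), 16) % BUCKETS
    return [((h * 31 + i * 17) % 997) / 997 for i in range(DIM)]

def word_vector(word, n=3):
    # fastText-style: pad the word, collect char n-grams, average their vectors
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    vecs = [ngram_vector(g) for g in grams]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

v = word_vector("unseenword")  # works even if the word was never in the vocabulary
```

Because any string decomposes into n-grams, the lookup never fails on OOV input.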

If you like our digests, you can support the channel via:

- Sharing / reposting;

- Giving an article a decent comment / a thumbs-up;

- Buying me a coffee (links on the digest);

#digest

#deep_learning

#data_science

2018 DS/ML digest 24

2018 DS/ML digest 24. Author's articles - http://spark-in.me/author/snakers41. Blog - http://spark-in.me


snakers4 (Alexander), September 06, 05:48

DS/ML digest 23

The key topic of this one - this is insanity:

- vid2vid

- unsupervised NMT

spark-in.me/post/2018_ds_ml_digest_23

If you like our digests, you can support the channel via:

- Sharing / reposting;

- Giving an article a decent comment / a thumbs-up;

- Buying me a coffee (links on the digest);

Let's spread the right DS/ML ideas together.

#digest

#deep_learning

#data_science

2018 DS/ML digest 23



snakers4 (Alexander), September 05, 06:40

MySQL - replacing window functions

Older versions of MySQL (and maybe newer ones) do not have all the goodness you can find in PostgreSQL. Ofc you can do plain session matching in Python, but sometimes you just need to do it in plain SQL.

In Postgres you usually use window functions for this purpose if you need PLAIN SQL (ofc there are stored procedures / views / mat views etc).

In MySQL it can be elegantly solved like this:

SET @session_number = 0, @last_uid = '0', @current_uid = '0', @dif = 0;

SELECT
  t1.some_field,
  t2.some_field,
  ...
  @last_uid := @current_uid,
  @current_uid := t1.uid,
  @dif := TIMESTAMPDIFF(MINUTE, t2.session_ts, t1.session_ts),
  IF(@current_uid = @last_uid,
     IF(@dif > 30, @session_number := @session_number + 1, @session_number),
     @session_number := 0) AS session
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id + 1
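For reference, the same gap-based sessionization sketched in plain Python (my own sketch; it assumes rows are pre-sorted by uid and timestamp, and uses the same 30-minute rule as the SQL above):

```python
from datetime import datetime, timedelta

def sessionize(rows, gap=timedelta(minutes=30)):
    """Assign session numbers to (uid, ts) rows: the counter resets on a new
    uid and increments when the gap between events exceeds 30 minutes."""
    out, last_uid, last_ts, session_number = [], None, None, 0
    for uid, ts in rows:
        if uid != last_uid:
            session_number = 0            # new user -> reset counter
        elif ts - last_ts > gap:
            session_number += 1           # same user, long pause -> new session
        out.append((uid, ts, session_number))
        last_uid, last_ts = uid, ts
    return out

rows = [("u1", datetime(2018, 1, 1, 10, 0)),
        ("u1", datetime(2018, 1, 1, 10, 10)),   # 10 min gap -> same session
        ("u1", datetime(2018, 1, 1, 11, 0)),    # 50 min gap -> new session
        ("u2", datetime(2018, 1, 1, 11, 5))]    # new uid -> counter resets
print([s for _, _, s in sessionize(rows)])  # -> [0, 0, 1, 0]
```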

#data_science

snakers4 (Alexander), August 31, 13:59

DS/ML digest 22

spark-in.me/post/2018_ds_ml_digest_22

#digest

#deep_learning

#data_science

2018 DS/ML digest 22



snakers4 (Alexander), August 29, 05:45

Venn diagrams in python

Compare sets as easily as:

# Import the library
import matplotlib.pyplot as plt
from matplotlib_venn import venn3

# Any three Python sets work here
s1 = {'cat', 'dog', 'fox'}
s2 = {'dog', 'fox', 'owl'}
s3 = {'fox', 'owl', 'bee'}

# Make the diagram
plt.figure(figsize=(10, 10))
venn3(subsets=(s1, s2, s3), set_labels=['synonyms', 'our_tree', 'add_syns'])
plt.show()

Very simple and useful.

#data_science

snakers4 (Alexander), August 28, 14:42

Garbage collection and encapsulation in python

Usually this is not an issue.

But when doing a batch-job / loop / something with 100k+ objects it suddenly becomes an issue.

Also it does not help that the environment where you test your code and the actual production environment are different.

The solution - simply encapsulate everything you can in functions. Objects that go out of scope are collected automatically.
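A minimal illustration of why encapsulation helps (CPython-specific behavior assumed - refcounted locals are freed as soon as the function returns):

```python
import gc
import weakref

class BigIntermediate:
    """Stand-in for a large object created inside a batch job."""

def process_one_item():
    obj = BigIntermediate()        # lives only inside this function
    return weakref.ref(obj)        # obj becomes unreachable on return

ref = process_one_item()
gc.collect()
print(ref() is None)  # -> True: the intermediate was freed automatically

# At module level, the same object survives until you explicitly del it
obj = BigIntermediate()
ref2 = weakref.ref(obj)
print(ref2() is None)  # -> False
```

In a 100k+ item loop, the difference between the two patterns is whether intermediates pile up or not.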

#data_science

A case against Kaggle

If you thought that Kaggle is the home of Data Science - think again.

www.kaggle.com/c/airbus-ship-detection/discussion/64355

This is official - they do not know what the hell they are doing.

There have been several appalling cases already, but this takes the prize.

Following this thread, wrote a small petition to Kaggle

www.kaggle.com/c/airbus-ship-detection/discussion/64393

I doubt that they will hear, but why not.

#data_science

snakers4 (Alexander), August 24, 07:04

Played with FAISS

Played with FAISS on the GPU a bit. Their docs cover the simple cases really well, but the more sophisticated cases are better inferred from examples, because the Python docs are not really intended for heavy use.

Anyway, I managed to build a KNN graph with FAISS on a GPU for 10M points in 2-3 hours.

It does the following:

- KNN graph;

- PCA, K-Means;

- Queries;

- VERY sophisticated indexing with many options;

It supports:

- GPU;

- Multi-GPU (I had to use env. variables to limit the GPU list, because there is no such option in the Python API);

Also their docs are awesome for such a low-level project.

github.com/facebookresearch/faiss/wiki/Faiss-building-blocks:-clustering,-PCA,-quantization
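Not FAISS itself, but a pure-Python sketch of what its flat L2 index computes - brute-force exact KNN. FAISS's value is doing exactly this at scale, on GPU, with much smarter index structures:

```python
import heapq

def knn_flat_l2(points, queries, k):
    """Brute-force k-nearest-neighbour search by squared L2 distance -
    what a flat index computes, minus all the optimization."""
    results = []
    for q in queries:
        dists = [(sum((a - b) ** 2 for a, b in zip(p, q)), i)
                 for i, p in enumerate(points)]
        results.append([i for _, i in heapq.nsmallest(k, dists)])
    return results

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(knn_flat_l2(points, [(0.1, 0.0)], k=2))  # -> [[0, 1]]
```

This is O(n) per query; the approximate indexes in FAISS exist precisely to avoid that scan.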

#data_science

#similarity

facebookresearch/faiss

A library for efficient similarity search and clustering of dense vectors. - facebookresearch/faiss


Playing with Atom and Hydrogen

TLDR - it does turn the Atom editor into something like an interactive notebook, but its auto-completion options are very limited. You can also connect to a running ipython kernel, as well as to a docker container.

nteract.gitbooks.io/hydrogen/docs/Installation.html

blog.nteract.io/hydrogen-interactive-computing-in-atom-89d291bcc4dd

But it is no match for either a normal notebook or a python IDE.

#data_science

snakers4 (Alexander), August 12, 11:15

2018 DS/ML digest 20

spark-in.me/post/2018_ds_ml_digest_20

#deep_learning

#digest

#data_science

2018 DS/ML digest 20



snakers4 (Alexander), August 12, 05:21

New publication format

Decided to try a new, more streamlined, fast and automated approach to publishing a bit longer posts.

(0) Write a note in md format

(1) Transform to HTML automatically => post on spark-in.me

(2) Repost to medium via automated import

(3) Repost to Reddit / Habr.com (if they start accepting English articles) via md

...

(4) Profit - 4 publications at the cost and time of one)

Decided to start with porting 2 latest articles to medium

- medium.com/@aveysov/playing-with-crowd-ai-mapping-challenge-or-how-to-improve-your-cnn-performance-with-109684f95dcd

- medium.com/@aveysov/solving-class-imbalance-on-google-open-images-cf9e890bb146

Please tell me what you think in the comments!

Also md can be transformed to almost any format using pandoc)

#data_science

Playing with Crowd-AI mapping challenge — or how to improve your CNN performance with self-supervised techniques

Originally published at spark-in.me on July 15, 2018.


snakers4 (Alexander), August 07, 04:04

UMAP

github.com/lmcinnes/umap

Wrote a couple of posts about UMAP before.

Since last time, they extended their docs and published a paper:

- How it works umap-learn.readthedocs.io/en/latest/how_umap_works.html (topology) - I kind of understand 50% of this

- Paper arxiv.org/abs/1802.03426 (have not read yet)

What I really like about the UMAP author - he answers questions on the forums, has invested a lot of time into explaining how UMAP and HDBSCAN work, built stellar docs, and is overall a nice guy.

What I really like in practice - this combination works really well:

- PCA => UMAP => HDBSCAN

#data_science

lmcinnes/umap

Uniform Manifold Approximation and Projection. Contribute to lmcinnes/umap development by creating an account on GitHub.


snakers4 (Alexander), July 31, 04:42

Airbus ship detection challenge

On the surface this looks like a challenging and interesting competition:

- www.kaggle.com/c/airbus-ship-detection

- Train / test sets - 14G / 12G

- Downside - Kaggle and very fragile metric

- Upside - a separate significant prize for fast algorithms!

- 768x768 images seem reasonable

#deep_learning

#data_science

Airbus Ship Detection Challenge

Find ships on satellite images as quickly as possible


snakers4 (Alexander), July 23, 06:26

My post on open images stage 1

For posterity

Please comment

spark-in.me/post/playing-with-google-open-images

#deep_learning

#data_science

Solving class imbalance on Google open images

In this article I propose an approach to solving severe class imbalance on Google open images.


snakers4 (Alexander), July 23, 05:15

2018 DS/ML digest 18

Highlights of the week

(0) RL flaws

thegradient.pub/why-rl-is-flawed/

thegradient.pub/how-to-fix-rl/

(1) An intro to AUTO-ML

www.fast.ai/2018/07/16/auto-ml2/

(2) Overview of advances in ML in last 12 months

www.stateof.ai/

Market / applied stuff / papers

(0) New Nvidia Jetson released

www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Jetson-Xavier-Dev-Kit

(1) Medical CV project in Russia - 90% is data gathering

cv-blog.ru/?p=217

(2) Differentiable architecture search

arxiv.org/pdf/1806.09055.pdf

-- 1800 GPU days of reinforcement learning (RL) (Zoph et al., 2017)

-- 3150 GPU days of evolution (Real et al., 2018)

-- 4 GPU days to achieve SOTA on CIFAR => transferable to Imagenet with 26.9% top-1 error

(3) Some basic thoughts about hyper-param tuning

engineering.taboola.com/hitchhikers-guide-hyperparameter-tuning/

(4) FB extending fact checking to mark similar articles

www.poynter.org/news/rome-facebook-announces-new-strategies-combat-misinformation

(5) Architecture behind Alexa choosing skills goo.gl/dWmXZf

- Char-level RNN + Word-level RNN

- Shared encoder, but attention is personalized

(6) An overview of contemporary NLP techniques

medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e

(7) RNNs in particle physics?

indico.cern.ch/event/722319/contributions/3001310/attachments/1661268/2661638/IML-Sequence.pdf?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=NLP%20News

(8) Google cloud provides PyTorch images

twitter.com/i/web/status/1016515749517582338

NLP

(0) Use embeddings for positions - no brainer

twitter.com/i/web/status/1018789622103633921

(1) Chatbots were a hype train - lol

medium.com/swlh/chatbots-were-the-next-big-thing-what-happened-5fc49dd6fa61

The vast majority of bots are built using decision-tree logic, where the bot's canned response relies on spotting specific keywords in the user input.

Interesting links

(0) Reasons to use OpenStreetMap

www.openstreetmap.org/user/jbelien/diary/44356

(1) Google deploys its internet balloons

goo.gl/d5cv6U

(2) Amazing problem solving

nevalalee.wordpress.com/2015/11/27/the-hotel-bathroom-puzzle/

(3) Nice flame thread about CS / ML is not science / just engineering etc

twitter.com/RandomlyWalking/status/1017899452378550273

#deep_learning

#data_science

#digest

RL’s foundational flaw

RL as classically formulated has lately accomplished many things - but that formulation is unlikely to tackle problems beyond games. Read on to see why!


snakers4 (Alexander), July 21, 11:02

Found an amazing explanation about Python's super here

stackoverflow.com/a/27134600

Understanding Python super() with __init__() methods

I'm trying to understand the use of super(). From the looks of it, both child classes can be created, just fine. I'm curious to know about the actual difference between the following 2 child clas...
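The core point of that answer - super() delegates along the class's MRO, not to a hard-coded parent - can be seen in a toy example (my own, not from the answer):

```python
class Base:
    def __init__(self):
        self.trace = getattr(self, "trace", []) + ["Base"]

class A(Base):
    def __init__(self):
        super().__init__()   # delegates along the MRO, not to a fixed parent
        self.trace += ["A"]

class B(Base):
    def __init__(self):
        super().__init__()
        self.trace += ["B"]

class C(A, B):
    def __init__(self):
        super().__init__()   # calls A.__init__, which reaches B.__init__ via the MRO
        self.trace += ["C"]

print(C().trace)  # -> ['Base', 'B', 'A', 'C']
```

Note that A's super() call ends up in B - something impossible with a hard-coded `Base.__init__(self)`.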


Playing with focal loss for multi-class classification

Playing with this Loss

gist.github.com/snakers4/5739ade67e54230aba9bd8a468a3b7be

If anyone has a better option - please PM me / or comment in the gist.
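For context, the focal loss formula itself (from the Lin et al. RetinaNet paper), in a minimal single-probability sketch - this is not the code from the gist:

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=1.0, eps=1e-8):
    """Focal loss for the probability assigned to the true class:
    FL(p) = -alpha * (1 - p)^gamma * log(p).
    gamma=0 reduces it to plain cross-entropy; higher gamma
    down-weights easy, well-classified examples."""
    p = min(max(p_true, eps), 1.0 - eps)
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# An easy example (p=0.95) is down-weighted far more than a hard one (p=0.1)
easy, hard = focal_loss(0.95), focal_loss(0.1)
```

In the multi-class case you apply the same weighting to the softmax probability of the true class.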

#deep_learning

#data_science

Multi class classification focal loss

Multi class classification focal loss . GitHub Gist: instantly share code, notes, and snippets.


snakers4 (Alexander), July 17, 08:32

snakers4 (Alexander), July 15, 08:54

Sometimes in supervised ML tasks leveraging the data structure in a self-supervised fashion really helps!

Playing with CrowdAI mapping competition

In my opinion it is a good test-ground for testing your ideas with SemSeg - as the dataset is really clean and balanced

spark-in.me/post/a-small-case-for-search-of-structure-within-your-data

#deep_learning

#data_science

#satellite_imaging

Playing with Crowd-AI mapping challenge - or how to improve your CNN performance with self-supervised techniques

In this article I describe a couple of neat optimizations / tricks / useful ideas that can be applied to many SemSeg / ML tasks.


snakers4 (spark_comment_bot), July 15, 06:16

Feeding images / tensors of different size using PyTorch dataloader classes

Struggled to do this properly on DS Bowl (I resorted to random crops there for training and 1-image sized batches for validation).

Suppose your dataset has some internal structure in it.

For example - you may have images of vastly different aspect ratios (3x1, 1x3 and 1x1) and you would like to squeeze every bit of performance from your pipeline.

Of course, you may pad your images / center-crop them / random crop them - but in this case you will lose some of the information.

I played with this on some tasks - sometimes force-resize works better than crops, but trying to apply your model convolutionally worked really well on SemSeg challenges.

So it may work very well on plain classification as well.

So, if you apply your model convolutionally, you will end up with differently-sized feature maps for each cluster of images.

Within the model, it can be fixed with:

(0) Adaptive avg pooling layers

(1) Some simple logic in .forward statement of the model

But anyway you end up with a small technical issue - PyTorch cannot concatenate tensors of different sizes using standard collation function.

Theoretically, there are several ways to fix this:

(0) Stupid solution - create N datasets, train on them sequentially.

In practice I tried that on DS Bowl - it worked poorly - the model overfitted to each cluster, and then performed poorly on the next one;

(1) Crop / pad / resize images (suppose you deliberately want to avoid that);

(2) Insert some custom logic into the PyTorch collation function, i.e. resize there;

(3) Just sample images so that only images of one size end up within each batch;

(0) and (1) I would like to avoid intentionally.

(2) seems a bit stupid as well, because resizing should be done as a pre-processing step (collation function deals with normalized tensors, not images) and it is better not to mix purposes of your modules

Ofc, you can try to produce N tensors in (2) - i.e. a tensor for each image size, but that would require an additional loop downstream.

In the end, I decided that (3) is the best approach - because it can be easily transferred to other datasets / domains / tasks.

Long story short - here is my solution - I just extended their sampling function:

github.com/pytorch/pytorch/issues/1512#issuecomment-405015099
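The idea behind (3) in a minimal, framework-free sketch (all names are mine; a real version would implement the sampler interface from torch.utils.data):

```python
import random
from collections import defaultdict

def size_grouped_batches(sizes, batch_size, seed=42):
    """Approach (3): yield batches of dataset indices such that every batch
    contains images of a single size, so default collation just works.
    `sizes` maps index -> (h, w) (or any hashable size / aspect-ratio key)."""
    by_size = defaultdict(list)
    for idx, size in enumerate(sizes):
        by_size[size].append(idx)
    rng = random.Random(seed)
    batches = []
    for idxs in by_size.values():
        rng.shuffle(idxs)                       # shuffle within each size cluster
        for i in range(0, len(idxs), batch_size):
            batches.append(idxs[i:i + batch_size])
    rng.shuffle(batches)                        # interleave clusters across the epoch
    return batches

sizes = [(768, 256)] * 5 + [(256, 768)] * 5 + [(512, 512)] * 4
for batch in size_grouped_batches(sizes, batch_size=2):
    assert len({sizes[i] for i in batch}) == 1  # every batch is size-homogeneous
```

Shuffling the batch order (rather than training clusters sequentially) is what avoids the overfit-per-cluster problem from (0).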

Maybe it is worth a PR on Github?

What do you think?

#deep_learning

#data_science

Like this post or have something to say => tell us more in the comments or donate!

[feature request] Support tensors of different sizes as batch elements in DataLoader #1512

Motivating example is returning bounding box annotation for images along with an image. An annotation list can contain variable number of boxes depending on an image, and padding them to a single length (and storing that length) may be n...


snakers4 (Alexander), July 13, 09:15

Tensorboard + PyTorch

6 months ago looked at this - and it was messy

now it looks really polished

github.com/lanpa/tensorboard-pytorch

#data_science

lanpa/tensorboard-pytorch

tensorboard-pytorch - tensorboard for pytorch (and chainer, mxnet, numpy, ...)


snakers4 (spark_comment_bot), July 13, 05:22

2018 DS/ML digest 17

Highlights of the week

(0) Troubling trends with ML scholars

approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/

(1) NLP close to its ImageNet stage?

thegradient.pub/nlp-imagenet/

Papers / posts / articles

(0) Working with multi-modal data distill.pub/2018/feature-wise-transformations/

- concatenation-based conditioning

- conditional biasing or scaling ("residual" connections)

- sigmoidal gating

- all in all this approach seems like a mixture of attention / gating for multi-modal problems

(1) Glow, a reversible generative model which uses invertible 1x1 convolutions

blog.openai.com/glow/

(2) Facebooks moonshots - I kind of do not understand much here

- research.fb.com/facebook-research-at-icml-2018/

(3) RL concept flaws?

- thegradient.pub/why-rl-is-flawed/

(4) Intriguing failures of convolutions

eng.uber.com/coordconv/ - this is fucking amazing

(5) People are only STARTING to apply ML to reasoning

deepmind.com/blog/measuring-abstract-reasoning/

Yet another online book on Deep Learning

(1) Kind of standard livebook.manning.com/#!/book/grokking-deep-learning/chapter-1/v-10/1

Libraries / code

(0) Data version control continues to develop dvc.org/features

#deep_learning

#data_science

#digest

Like this post or have something to say => tell us more in the comments or donate!

Troubling Trends in Machine Learning Scholarship

By Zachary C. Lipton* & Jacob Steinhardt* *equal authorship Originally presented at ICML 2018: Machine


snakers4 (Alexander), July 09, 09:04

2018 DS/ML digest 16

Papers / posts

(0) RL now solves Quake

venturebeat.com/2018/07/03/googles-deepmind-taught-ai-teamwork-by-playing-quake-iii-arena/

(1) A fast.ai post about AdamW

www.fast.ai/2018/07/02/adam-weight-decay/

-- Adam generally requires more regularization than SGD, so be sure to adjust your regularization hyper-parameters when switching from SGD to Adam

-- Amsgrad turns out to be very disappointing

-- Refresher article ruder.io/optimizing-gradient-descent/index.html#nadam

(2) How to tackle new classes in CV

petewarden.com/2018/07/06/what-image-classifiers-can-do-about-unknown-objects/

(3) A new word in GANs?

-- ajolicoeur.wordpress.com/RelativisticGAN/

-- arxiv.org/pdf/1807.00734.pdf

(4) Using deep learning representations for search

-- goo.gl/R1vhTh

-- library for fast search on python github.com/spotify/annoy

(5) One more paper on GAN convergence

avg.is.tuebingen.mpg.de/publications/meschedericml2018

(6) Switchable normalization - adds a bit to ResNet50 + pre-trained models

github.com/switchablenorms/Switchable-Normalization

Datasets

(0) Disney starts to release datasets

www.disneyanimation.com/technology/datasets

Market / interesting links

(0) A motion to open-source GitHub

github.com/dear-github/dear-github/issues/304

(1) Allegedly, GTX 1180 sales are starting in Asia (?)

(2) Some controversy regarding Andrew Ng and self-driving cars goo.gl/WNW4E3

(3) National AI strategies overviewed - goo.gl/BXDCD7

-- Canada C$135m

-- China has the largest strategy

-- Notably - countries like Finland also have one

(4) Amazon allegedly sells face recognition to the USA goo.gl/eDzekn

#data_science

#deep_learning

Google’s DeepMind taught AI teamwork by playing Quake III Arena

Google’s DeepMind today shared the results of training multiple AI systems to play Capture the Flag on Quake III Arena, a multiplayer first-person shooter game. The AI played nearly 450,000 g…


snakers4 (Alexander), July 08, 06:06

A new multi-threaded addition to pandas stack?

Read about this some time ago (when this was just in development //snakers41.spark-in.me/1850) - found essentially 3 alternatives

- just being clever about optimizing your operations + using what is essentially a multi-threaded map/reduce in pandas //snakers41.spark-in.me/1981

- pandas on ray

- dask (overkill)

Links:

(0) rise.cs.berkeley.edu/blog/pandas-on-ray-early-lessons/

(1) www.reddit.com/comments/8wuz7e

(2) github.com/modin-project/modin

So...I ran a test in the notebook I had on hand. It works. More tests will be done in the future.

pics.spark-in.me/upload/2c7a2f8c8ce1dd7a86a54ec3a3dcf965.png

#data_science

#pandas

Spark in me - Internet, data science, math, deep learning, philosophy

Pandas on Ray - RISE Lab https://rise.cs.berkeley.edu/blog/pandas-on-ray/


snakers4 (spark_comment_bot), July 07, 12:29

Playing with VAEs and their practical use

So, I played a bit with Variational Auto Encoders (VAE) and wrote a small blog post on this topic

spark-in.me/post/playing-with-vae-umap-pca

Please like, share and repost!

#deep_learning

#data_science

Like this post or have something to say => tell us more in the comments or donate!

Playing with Variational Auto Encoders - PCA vs. UMAP vs. VAE on FMNIST / MNIST

In this article I thoroughly compare the performance of VAE / PCA / UMAP embeddings on a simplistic domain (FMNIST / MNIST).


snakers4 (Alexander), July 04, 07:57

2018 DS/ML digest 15

What I filtered through this time

Market / news

(0) Letters by big company employees against using ML for weapons

- Microsoft

- Amazon

(1) Facebook open sources Dense Pose (essentially this is Mask-RCNN)

- research.fb.com/facebook-open-sources-densepose/

Papers / posts / NLP

(0) One more blog post about text / sentence embeddings goo.gl/Zm8C2c

- key idea - different weighting

(1) One more sentence embedding calculation method

- openreview.net/pdf?id=SyK00v5xx ?

(2) Posts explaing NLP embeddings

- www.offconvex.org/2015/12/12/word-embeddings-1/ - some basics - SVD / Word2Vec / GloVe

-- SVD improves embedding quality (as compared to one-hot encoding)?

-- use log-weighting, use TF-IDF weighting (the above weighting)

- www.offconvex.org/2016/02/14/word-embeddings-2/ - word embedding properties

-- dimensions vs. embedding quality www.cs.princeton.edu/~arora/pubs/LSAgraph.jpg

(3) Spacy + Cython = 100x speed boost - goo.gl/9TwVqu - good to know about this as a last resort

- described use-cases:

-- you are pre-processing a large training set for a deep learning framework like PyTorch / TensorFlow

-- you have heavy processing logic in your deep learning batch loader that slows down your training

(4) Once again stumbled upon this - blog.openai.com/language-unsupervised/

(5) Papers

- Simple NLP embedding baseline goo.gl/nGujzS

- NLP decathlon for question answering goo.gl/6HHi7q

- Debiasing embeddings arxiv.org/abs/1806.06301

- Once again transfer learning in NLP by open-AI - goo.gl/82VR4U
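As an illustration of the "different weighting" theme above, a smooth-inverse-frequency (SIF-style, Arora et al.) sentence embedding can be sketched like this (toy vectors and frequencies are my own; the full method also removes the first principal component):

```python
def sif_sentence_embedding(sentence, word_vectors, word_freq, a=1e-3):
    """Smooth-inverse-frequency weighted average of word vectors:
    frequent words get weight a / (a + p(w)), rare words weight close to 1."""
    words = [w for w in sentence.split() if w in word_vectors]
    total = sum(word_freq.values())
    dim = len(next(iter(word_vectors.values())))
    emb = [0.0] * dim
    for w in words:
        weight = a / (a + word_freq[w] / total)
        for i, x in enumerate(word_vectors[w]):
            emb[i] += weight * x
    return [x / len(words) for x in emb] if words else emb

vecs = {"the": [1.0, 0.0], "rocket": [0.0, 1.0]}
freq = {"the": 9_990, "rocket": 10}
emb = sif_sentence_embedding("the rocket", vecs, freq)
# the rare, informative word dominates the average
```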

#deep_learning

#digest

#data_science


snakers4 (Alexander), July 02, 04:51

2018 DS/ML digest 14

Amazing article - why you do not need ML

- cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html

- I personally love plain-vanilla SQL and in 90% of cases people under-use it

- I even wrote 90% of my JSON API on our blog in pure PostgreSQL xD

Practice / papers

(0) Interesting papers from CVPR towardsdatascience.com/the-10-coolest-papers-from-cvpr-2018-11cb48585a49

(1) Some down-to-earth obstacles to ML deploy habr.com/company/hh/blog/415437/

(2) Using synthetic data for CNNs (by Nvidia) - arxiv.org/pdf/1804.06516.pdf

(3) This puzzles me - so much effort and engineering spent on something ... strange and useless - taskonomy.stanford.edu/index.html

On paper they do a cool thing - investigate transfer learning between different domains, but in practice it is done on TF and there is no clear conclusion of any kind

(4) VAE + real datasets siavashk.github.io/2016/02/22/autoencoder-imagenet/ - only small Imagenet (64x64)

(5) Understanding the speed of models deployed on mobile - machinethink.net/blog/how-fast-is-my-model/

(6) A brief overview of multi-modal methods medium.com/mlreview/multi-modal-methods-image-captioning-from-translation-to-attention-895b6444256e

Visualizations / explanations

(0) Amazing website with ML explanations explained.ai/

(1) PCA and linear VAEs are close pvirie.wordpress.com/2016/03/29/linear-autoencoders-do-pca/

#deep_learning

#digest

#data_science

No, you don't need ML/AI. You need SQL

A while ago, I did a Twitter thread about the need to use traditional and existing tools to solve everyday business problems other than jumping on new buzzwords, sexy and often times complicated technologies.


snakers4 (Alexander), July 01, 11:48

Measuring feature importance properly

explained.ai/rf-importance/index.html

Once again stumbled upon an amazing article about measuring feature importance for any ML algorithms:

(0) Permutation importance - if re-training your model is costly, you can just shuffle a column and measure the drop in performance

(1) Drop column importance - drop a column, re-train a model, check performance metrics
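A minimal sketch of (0), permutation importance, with a toy model (all names are mine; the linked article works with sklearn random forests):

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == t for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, col, seed=0):
    """Shuffle one column and report the drop in accuracy - no re-training needed."""
    base = accuracy(model, X, y)
    rng = random.Random(seed)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return base - accuracy(model, X_perm, y)

rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [int(x0 > 0.5) for x0, _ in X]          # only feature 0 matters
model = lambda row: int(row[0] > 0.5)       # a toy "trained" model

imp0 = permutation_importance(model, X, y, col=0)
imp1 = permutation_importance(model, X, y, col=1)
# shuffling the informative column hurts; shuffling the noise column does nothing
```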

Why it is useful / caveats

(0) If you really care about understanding your domain - feature importances are a must have

(1) All of this works only for powerful models

(2) Landmines include - correlated or duplicate variables, data normalization

Correlated variables

(0) For RF - correlated variables share permutation importance roughly proportionally to their correlation

(1) Drop column importance can behave unpredictably

I personally like engineering different kinds of features and doing ablation tests:

(0) Among feature sets, sharing similar purpose

(1) Within feature sets

#data_science

snakers4 (Alexander), June 10, 15:35

And now the habr.com article is also live -

habr.com/post/413775/

Please support us with your likes!

#deep_learning

#data_science

Adversarial attacks in the Machines Can See 2018 competition

Or how I ended up on the team that won the Machines Can See 2018 adversarial competition. The essence of any adversarial attack, shown by example. ...


snakers4 (Alexander), June 10, 06:50

An interesting idea from a CV conference

Imagine that you have some kind of algorithm, that is not exactly differentiable, but is "back-propable".

In this case you can have very convoluted logic in your "forward" statement (essentially something in between trees and dynamic programming) - for example a set of clever if-statements.

In this case you get the best of both worlds - both your algorithm (which you will have to re-implement in your framework) and backprop + CNNs. Nice.

Ofc this works only for dynamic deep-learning frameworks.
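A toy scalar autograd (micrograd-style; my own sketch, not from the talk) shows the point: the forward pass may contain arbitrary if-statements, and backprop still works for whichever branch was actually taken:

```python
class Value:
    """A tiny scalar autograd node - just enough to backprop through an if."""
    def __init__(self, data, parents=(), grad_fn=None):
        self.data, self.parents, self.grad_fn, self.grad = data, parents, grad_fn, 0.0

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        out.grad_fn = lambda g: (g * other.data, g * self.data)
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        out.grad_fn = lambda g: (g, g)
        return out

    def backward(self, g=1.0):
        self.grad += g
        if self.grad_fn:
            for parent, pg in zip(self.parents, self.grad_fn(g)):
                parent.backward(pg)

def forward(x):
    # "convoluted" branching logic: not one differentiable formula,
    # but each branch taken IS differentiable, so backprop works per-sample
    if x.data > 0:
        return x * x        # d/dx = 2x on this branch
    return x + x            # d/dx = 2 on this branch

x = Value(3.0)
forward(x).backward()
print(x.grad)  # -> 6.0 (gradient of x*x at x=3)
```

Static-graph frameworks would have to express the if with special graph ops; dynamic ones trace whatever actually ran.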

#deep_learning

#data_science

Machines Can See 2018 adversarial competition

Happened to join forces with a team that won 2nd place in this competition

- spark-in.me/post/playing-with-mcs2018-adversarial-attacks

It was very entertaining and a new domain to me.

Read more materials:

- Our repo github.com/snakers4/msc-2018-final

- Our presentation drive.google.com/file/d/1P-4AdCqw81nOK79vU_m7IsCVzogdeSNq/view

- All presentations drive.google.com/file/d/1aIUSVFBHYabBRdolBRR-1RKhTMg-v-3f/view

#data_science

#deep_learning

#adversarial

Playing with adversarial attacks on Machines Can See 2018 competition

This article is about the MCS 2018 competition and my participation in it, adversarial attack methods, and how our team won.


snakers4 (Alexander), June 07, 14:13

snakers4 (Alexander), May 25, 07:29

New competitions on Kaggle

Kaggle has started a new competition with video ... which is one of those competitions (read between the lines - blatant marketing)

www.kaggle.com/c/youtube8m-2018

I.e.

- TensorFlow Record files

- Each of the top 5 ranked teams will receive $5,000 per team as a travel award - no real prizes

- The complete frame-level features take about 1.53TB of space (and yes, these are not videos, but extracted CNN features)

So, they are indeed using their platform to promote their business interests.

Released free datasets are really cool, but only when you can use them for transfer learning, which implies also seeing the underlying ground-level data (i.e. the images or videos).

#data_science

#deep_learning

The 2nd YouTube-8M Video Understanding Challenge

Can you create a constrained-size model to predict video labels?