Spark in me - Internet, data science, math, deep learning, philosophy

snakers4 @ telegram, 1298 members, 1458 posts since 2016

All this - lost like tears in rain.

Internet, data science, math, deep learning, philosophy. No bs.

Our website
- spark-in.me
Our chat
- goo.gl/WRm93d
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

Posts by tag «digest»:

snakers4 (spark_comment_bot), May 13, 11:25

2018 DS/ML digest 10

Market

(0) Some moonshots by Google in working with electronic health records

(1) Google duplex - a narrow domain bot that makes calls for you

(2) Nature wants to make its ML journal ... paid

(3) Stanford DawnBench - training Imagenet encoders as quickly and cheaply as possible

(4) Facebook achieves 85% on Imagenet by training on 1bn images on 336 GPUs in a week

(5) Learning the models of the surrounding world based on a DOOM like game

Practice / libraries / code

(0) A smarter and new way to ensemble CNNs

- Traditional approach - ensemble CNNs with different architectures - and just vote / average / apply linear regression on top

- Newer approach - use Cyclic Learning rate

- Even newer approach - model snapshot ensembling

- Stochastic Weight Averaging

-- store running average of the models

-- train one model with CLR

-- at the end of each lr update (or epoch) - do a running average of the models with some weights

-- the gist of the method is located on this line

-- I do understand why they update the bnorm params, but I do not understand why it cannot be done by just running 1 train epoch

- Papers on CNN ensembling 1 2 3

(1) (RU) Few technical details, but face detection + face hashing works in retail (+ a human operator) given an HD camera

(2) (RU) Pose estimation

(3) Numpy autograd
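The Stochastic Weight Averaging steps in item (0) above boil down to keeping a running mean of weight snapshots. A minimal sketch in NumPy, assuming equal weights for every snapshot and plain dicts of arrays standing in for model weights (the snapshot values and the helper name update_running_average are made up for illustration, and this is not the code the post links to):

```python
import numpy as np


def update_running_average(avg_weights, new_weights, n_averaged):
    """Fold one more snapshot into the mean of the n_averaged snapshots seen so far."""
    return {
        name: (avg_weights[name] * n_averaged + new_weights[name]) / (n_averaged + 1)
        for name in avg_weights
    }


# Hypothetical snapshots, e.g. taken at the end of each learning rate cycle
snapshots = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([2.0])},
]

# Start from the first snapshot, then average in the rest one by one
swa_weights = {name: value.copy() for name, value in snapshots[0].items()}
for n_averaged, snapshot in enumerate(snapshots[1:], start=1):
    swa_weights = update_running_average(swa_weights, snapshot, n_averaged)

print(swa_weights)
```

In a real training loop the snapshot would be taken at the end of each lr cycle (or epoch), and the batch norm statistics would still have to be recomputed for the averaged weights, which is the bnorm update mentioned above.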

"New" papers worth mentioning

(0) SqueezeNext

- Module comparison

- Key changes

(i) more aggressive channel reduction by incorporating a two-stage squeeze module

(ii) separable 3 × 3 convolutions

(iii) element-wise addition skip connection similar to ResNet

- Performance

(1) GANs to generate full-body anime characters in different poses

Visualizations:

(0) (does not work in Firefox) Visualizing encoder-decoder networks for translation

#data-science

#deep-learning

#digest


Deep Learning for Electronic Health Records

Posted by Alvin Rajkomar MD, Research Scientist and Eyal Oren PhD, Product Manager, Google AI When patients get admitted to a hospital, th...


snakers4 (Alexander), May 10, 05:39

Internet

Interesting links about Internet

(0) Ben Evans goo.gl/gvNBhS

Russia / CIS

(0) Telegram has a new proxy setting in alpha, though no proper stand-alone solutions are published

t.me/dvachannel/21784

(1) Western media now cover Telegram

goo.gl/nPJ4Sm

Global / tech

(0) Xiaomi to file for an IPO - US$10 - US$100bn

(1) Yet another drag and drop ML that will (m?) fail - lobe.ai/ - this is so American

(2) Now all "major" apps heavily feature "stories" as main mobile format - goo.gl/wbnHYD

Yet another reason to quit all social media and just use professional apps / messaging

Add up all this bs => this is the reason normal people do not use social media for real now

(3) Tesla most shorted tech company now - goo.gl/11yndY xD

Figures

(0) YouTube - 1.8bn users with 1+ login goo.gl/kyXFDH

(1) WhatsApp - 70bn messages per day (vs. 20bn max with SMS) goo.gl/67DdVn

#internet

#digest

snakers4 (Alexander), May 01, 16:52

2018 DS/ML digest 9

Market / libraries

(0) Tensorflow + Swift - wtf - goo.gl/FDvLM4

(1) Geektimes / Habrahabr.ru going international - goo.gl/dbGNwD

(2) A service for renting GPUs ... from people

- Reddit goo.gl/HxQ54x

- Link vectordash.com/hosting/

- Looks LXC based (afaik - the only user friendly alternative to Docker)

- Cool in theory, no idea how secure this is - we can assume it is as secure as providing a docker container to a stranger

- They did not reply to me within a week

(3) A friend sent me a list of ... yet more new PyTorch NLP libraries

- goo.gl/kasRfZ, goo.gl/XXnbJy (AllenNLP is the biggest library like this)

- I believe that such libraries are more or less useless for real tasks, but cool to know they exist

(4) New SpaceNet 4? goo.gl/CsSS6P

(5) A new super cool competition on Kaggle about particle physics? www.kaggle.com/c/trackml-particle-identification

Tutorials / basics

(0) Bias vs. Variance (RU) goo.gl/4Y7tH7

(1) Yet another magic Jupyter guideline collection - goo.gl/AFWMuq

Real world ML applications

(0) ResNet + object detection (RU) - detecting people without helmets, 90% accuracy - goo.gl/7xpQnE

(1) Fast.ai about using embeddings with Tabular data - www.fast.ai/2018/04/29/categorical-embeddings/

Very similar to our approach on electricity

I personally do not recommend using their library by any means

(2) Comparing Google TPU vs. V100 with ResNet50 - goo.gl/s6dhsy

- speed - goo.gl/Pww2sm

- pricing - goo.gl/Rtkp8Q

- but ... buying GPUs is much cheaper

(3) Other blog posts about embeddings + tabular data

- Sales prediction blog.kaggle.com/2016/01/22/rossmann-store-sales-winners-interview-3rd-place-cheng-gui/

- Taxi drive prediction blog.kaggle.com/2015/07/27/taxi-trajectory-winners-interview-1st-place-team-%F0%9F%9A%95/

MLP + classification + embeddings - goo.gl/AMNGNG / arxiv.org/pdf/1508.00021.pdf

(4) Albu's solution to SpaceNet - augmentations github.com/SpaceNetChallenge/RoadDetector/tree/master/albu-solution/src/augmentations

CNN overview

Neural network part:

Split data into 4 folds randomly, but with the same number of tiles from each city in every fold

Use resnet34 as encoder and unet-like decoder (conv-relu-upsample-conv-relu) with skip connection from every layer of network. Loss function: 0.8*binary_cross_entropy + 0.2*(1 – dice_coeff). Optimizer – Adam with default params.

Train on image crops 512*512 with batch size 11 for 30 epochs (8 times more images in one epoch)

Train 20 epochs with lr 1e-4

Train 5 epochs with lr 2e-5

Train 5 epochs with lr 4e-6

Predict on full image with padding 22 on borders (1344*1344).

Merge folds by mean
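The loss and the lr schedule from the overview above can be sketched in a few lines. This is a NumPy illustration only, assuming predictions are probabilities in [0, 1], a soft (non-thresholded) dice coefficient and a small eps for numerical stability; the function names are made up here and none of this is Albu's actual code:

```python
import numpy as np


def dice_coeff(pred, target, eps=1e-7):
    """Soft dice coefficient between predicted probabilities and a binary mask."""
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy; probabilities are clipped to avoid log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def combined_loss(pred, target):
    """0.8 * binary_cross_entropy + 0.2 * (1 - dice_coeff), as in the overview."""
    return 0.8 * binary_cross_entropy(pred, target) + 0.2 * (1.0 - dice_coeff(pred, target))


def lr_for_epoch(epoch):
    """Step schedule for a 0-based epoch: 20 epochs at 1e-4, 5 at 2e-5, 5 at 4e-6."""
    if epoch < 20:
        return 1e-4
    if epoch < 25:
        return 2e-5
    return 4e-6


target = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy 2x2 road mask
print(combined_loss(target, target))        # perfect prediction
print(combined_loss(1.0 - target, target))  # completely wrong prediction
```

The dice term keeps the loss sensitive to thin structures like roads, where per-pixel cross-entropy alone is dominated by the background.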

Jobs / job market

(0) Developers by country by scraping GitHub - goo.gl/n8gnLi

- developers count vs. GDP prntscr.com/j9v80e R^2 = 84%

- developers count vs. population - R^2 = 50%

Visualization

(0) Interactive tool for visualizing convolutions - ezyang.github.io/convolution-visualizer/

Datasets

(0) Open Images v4 open-sourced

- research.googleblog.com/2018/04/announcing-open-images-v4-and-eccv-2018.html

- the dataset itself storage.googleapis.com/openimages/web/download.html

- categories storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy_visualizer/circle.html

#data_science

#deep_learning

#digest

tensorflow/swift

swift - Swift for TensorFlow documentation repository.


snakers4 (Alexander), April 15, 08:06

2018 DS/ML digest 8

As usual my short bi-weekly (or less) digest of everything that passed my BS detector

Market / blog posts

(0) Fast.ai about the importance of accessibility in ML - www.fast.ai/2018/04/10/stanford-salon/

(1) Some interesting news about market, mostly self-driving cars (the rest is crap) - goo.gl/VKLf48

(2) US$600m investment into Chinese face recognition - goo.gl/U4k2Mg

Libraries / frameworks / tools

(0) New 5 point face detector in Dlib for face alignment task - goo.gl/T73nHV

(1) Finally a more proper comparison of XGB / LightGBM / CatBoost - goo.gl/AcszWZ (also see my thoughts here snakers41.spark-in.me/1840)

(3) CNNs on FPGAs by ZFTurbo

-- www.youtube.com/watch?v=Lhnf596o0cc

-- github.com/ZFTurbo/Verilog-Generator-of-Neural-Net-Digit-Detector-for-FPGA

(4) Data version control - looks cool

-- dataversioncontrol.com

-- goo.gl/kx6Qdf

-- but I will not use it - because proper logging and treating data as immutable solves the issue

-- looks like over-engineering for the sake of overengineering (unless you create 100500 datasets per day)

Visualizations

(0) TF Playground to see how the simplest CNNs work - goo.gl/cu7zTm

Applications

(0) Looks like GAN + ResNet + Unet + content loss - can easily solve simpler tasks like deblurring goo.gl/aviuNm

(1) You can apply dilated convolutions to NLP tasks - habrahabr.ru/company/ods/blog/353060/

(2) High level overview of face detection in ok.ru - goo.gl/fDUXa2

(3) Alternatives to DWT and Mask-RCNN / RetinaNet? medium.com/@barvinograd1/instance-embedding-instance-segmentation-without-proposals-31946a7c53e1

- Has anybody tried anything here?

Papers

(0) A more disciplined approach to training CNNs - arxiv.org/abs/1803.09820 (LR regime, hyper param fitting etc)

(1) GANs for image compression - arxiv.org/pdf/1804.02958.pdf

(2) Paper reviews from ODS - mostly moonshots, but some are interesting

-- habrahabr.ru/company/ods/blog/352508/

-- habrahabr.ru/company/ods/blog/352518/

(3) SqueezeNext - the new SqueezeNet - arxiv.org/abs/1803.10615

#digest

#data_science

#deep_learning

snakers4 (Alexander), April 07, 11:52

Internet digest

- Ben Evans - mailchi.mp/ben-evans/benedicts-newsletter-no-450525?e=b7fff6bc1c

- About autonomous cars - www.ben-evans.com/benedictevans/2018/3/26/steps-to-autonomy - autonomy will vary based on the route / conditions / situation / use case

- FB delays its speaker - www.bloomberg.com/technology

- Foxconn buys Belkin goo.gl/Xf6g9A

- Amazon music > 10m subs - goo.gl/C8Qhdm

- The Economist about ML in business - goo.gl/fTCHE9

- Apple to make its own chips - goo.gl/ZkkEVc

#internet

#digest

snakers4 (Alexander), March 30, 10:35

Internet digest

- Chrome OS on tablets - goo.gl/K5iCJw

- Facial recognition in China - goo.gl/aJjPH5 - 1984

- Ikea + AR manual - goo.gl/WW6Eqg

- WildBerries.ru stats - goo.gl/qPspe1

- Digital content forgery and ML - goo.gl/e5tqWa

- On Facebook tracking your SMS and calls

-- newsroom.fb.com/news/2018/03/fact-check-your-call-and-sms-history/

#digest

#internet

Google debuts Chrome OS tablets to take on the iPad in education

Ahead of Apple’s education-focused event tomorrow where a new affordable iPad is expected, Google this morning announced the first Chrome OS tablet. The Acer Chromebook Tab 10 is a new form f…


snakers4 (Alexander), March 20, 05:03

Internet / tech

(1) LIDAR - bridge technology www.ben-evans.com/benedictevans/2018/3/12/bridges

(2) VW to invest US$25bn in batteries goo.gl/yPrpUX

(3) Self-driving car kills a pedestrian goo.gl/Md3Cbs

(4) Terminal case of marketing bs - Theranos - goo.gl/zNjZPL

(5) Spotify was a P2P app at first lol - goo.gl/e8riLc

(6) Stack Overflow survey 2018 - stackoverflow.blog/2018/01/08/take-2018-developer-survey/

Lol

(1) Prototype of small flying car - cora.aero

#digest

Bridges and LIDAR

A bridge product says 'of course x is the right way to do this, but the technology or market environment to deliver x is not available yet, or is too expensive, and so here is something that gives some of the same benefits but works now.'  Sometimes that’s a great business, and sometimes it


snakers4 (Alexander), March 13, 10:08

Internet digest

(1) Ben Evans - goo.gl/8f4RkE

Market

(1) Waymo launching pilot for the self-driving trucks - goo.gl/Bw2R9Q

(2) Netflix to spend US$8bn on ~700 shows in 2018 - goo.gl/6myKj6 (sic!)

(3) Intel vs Qualcomm and Broadcom - goo.gl/pa3iYB + Intel considering buying Broadcom - goo.gl/XP8fqd

(4) Amazon buys Ring - goo.gl/cnMw6o

(5) Latest darkmarket bust - Hansa - goo.gl/YcUxYD - it was not busted at once, but put under surveillance

- As with Silk Road - all started with the officials finding a server and making a copy of hard drive

- This time - it was a dev server

- It contained ... owners' IRC accounts and some personal info

Internet + ML

(1) Netflix uses ML to generate thumbnails for its shows automatically - goo.gl/6poibk

- Features collected: manual annotation, meta-data, object detection, brightness, colour, face detection, blur, motion detection, actors, mature content

#internet

#digest

Also also

(1) Dropbox - www.sec.gov/Archives/edgar/data/1467623/000119312518055809/d451946ds1.htm

(2) And Spotify www.sec.gov/Archives/edgar/data/1639920/000119312518063434/d494294df1.htm

filed for IPOs

#internet

snakers4 (Alexander), March 10, 13:59

Interesting / noteworthy semseg papers

In practice - UNet and LinkNet are the best and simplest solutions.

People rarely report that something like Tiramisu works properly.

Though I once saw, in the last Konika competition, a good solution based on DenseNet + a standard decoder.

So I decided to read some of the newer and older Semseg papers.

Classic papers

UNet,LinkNet - nuff said

(0) Links

- UNet - arxiv.org/abs/1505.04597

- LinkNet - arxiv.org/abs/1707.03718

Older, overlooked, but interesting papers

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

One of the original papers before UNet

(0) arxiv.org/abs/1511.00561

(1) Basically UNet w/o skip connections but it stores pooling indices

(2) SegNet uses the max pooling indices to upsample (without learning) the feature map(s) and convolves with a trainable decoder filter bank
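The pooling-indices trick is easy to show on a toy feature map. A minimal NumPy sketch, assuming a single-channel 2D map and non-overlapping 2x2 windows; the function names are made up for illustration and the trainable decoder convolutions that follow the upsampling are left out:

```python
import numpy as np


def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling over a 2D map; also returns flat argmax positions."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size), dtype=x.dtype)
    indices = np.zeros((h // size, w // size), dtype=np.int64)
    for i in range(h // size):
        for j in range(w // size):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            indices[i, j] = (i * size + r) * w + (j * size + c)
    return pooled, indices


def max_unpool(pooled, indices, output_shape):
    """Put every pooled value back where its maximum came from; zeros elsewhere."""
    out = np.zeros(output_shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out


x = np.array([[1.0, 2.0, 0.0, 1.0],
              [3.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 5.0, 6.0],
              [1.0, 0.0, 7.0, 8.0]])

pooled, indices = max_pool_with_indices(x)  # encoder side
restored = max_unpool(pooled, indices, x.shape)  # decoder side, no learned weights
print(pooled)
print(restored)
```

The upsampled map is sparse, which is why the decoder then convolves it with trainable filters to densify it.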

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Paszke, Adam / Chaurasia, Abhishek / Kim, Sangpil / Culurciello, Eugenio

(0) Link

- arxiv.org/abs/1606.02147

(1) Key facts

- Up to 18× faster, 75× fewer FLOPs, 79× fewer parameters vs SegNet or FCN

- Supposedly runs on NVIDIA Jetson TX1 Embedded Systems

- Essentially a mixture of ResNet and Inception architectures

- Overview of the architecture

-- goo.gl/M6CPEv

-- goo.gl/b5Kb2S

(2) Interesting ideas

- Visual information is highly spatially redundant, and thus can be compressed into a more efficient representation

- Highly asymmetric - decoder is much smaller

- Dilated convolutions in the middle => significant accuracy boost

- Dropout > L2

- Pooling operation in parallel with a convolution of stride 2, and concatenate resulting feature maps
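The last idea, pooling in parallel with a strided convolution, can be sketched like this. A NumPy illustration assuming a 3-channel input with even height and width, 3x3 kernels with zero padding of 1, and random untrained kernels; the 13 + 3 = 16 channel split is how I remember the initial block from the paper, so treat it and the function names as assumptions:

```python
import numpy as np


def conv3x3_stride2(x, kernels):
    """x: (C, H, W), kernels: (K, C, 3, 3); zero padding 1, stride 2 -> (K, H//2, W//2)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernels.shape[0], h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            patch = padded[:, 2 * i:2 * i + 3, 2 * j:2 * j + 3]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def max_pool2x2(x):
    """x: (C, H, W) with even H and W -> (C, H//2, W//2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def downsample_block(x, kernels):
    """Run both branches on the same input and concatenate along the channel axis."""
    return np.concatenate([conv3x3_stride2(x, kernels), max_pool2x2(x)], axis=0)


rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))                 # toy RGB input
kernels = rng.standard_normal((13, 3, 3, 3))  # untrained, for shape checking only
features = downsample_block(image, kernels)
print(features.shape)
```

Both branches halve the resolution, so the cheap pooled channels ride along with the learned ones at no extra convolution cost.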

Newer papers

xView: Objects in Context in Overhead Imagery - new "Imagenet" for satellite images

(0) Link

- Will be available here xviewdataset.org/#register

(1) Examples

- goo.gl/JKr9wW

- goo.gl/TWRmn2

(2) Stats

- 0.3m ground sample distance

- 60 classes in 7 different parent classes

- 1 million labeled objects covering over 1,400 km2 of the earth’s surface

- classes goo.gl/v9CM5b

(3) Baseline

- Their baseline using SSD has very poor performance ~20% mAP

Rethinking Atrous Convolution for Semantic Image Segmentation

(0) Link

- arxiv.org/abs/1706.05587

- Liang-Chieh Chen / George Papandreou / Florian Schroff / Hartwig Adam

- Google Inc.

(1) Problems to be solved

- Reduced feature resolution

- Objects at multiple scales

(2) Key approaches

- Image pyramid (reportedly works poorly and requires a lot of memory)

- Encoder-decoder

- Spatial pyramid pooling (reportedly works poorly and requires a lot of memory)

(3) Key ideas

- Atrous (dilated) convolution - goo.gl/uSFCv5

- ResNet + Atrous convolutions - goo.gl/pUjUBS

- Atrous Spatial Pyramid Pooling block goo.gl/AiQZC1 - goo.gl/p63qNR

(4) Performance

- As with the latest semseg methods, true performance boost is unclear

- I would argue that such methods may be useful for large objects
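For reference, an atrous (dilated) convolution is just a regular convolution whose kernel taps are spaced `rate` samples apart, so the receptive field grows without adding weights. A minimal 1D NumPy sketch, assuming "valid" padding and cross-correlation (no kernel flip, as in most DL frameworks); the function name and the toy signal are made up for illustration:

```python
import numpy as np


def dilated_conv1d(signal, kernel, rate=1):
    """'Valid' 1D cross-correlation with kernel taps spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # effective receptive field of one output
    out_len = len(signal) - span + 1
    return np.array([
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(out_len)
    ])


signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 0.0, -1.0])  # same 3 weights in both cases

dense = dilated_conv1d(signal, kernel, rate=1)   # each output sees 3 samples
atrous = dilated_conv1d(signal, kernel, rate=2)  # each output sees a span of 5 samples
print(len(dense), len(atrous))
```

With rate=2 the same three weights cover a span of five samples, which is how the paper gets a larger field of view without reducing feature resolution.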

#digest

#deep_learning

snakers4 (Alexander), March 07, 05:03

2018 DS/ML digest 6

Visualization

(1) A new amazing post by Google on Distill - distill.pub/2018/building-blocks/.

This is really amazing work, but their notebooks tell me that it is a far cry from being usable by the community - goo.gl/3c1Fza

This is how the CNN sees the image - goo.gl/S4KT5d

Expect this to be packaged as part of Tensorboard in a year or so)

Datasets

(1) New landmark dataset by Google - goo.gl/veSEhg - looks cool, but ...

Prizes in the accompanying Kaggle competitions are laughable goo.gl/EEGDEH goo.gl/JF93Xx

Given that datasets are really huge...~300G

Also, if you win, you will have to buy a ticket to the USA with your own money ...

(2) Useful script to download the images goo.gl/JF93Xx

(3) Imagenet for satellite imagery - xviewdataset.org/#register - pre-register

arxiv.org/pdf/1802.07856.pdf paper

(4) CVPR 2018 for satellite imagery - deepglobe.org/challenge.html

Papers / new techniques

(1) Improving RNN performance via auxiliary loss - arxiv.org/pdf/1803.00144.pdf

(2) Satellite imaging for emergencies - arxiv.org/pdf/1803.00397.pdf

(3) Baidu - neural voice cloning - goo.gl/uJe852

Market

(1) Google TPU benchmarks - goo.gl/YKL9yx

As usual such charts do not show consumer hardware.

My guess is that a single 1080Ti may deliver comparable performance (i.e. 30-40% of it) for ~US$700-1000, i.e. ~150 hours of rent (this is ~ 1 week!)

Miners say that 1080Ti can work 1-2 years non-stop

(2) MIT and SenseTime announce effort to advance artificial intelligence research goo.gl/MXB3V9

(3) Google released its ML course - goo.gl/jnVyNF - but generally it is a big TF ad ... Andrew Ng is better for grasping concepts
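The break-even estimate in Market item (1) above is simple arithmetic: the price of the card divided by the hourly rental rate. A minimal sketch, assuming a rental rate of US$6 per hour (the public TPU beta price quoted in digest 4 further down this page; the benchmark post itself may use a different rate) and a card price somewhere in the US$700-1000 range:

```python
def break_even_hours(hardware_price_usd, rent_usd_per_hour):
    """Hours of cloud rent that cost as much as buying the hardware outright."""
    return hardware_price_usd / rent_usd_per_hour


TPU_RENT_USD_PER_HOUR = 6.0  # assumption: public beta price quoted elsewhere on this page

for price in (700, 900, 1000):  # assumed 1080Ti price range, USD
    hours = break_even_hours(price, TPU_RENT_USD_PER_HOUR)
    print(f"US${price}: {hours:.0f} hours of rent, about {hours / 24:.1f} days")
```

The estimate ignores electricity, the rest of the workstation and the performance gap between the two devices.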

Internet

(1) Interesting thing - all ISPs have some preferential agreements between each other - goo.gl/sEvZMN

#digest

#data_science

#deep_learning

The Building Blocks of Interpretability

Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them -- and the rich structure of this combinatorial space.


snakers4 (Alexander), February 28, 10:40

Forwarded from Data Science:

Most common libraries for Natural Language Processing:

CoreNLP from Stanford group:

stanfordnlp.github.io/CoreNLP/index.html

NLTK, the most widely-mentioned NLP library for Python:

www.nltk.org/

TextBlob, a user-friendly and intuitive NLTK interface:

textblob.readthedocs.io/en/dev/index.html

Gensim, a library for document similarity analysis:

radimrehurek.com/gensim/

SpaCy, an industrial-strength NLP library built for performance:

spacy.io/docs/

Source: itsvit.com/blog/5-heroic-tools-natural-language-processing/

#nlp #digest #libs

Stanford CoreNLP

High-performance human language analysis tools. Widely used, available open source; written in Java.


snakers4 (Alexander), February 24, 05:56

2017 DS/ML digest 5

Fun stuff

(1) Hardcore metal + CNNs + style transfer - goo.gl/VHYfHe

SpaceNet challenge

(1) Post by Nvidia goo.gl/6Mw4CB

(2) Some links to sota semseg articles

(3) Useful tools for CV - floodfill and grabcut, but guys from Nvidia did not notice ... that road width was in geojson data...

(4) Looks like they replicated the results just for PR, but their masks do not look appealing

Research / papers / libraries

(1) Neural Voice Cloning with a Few Samples - goo.gl/LwmzRf (demos audiodemos.github.io.)

(2) A library for CRFs in Python - goo.gl/cQc8hA

(3) 1000x faster CNN architecture search - still on CIFAR - arxiv.org/pdf/1802.03268.pdf (PyTorch goo.gl/BZ9Vrh)

(4) URLs + CNN - malicious link detection - arxiv.org/abs/1802.03162

Datasets

(1) 3m anime image dataset - www.gwern.net/Danbooru2017

(2) Google HDR dataset - goo.gl/XEL1Fm

Market

(1) Idea - AMT + blockchain - goo.gl/JfzEPV

(2) ARM to make processors for CNNs? - goo.gl/MpdPSB

(3) Google TPU in beta - goo.gl/gRzq9t - very expensive. + Note the rumours that Google's own people do not use their TPU quota

(4) One guy managed to deploy a PyTorch model using ONNX - goo.gl/QD4DkZ

#digest

#machine_learning

#data_science

Hardcore Anal Hydrogen "Jean-Pierre" (2018, Apathia Records)

Order "Hypercut" : http://apathia.link/hah Bandcamp : https://hardcoreanalhydrogen.bandcamp.com/album/hypercut « A gigantic piece of art here to mess with wh...


snakers4 (Alexander), February 20, 04:40

Internet Digest

- Ben Evans - goo.gl/XsBqHN

- Flipboard (orly) launches ads - goo.gl/2muoiT

- Google sold 3.9 million Pixel phones in 2017 - goo.gl/6eUiXw

- Looks like smartbuses may be cool. App => bus route information => route gap => launch cosy bus with music and social features - goo.gl/TjKndB (I doubt this is a business though)

- About the importance of decentralization - next Internet will be a set of cryptonetwork protocols - goo.gl/c2aB4n

- How London is responding to technological innovation - goo.gl/Dh6NgD

(1) Connected and autonomous vehicles (CAVs) or driverless cars won't be on the road until the 2030s at least and could add to congestion

(2) Dockless cycle schemes need to be able to operate across London to be effective

(3) There is no control system in place for drones and droids

(4) TfL is monitoring technological developments but this needs to be embedded across the whole organisation

- Nice infographics about city dwellers' daily routes on pages 7-10 - goo.gl/vV71DR

#internet

#digest

snakers4 (Alexander), February 14, 11:48

2017 DS/ML digest 4

Applied cool stuff

- How Dropbox built their OCR - via CTC loss - goo.gl/Dumcn9

Fun stuff

- CNN forward pass done in Google Sheets - goo.gl/pyr44P

- New Boston Dynamics robot - opens doors now - goo.gl/y6G5bo

- Cool but toothless list of jupyter notebooks with illustrations and models modeldepot.io

- Best CNN filter visualization tool ever - ezyang.github.io/convolution-visualizer/index.html

New directions / moonshots / papers

- IMPALA from Google - DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space

-- goo.gl/7ASXdk

-- twitter.com/DeepMindAI/status/961283614993539072

- Trade crypto via RL - goo.gl/NmCQSY?

- SparseNets? - arxiv.org/pdf/1801.05895.pdf

- Use Apple watch data to predict diseases arxiv.org/abs/1802.02511?

- Google - Evolution in auto ML kicks in faster than RL - arxiv.org/pdf/1802.01548.pdf

- R-CNN for human pose estimation + dataset

-- Website + video densepose.org

-- Paper arxiv.org/abs/1802.00434

Google's Colaboratory gives free GPUs?

- Old GPUs

- 12 hours limit, but very cool in theory

- habrahabr.ru/post/348058/

- www.kaggle.com/getting-started/47096#post271139

Sick sad world

- China has police Google Glass with face recognition goo.gl/qfNGk7

- Why slack sucks - habrahabr.ru/post/348898/

-- Email + google docs is better for real communication

Market

- Globally there are 22k ML developers goo.gl/1Jpt9P

- One more AI chip moonshot - goo.gl/199f5t

- Google made their TPUs public in beta - US$6 per hour

- CNN performance comparable to human level in dermatology (R-CNN) - goo.gl/gtgXVn

- Deep learning is greedy, brittle, opaque, and shallow goo.gl/7amqxB

- One more medical ML investment - US$25m for cancer - goo.gl/anndPP

#digest

#data_science

#deep_learning

snakers4 (Alexander), February 13, 08:19

Internet digest

- Ben Evans - goo.gl/7e1M4H

- FB tried to buy Snapchat 2 times - for US$60m and US$3b - goo.gl/xUVAM1

- Allegedly some ML can achieve 85% diabetes prediction accuracy on apple watch sensor data - goo.gl/Jyz5fG

- Cars may embrace 48 volts instead of 12 volts - goo.gl/Xmq9W5

- Google reabsorbs Nest (read between the lines - it was successful) - goo.gl/TzbTtY

- Snap +70% revenue growth - goo.gl/CQM6Xn

- 7 of 8 USA top grocers participate in Instacart - goo.gl/CAmoqA

- Siri APIs are fragmented lol - goo.gl/D6vvMK

- Uber agreed to provide Waymo, the self-driving car unit under Google’s parent company, Alphabet, with 0.34 percent of its stock - goo.gl/uatWBx

#internet

#digest

snakers4 (Alexander), February 07, 09:58

Internet digest

- Ben Evans - goo.gl/VKLgma

- Ben Evans about smart home hype - goo.gl/jPrCEd

- Google closing Google Fiber - goo.gl/urftJc

- Amazon tracks warehouse slackers with wristbands - goo.gl/avtMyn

- Apple music overtaking Spotify - goo.gl/ghQ43p

- Why people like infinite scroll goo.gl/tp1XNV

- Netflix personalizes artwork - goo.gl/dF5hLL

- Self-driving trucks => more local trucking jobs goo.gl/tfaZSS

#internet

#digest

snakers4 (Alexander), February 01, 11:25

2017 DS/ML digest 2

Libraries

- One more RL library (last year saw 1 or 2) ray.readthedocs.io/en/latest/rllib.html

- Speech recognition from facebook - github.com/facebookresearch/wav2letter

- Even better speech generation than WaveNet - goo.gl/mTwyoV - I cannot tell the computer apart from a human

Industry (overdue news)

- Nvidia does not like its consumer GPUs deployed in data centers goo.gl/n8mkxk

- Clarifai kills Forevery goo.gl/PxcjvT

- Google search and gorillas vs. black people - goo.gl/t6LwLN

Blog posts

- Baidu - dataset size vs. accuracy goo.gl/j6M5ZP (log-scale)

-- goo.gl/AYan3f

-- goo.gl/JyVNHG

Datasets

- New Youtube actions dataset - arxiv.org/abs/1801.03150


Papers - current topic - meta learning / CNN optimization and tricks

- Systematic evaluation of CNN advances on the ImageNet arxiv.org/abs/1606.02228

-- prntscr.com/i8il35

- TRAINING DEEP NEURAL NETWORKS ON NOISY LABELS WITH BOOTSTRAPPING arxiv.org/abs/1412.6596

-- prntscr.com/i8iq1p

- Cyclical Learning Rates for Training Neural Networks arxiv.org/abs/1506.01186

-- prntscr.com/i8iqjx

- SEARCHING FOR ACTIVATION FUNCTIONS - arxiv.org/abs/1710.05941

-- prntscr.com/i8l0sd

-- prntscr.com/i8l5dp

- Large batch => train Imagenet in 15 mins

-- arxiv.org/abs/1711.04325

- Practical analysis of CNNs

-- arxiv.org/abs/1605.07678

#digest

#data_science

#deep_learning

snakers4 (Alexander), January 31, 07:14

Internet digest

- Ben Evans - goo.gl/XYKbvr

- RNNs + band names - goo.gl/LBBEiP

- Soldiers + fitness trackers = military bases - goo.gl/B4yzxX

- Google's new unit - security and ML - goo.gl/q1Xnjd

- Apple produces TV content - goo.gl/P2X9Gb

- Some bs rumours about Telegram ICO size - goo.gl/D4XgPD

- Twitter is plagued by bot-farms - goo.gl/ZLHVz1

-- Easy to detect via similar registration dates - goo.gl/ZLHVz1

- Podcast about financial innovations in the US - goo.gl/kxHUQY

#digest

#internet

Jeremy Fiance

recurrent neural network, trained on band names, generates fake @Coachella lineup - reminding us most band names are gibberish


snakers4 (Alexander), January 23, 04:47

Internet Digest

- Ben Evans - goo.gl/TPyLoD

- Youtube tightening moderation screws for small channels - goo.gl/SHpC2h

- Camera strapped to plane - vimeo.com/240106846

- Guardian online getting profitable - goo.gl/CDpNFb

- Amazon testing a shop without cashiers - you just take goods and walk out - goo.gl/hvh63Z

- Drone saving a drowning person - goo.gl/RdGYDx

Heh

- And this will go down great with Russian cluck-cluck devs and the culture of "trashing everything" that reigns in our IT - goo.gl/S5poqv

#internet

#digest

snakers4 (Alexander), January 20, 15:13

2017 DS/ML digest 1

Have not done digests for quite some time =)

1. Annual digests

1.1 Google Brain one - goo.gl/VQhZmP two goo.gl/XkTRhp

Highlights

- Speech generation goo.gl/MEDv7M

- Speech recognition goo.gl/tCEkVz

- Auto ML goo.gl/fx2FuP

-- NASNET - goo.gl/becAET

1.2

Posted before - but WildML 2017 summary is also awesome goo.gl/ZFtFVT

2. Datasets

→ YouTube-8M (goo.gl/nyP9gp): >7 million YouTube videos annotated with 4,716 different classes

→ YouTube-Bounding Boxes (goo.gl/c3K6YY): 5 million bounding boxes from 210,000 YouTube videos

→ Speech Commands Dataset (goo.gl/TWsTi8): thousands of speakers saying short command words

→ AudioSet (goo.gl/TVA3LJ): 2 million 10-second YouTube clips labeled with 527 different sound events

→ Atomic Visual Actions (AVA) (goo.gl/Ba4U73): 210,000 action labels across 57,000 video clips

→ Open Images (goo.gl/2Xj8Xd): 9M creative-commons licensed images labeled with 6000 classes

→ Open Images with Bounding Boxes (goo.gl/qRkvMy): 1.2M bounding boxes for 600 classes

→ QuickDraw dataset (goo.gl/FSsfYm)

3.

Uber about genetic approach to neural networks - eng.uber.com/deep-neuroevolution/

#digest

#data_science

#deep_learning

#machine_learning

The Google Brain Team — Looking Back on 2017 (Part 1 of 2)

Posted by Jeff Dean, Google Senior Fellow, on behalf of the entire Google Brain Team The Google Brain team works to advance the state of ...


snakers4 (Alexander), January 17, 04:18

Internet digest

- Ben Evans - goo.gl/Cymhkf

- New post about chain effects in retail / TV / technology - goo.gl/gwuynK

- 39M smart speakers in the US goo.gl/nkvUc4

- US$1bn ticketing IPO in China - goo.gl/Zt1CmZ

Social Media

- FB updates its news feed algorithm to promote content you are more likely to interact with

newsroom.fb.com/news/2018/01/news-feed-fyi-bringing-people-closer-together/

Trivia

- Magnetic disks work after 30 years - goo.gl/oWoaWi

- Self-driving cars being DEPLOYED for SECOND time for one district with retired people - goo.gl/AKowqX

#internet

#digest

snakers4 (Alexander), January 15, 06:44

Interesting links / news / reports / data

Technology

- TVs and household items being replaced by smartphones => good for ecology and resources - goo.gl/3nw15t

- Once again - Meltdown + Spectre - goo.gl/fNrZGV

Internet

- Ben Evans - goo.gl/usr11B

- Amazon business structure - goo.gl/YKAB9F - hundreds of separate business units

- Uber management planning to sell shares - goo.gl/yJMqgc

- Google sold 6M smart speakers in 2017 - goo.gl/TVnSyY

- Amazon will use Alexa ... for ads - goo.gl/tS3gTU

- Facebook vs fake news goo.gl/mabfp6

- Dark side of the Internet - moderation - goo.gl/gBcyXx

Mobile

- Apple cripples 3rd party AdTech - goo.gl/QdpWwX

- Stats about Facebook chat app - newsroom.fb.com/news/2017/12/messengers-2017-year-in-review/

- In USA instagram is dominated by bra commercials - goo.gl/Ch7ipB

- Dating apps kill gay bars - goo.gl/qyTTk9

- App store 2017 YoY +30% revenue growth - goo.gl/xQFBxz

- 50%+ households in the USA are wireless only - goo.gl/WUXNRY

ML / DS

- If you have not seen WaveNet speech generation examples go here - goo.gl/kbjWXJ

- Apple Maps vs Google Maps - goo.gl/yMNth3

-- Looks like google is using some processing and ML to enhance their maps constantly

-- 3D buildings, small buildings, areas of interest etc

-- Timeline prntscr.com/i0kf4x

- Solid state LIDARs will be much cheaper - goo.gl/YZomWc

- Creepy ML - Google street images => car models => predictions about race / income / job per household / address / zip-code - goo.gl/mTXyW5

- An astronomer shared his experience after spending 3 years getting a Data Science degree - goo.gl/KgTmNp

#digest

The Upside to America's Gadget Infatuation

Smartphones and tablets keep getting smaller, replacing energy-guzzling TVs and PCs while saving on raw materials.


snakers4 (Alexander), January 08, 06:47

A 2017 ML/DS year in review by some venerable / random authors:

- Proper year review by WildML (!!!) - www.wildml.com/2017/12/ai-and-deep-learning-in-2017-a-year-in-review/

-- Includes a lot of links and proper materials

-- AlphaGo

-- Attention

-- RL and genetic algorithm renaissance

-- Pytorch - elephant in the room, TF and others

-- ONNX

-- Medicine

-- GANs

If I had to summarize 2017 in one sentence, it would be the year of frameworks. Facebook made a big splash with PyTorch. Due to its dynamic graph construction similar to what Chainer offers, PyTorch received much love from researchers in Natural Language Processing, who regularly have to deal with dynamic and recurrent structures that hard to declare in a static graph frameworks such as Tensorflow.

Tensorflow had quite a run in 2017. Tensorflow 1.0 with a stable and backwards-compatible API was released in February. Currently, Tensorflow is at version 1.4.1. In addition to the main framework, several Tensorflow companion libraries were released, including Tensorflow Fold for dynamic computation graphs, Tensorflow Transform for data input pipelines, and DeepMind’s higher-level Sonnet library. The Tensorflow team also announced a new eager execution mode which works similar to PyTorch’s dynamic computation graphs.

In addition to Google and Facebook, many other companies jumped on the Machine Learning framework bandwagon:

- Apple announced its CoreML mobile machine learning library.

- A team at Uber released Pyro, a Deep Probabilistic Programming Language.

- Amazon announced Gluon, a higher-level API available in MXNet.

- Uber released details about its internal Michelangelo Machine Learning infrastructure platform.

- And because the number of framework is getting out of hand, Facebook and Microsoft announced the ONNX open format to share deep learning models across frameworks. For example, you may train your model in one framework, but then serve it in production in another one.

- In Russian - goo.gl/z1nLzq - kind of meh review (source - goo.gl/NUQ18C)

- Amazing 2017 article about global AI trends - srconstantin.wordpress.com/2017/01/28/performance-trends-in-ai/

- Uber engineering highlights - goo.gl/jBo91k

#digest

#deep_learning

#data_science

AI and Deep Learning in 2017 – A Year in Review

The year is coming to an end. I did not write nearly as much as I had planned to. But I’m hoping to change that next year, with more tutorials around Reinforcement Learning, Evolution, and Ba…


snakers4 (Alexander), January 02, 01:19

A year in retrospective on Spark-in.me:

- spark-in.me/post/spark-in-me-year-one-retrospective

-- Happy holidays!

-- No cringe content

-- First year summary and some info for potential customers

#digest

#data_science

A first year retrospective on Spark-in.me

A short annual retrospective - what we achieved, what we learned and what we can do. Author's articles - http://spark-in.me/author/snakers41 Blog - http://spark-in.me


snakers4 (Alexander), December 19, 08:23

Interesting things in the ML world:

Popular science

- A video about the philosophy of how ML algorithms work - goo.gl/FsCRg7

Data Science:

- MS wants to add Python to Excel. They will probably buy Anaconda too.

-- goo.gl/tZ7e82

-- Styles for pandas dataframes in Excel - goo.gl/dhKWdo

-- And there is already a Python for Excel. Poor bank employees - www.pyxll.com

Deep Learning

- Who NIPS attendees follow on Twitter - goo.gl/y3DXWH

- Unet really rocks - it also extracts text from images - goo.gl/WAEMYA

- Another guide on how to break a simple captcha - goo.gl/bkdRhi - in more advanced cases LSTMs with attention and CTC will help (there was an article on Distill)

- A summary from NIPS - goo.gl/Ei7znA

- Fchollet's post about software 2.0 - goo.gl/dAS2PL

Practical creepiness

- Pasting faces onto porn - goo.gl/saoR7D

#data_science

#deep_learning

#digest

How Do Machines Learn?

How do all the algorithms around us learn to do their jobs? SHARE ON THE TWEETBOOK: https://goo.gl/dGUHMV Discuss this video: http://reddit.com/r/cgpgrey Foo...


If you are learning Python, you will most likely enjoy this

- github.com/parrt/lolviz

parrt/lolviz

lolviz - A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations


snakers4 (Alexander), December 11, 10:01

Interesting things in the internet world:

Crazy world

- A super subtle troll got his shed to the top of the TripAdvisor restaurant ranking in London - goo.gl/7EqDaV

- 52 crazy facts from 2017 - goo.gl/581Nmz

Интернет, IT

- Ben Evans - goo.gl/r2rwxe

- Apple is most likely buying Shazam - goo.gl/1ZQ2zB

- Instagram is testing a messenger - goo.gl/72NGFL

- The Guardian is reaching break-even - goo.gl/3PuqLf

- Google is launching around five India-first products - search, OS optimizations, a phone, a payment system - goo.gl/V37HtC

- How moderation business processes work in large companies - goo.gl/Mfd9A5

Devices

- A counterpart of an internet camera for 20-30 bucks - goo.gl/Ztxm7s

- Adoption of the new iPhones is, surprisingly, growing very fast - goo.gl/QXEaYK

#internet

#digest

I Made My Shed the Top Rated Restaurant On TripAdvisor

And then served customers Iceland ready meals on its opening night.


snakers4 (Alexander), December 11, 09:33

52 crazy facts from 2017 related to the internet and technology.

- goo.gl/581Nmz

A few of the funniest ones

-- A fifth of all the Google searches handled via the mobile app and Android devices are voice searches

-- In 1990, more than a third of people on Earth lived on less than $1.90 a day, adjusted for local prices. By 2013, barely 10 percent of people did.

-- In Silicon Valley, startups that result in a successful exit have an average founding age of 47 years

-- A cryptocurrency mining company called Genesis Mining is growing so fast that they rent Boeing 747s to ship graphics cards to their Bitcoin mines in Iceland.

-- Beggars in China have sophisticated ways to collect payment; using QR Codes, WeChat accounts and in one case a Point Of Sale machine to collect donations.

#digest

52 things I learned in 2017

Between projects at Fluxx, and editing a book, I learned several learnings.


snakers4 (Alexander), December 08, 07:31

Interesting things in the ML / Deep Learning world

As usual, I filter a lot of garbage through myself to find something valuable =)

"Books"

- Yet another "book" - notes of an ML specialist - goo.gl/Wmes7p

Python

- If you want to quickly understand the difference between hdf5 and bcolz, go here - goo.gl/wfcCri

Reading and writing data to a bcolz.carray is typically a lot faster than HDF5

- There is also pytables - but I have read that it is more monstrous. In general, such tools are needed if you have arrays with hundreds of millions to billions of rows and want to read from them quickly

- A layer for working with dataframes on the GPU - goo.gl/r8KPGd - if you know why and how, share your experience in the chat ( GPU Dataframe of GPU Open Analytics Initiative (GOAI) )
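
To make the "huge arrays on disk, fast reads" point above concrete: a minimal sketch of writing a large array to disk in chunks and then reading back only a small slice. Note that this is neither bcolz nor HDF5; it uses plain numpy memory-mapped .npy files as a stand-in (no compression, no chunked column store). The helper names write_in_chunks / read_rows and the toy data are made up for illustration.

```python
import os
import tempfile

import numpy as np


def write_in_chunks(path, n_rows, n_cols, chunk_rows=10_000):
    """Fill an on-disk float32 array chunk by chunk, never holding it all in RAM."""
    arr = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(n_rows, n_cols)
    )
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Toy data (assumption): every value in a row equals the row index.
        arr[start:stop] = np.arange(start, stop, dtype=np.float32)[:, None]
    arr.flush()
    del arr  # release the memory map


def read_rows(path, start, stop):
    """Read only rows [start, stop) from disk via a read-only memory map."""
    arr = np.load(path, mmap_mode="r")
    return np.array(arr[start:stop])  # copy just this slice into memory


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "big.npy")
write_in_chunks(path, n_rows=50_000, n_cols=4)
rows = read_rows(path, 40_000, 40_003)
print(rows.shape)
```

With real workloads the trade-off is compression and chunk layout, which is exactly where bcolz and HDF5 differ; the sketch only shows the access pattern.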

Deep learning

- Neural nets work because relatively simple functions dominate in the world of physics - goo.gl/JmTA2Y

- An adversarial example for your brain - try to figure out what this is - goo.gl/PGdX5m

- Frameworks are a new way of dominating the ML market - goo.gl/ZtMJVF (then why are TF's interfaces such crap? =) )

- PR curves in TensorBoard - maybe someone needs it - goo.gl/5gM6a1

- A new article on Habr about neural network intuitions - goo.gl/fwxcrC

- Madness - but you can now generate streets in 2K resolution from per-pixel maps

-- goo.gl/hoCA4C

-- www.youtube.com/watch?v=3AIpPlzM_qs

-- generator goo.gl/myMXTQ

-- discriminator goo.gl/oki5rq

- The much-hyped StarGAN - goo.gl/Gsvuoe

Machine learning / data science

- Suddenly, Google applied its Go algorithm to chess - goo.gl/jwTtwb

- A new article on Distill - goo.gl/uLXJMr - about artificial intelligence augmentation (AIA): the use of AI systems to help develop new methods for intelligence augmentation - surprisingly, they also drew a spiral cat - goo.gl/4KJemD

- Facebook and suicide prediction algorithms - goo.gl/tsZvfH

- A paper by the LightGBM authors - goo.gl/NQFxai

Datasets

- Mozilla is opening up a model and a lot of data for speech recognition

-- data - voice.mozilla.org/data

-- model - github.com/mozilla/DeepSpeech

Hardware

- Nvidia Titan V - drool over it for US$3k - www.youtube.com/watch?time_continue=43&v=NPrfiOldKf8

#digest

#data_science

#deep_learning

snakers4 (Alexander), December 05, 05:13

Interesting things in the internet world

- Ben Evans - goo.gl/HYMvh4

- The guys who tried to win Facebook in court now hold US$1bn in bitcoin - goo.gl/EQcu6U

- Li-ion battery prices keep falling linearly - goo.gl/TTyBVV

- Cydia is shutting down - goo.gl/ik8STV - Apple won. On Android you can install external crap apps at your own risk just by ticking a box in the menu. Not to mention the perversions you can pull off by connecting from your phone to itself via the console

- Tunnel vision on Twitter - goo.gl/NTG5ua - polar opinions do not overlap. These are different worlds. One more reason not to take part in politics and dumb flame wars

- In Kenya 53% of people have internet access, 99% of them from a mobile phone - goo.gl/8P7D9H

- Cringe and insanity of the Western world - Facebook for kids - goo.gl/smcyyL . The absurdity is that literally 50 years ago, even in the US, children in some states were regarded as free labor (families are big, so let them work). And now this nonsense is being instilled that childhood "must not be touched". This is especially funny when you recall school classes that consisted of 5% to 50% thugs (the further along, the fewer of them there were)

- Infographic on parcels ordered via the internet - China is growing and is already in 2nd place after the US - goo.gl/ZLDJVF

- Autonomous cars can help the poor - goo.gl/rjUDcp

- For those living under a rock - the growth of bitcoin and its comparison with other assets - goo.gl/WQ3kKY

- Uber was ... spying on people lol - goo.gl/GHW9qC

- Journalist fluff, but geo-coordinates can be used to detect the presence of serial killers - goo.gl/FiVp93

- How kids use the internet - goo.gl/BaZWxa

#internet

#digest

Winklevoss Twins Used Facebook Payout to Become Bitcoin Billionaires

In just four years.


snakers4 (Alexander), November 28, 13:58

Interesting things in the ML world

- Yet another, superficial, explanation of CapsNet - goo.gl/hYvZZV

- A dataset with 11k hands + metadata - goo.gl/YVfPvi

- ONNX (a kind of unified format for models) is already part of Pytorch - pytorch.org/docs/master/onnx.html

- Nature is launching its own ML journal - goo.gl/arzNg3

- Apple is doing something with self-driving cars - goo.gl/nMqzJ1

- Google is cutting prices on top previous-generation GPUs - goo.gl/4oM5wd

- A very beautiful but useless exercise with dimensionality reduction methods - goo.gl/UDgmUT

- One more algorithm for the collection of dimensionality reduction algorithms

-- PCA

-- T-SNE

-- UMAP - goo.gl/jFSBFZ

-- HDBSCAN (a clustering algorithm rather than a dimensionality reduction one) - github.com/scikit-learn-contrib/hdbscan

- Fchollet on bullshit and general AI - goo.gl/zai717

" The intelligence of an octopus is specialized in the problem of being an octopus. The intelligence of a human is specialized in the problem of being human."

- An advisory body on AI in the UK - goo.gl/yZNBfD

- An awesome post analyzing the comments on the repeal of net neutrality - goo.gl/3ZahMM (!)

- Came across the NLP library spacy once again - has anyone tried it, is it any good? - spacy.io
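
PCA is the simplest entry in the dimensionality reduction list above. A minimal sketch of it via SVD in plain numpy, assuming a toy dataset; the function name pca and the synthetic data are illustrative only. t-SNE and UMAP need their own libraries and are not shown here.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    # Variance of the data along each kept direction.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return projected, components, explained_variance


rng = np.random.default_rng(0)
# Toy data (assumption): 200 points in 5-D lying almost on a single line.
direction = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
X = rng.normal(size=(200, 1)) * direction + 0.01 * rng.normal(size=(200, 5))

projected, components, explained_variance = pca(X, n_components=2)
print(projected.shape, explained_variance)
```

On this toy data nearly all the variance lands in the first component, which is the kind of sanity check worth doing before moving to the non-linear methods.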

#data_science

#digest

Capsule Networks (CapsNets) – Tutorial

CapsNets are a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning. NIPS 2017 Paper: * Dynamic Rout...