Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1810 members, 1762 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- t.me/joinchat/Bv9tjkH9JHbxiV5hr91a0w
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

Posts by tag «dataset»:

snakers4 (Alexander), July 02, 07:34

New version of our open STT dataset - 0.5, now in beta

Please share and repost!

github.com/snakers4/open_stt/releases/tag/v0.5-beta

What is new?

- A new domain - radio (1000+ new hours);

- A larger YouTube dataset with 1000+ additional hours;

- A small (300 hours) YouTube dataset downloaded in maximum quality;

- Manually annotated ground truth validation sets for YouTube / books / public calls;

- Now we will start to focus on actually cleaning and distilling the dataset. We have published a second list of "bad" data;

I'm back from vacation)

#deep_learning

#data_science

#dataset

snakers4/open_stt

Russian open STT dataset. Contribute to snakers4/open_stt development by creating an account on GitHub.


snakers4 (Alexander), May 20, 06:21

New in our Open STT dataset

github.com/snakers4/open_stt#updates

- An mp3 version of the dataset;

- A torrent for mp3 dataset;

- A torrent for the original wav dataset;

- Benchmarks on the public dataset / files with "poor" annotation marked;

#deep_learning

#data_science

#dataset



snakers4 (Alexander), May 09, 11:28

Habr.com / TowardsDataScience post for our dataset

In addition to a GitHub release and a Medium post, we also published a post on habr.com:

- habr.com/ru/post/450760/

Our post was also accepted as an editor's pick on TDS:

- bit.ly/ru_open_stt

Share / give us a star / clap if you have not already!

Original release

github.com/snakers4/open_stt/

#deep_learning

#data_science

#dataset

A huge open dataset of Russian speech

Speech recognition specialists have long lacked a large open corpus of spoken Russian, so only large companies could afford to work on...


snakers4 (Alexander), February 16, 2018

New datasets

(1) HDR Dataset from Google

3,640 bursts of full-resolution raw images, made up of 28,461 individual images, along with HDR+ intermediate and final results for comparison

research.googleblog.com/2018/02/introducing-hdr-burst-photography.html

(2) Huge Anime dataset - 2.9m+ images annotated with 77.5m+ tags - www.gwern.net/Danbooru2017

#datasets

Introducing the HDR+ Burst Photography Dataset

Posted by Sam Hasinoff, Software Engineer, Machine Perception. Burst photography is the key idea underlying the HDR+ software on Google's...


snakers4 (Alexander), January 12, 2018

Interesting datasets from Kaggle

Predict breast cancer from slide images

goo.gl/rDxrpZ

High quality academic dataset of 26k images of 41 fruits

goo.gl/JLWvLD

Gorgeous illustration of different network algorithms

goo.gl/z7oori

Crowd-sourced translation of parallel sentence pairs

goo.gl/7ky8Vw

5 years of hourly weather data for 36 cities

goo.gl/jjkRSq

#data_science

#datasets

Breast Histopathology Images

IDC vs non-IDC classification


snakers4 (Alexander), January 02, 2018

Interesting dataset with room layouts (a lot of them)

- lsun.cs.princeton.edu/2015.html

- lsun.cs.princeton.edu/2016/

#datasets

Pillow-SIMD is a Pillow fork that claims 3-6x faster performance on CPU using the same resources

- github.com/uploadcare/pillow-simd

- habrahabr.ru/post/301576/

Installation is claimed to be this easy:

$ pip uninstall pillow
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
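
A quick way to check the speed-up claim on your own machine is to time the same resize before and after the swap. Below is a minimal sketch, assuming Pillow-SIMD is a drop-in replacement (same `PIL` import). The helper names `best_time` and `resize_benchmark`, the image sizes, and the Lanczos filter are illustrative choices, not something from the Pillow-SIMD docs:

```python
import time


def best_time(fn, repeats=5):
    """Return the best wall-clock time (seconds) over `repeats` calls to fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def resize_benchmark(src_size=(2560, 1600), dst_size=(320, 200), repeats=5):
    """Time one Lanczos downscale of a synthetic RGB image (sizes are arbitrary)."""
    # Imported here so the timing helper above works even without Pillow installed.
    from PIL import Image

    img = Image.new("RGB", src_size, color=(120, 60, 200))
    return best_time(lambda: img.resize(dst_size, Image.LANCZOS), repeats)


if __name__ == "__main__":
    import PIL

    # Run once with stock Pillow, then again after installing pillow-simd,
    # and compare the two numbers.
    print("PIL version:", PIL.__version__)
    print("best resize time, s:", resize_benchmark())
```

Run it once with stock Pillow and once after the reinstall. The ratio of the two timings is the speed-up for this particular operation on your CPU; other operations may behave differently.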

#computer_vision

uploadcare/pillow-simd

The friendly PIL fork. Contribute to uploadcare/pillow-simd development by creating an account on GitHub.


habrahabr.ru/post/301576/

Pillow-SIMD

Operations sped up 2.5x compared to Pillow and 10x compared to ImageMagick. Pillow-SIMD is a "follower fork" of the image-processing library...


snakers4 (Alexander), October 30, 2017

A modern dataset for image object detection

- detrac-db.rit.albany.edu

#datasets
