July 02, 07:34

New version of our open STT dataset - 0.5, now in beta

Please share and repost!

github.com/snakers4/open_stt/releases/tag/v0.5-beta

What is new?

- A new domain - radio (1000+ new hours);

- A larger YouTube dataset with 1000+ additional hours;

- A small (300 hours) YouTube dataset downloaded in maximum quality;

- Ground truth validation sets for YouTube / books / public calls manually annotated;

- Now we will start to focus on actually cleaning and distilling the dataset. We have published a second list of "bad" data;

I'm back from vacation)

#deep_learning

#data_science

#dataset

snakers4/open_stt

Russian open STT dataset. Contribute to snakers4/open_stt development by creating an account on GitHub.