Spark in me - Internet, data science, math, deep learning, philosophy

snakers4 @ telegram, 1812 members, 1759 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Our website
- http://spark-in.me
Our chat
- https://t.me/joinchat/Bv9tjkH9JHbxiV5hr91a0w
DS courses review
- http://goo.gl/5VGU5A
- https://goo.gl/YzVUKf

Posts by tag «hardware»:

snakers4 (Alexander), July 12, 05:25

Installing apex ... in style )

Sometimes you just need to try fp16 training (GANs, large networks, rare cases).

There is no better way to do this than to use Nvidia's APEX library.
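Why does fp16 need a dedicated library at all? Below is a stdlib-only Python illustration (not APEX code) of loss scaling, one of the tricks mixed-precision tools automate. A tiny gradient underflows to zero in IEEE half precision. It survives if it is multiplied by a scale factor before the fp16 cast and divided back afterwards. The gradient value and the scale of 1024 are arbitrary illustrative choices.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision ('e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp16_with_loss_scaling(grad: float, loss_scale: float) -> float:
    """Scale up, store in fp16, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale

tiny_grad = 1e-8  # illustrative value, below the smallest fp16 subnormal (~6e-8)
print(to_fp16(tiny_grad))                         # 0.0: underflows in fp16
print(fp16_with_loss_scaling(tiny_grad, 1024.0))  # roughly 1e-8: survives
```

Real mixed-precision tooling typically adjusts the scale dynamically during training. The fixed 1024 here only shows the principle.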

Luckily - they have very nice examples:

- github.com/NVIDIA/apex/tree/master/examples/docker

Well ... it installs on a clean machine, but I want my own environment to always work with it)

So, I ploughed through all the conda / environment setup mumbo-jumbo and created a version of our deep learning / DS Dockerfile that now builds from the PyTorch image (PyTorch GPU / CUDA / CUDNN + APEX).

- github.com/snakers4/gpu-box-setup/blob/master/dockerfile/Dockerfile_apex

It was kind of painful, because PyTorch images already contain conda / pip. This was not apparent at first and caused all sorts of problems with my miniconda installation.
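Here is a small stdlib Python helper for spotting this kind of clash. `shutil.which()` returns only the first match. This helper instead lists every `conda`, `pip` or `python` on `PATH` in lookup order. The function name is hypothetical and the helper is not part of the Dockerfile above. The idea is to run it inside the container to see which installation actually wins.

```python
import os
from typing import List, Optional

def find_all_executables(name: str, search_path: Optional[str] = None) -> List[str]:
    """Return every executable file called `name` on PATH, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits: List[str] = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    for tool in ("conda", "pip", "python"):
        # More than one hit means later installations are shadowed by the first one.
        print(tool, find_all_executables(tool))
```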

So use it and please report if it is still buggy.

#deep_learning

#pytorch

NVIDIA/apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch - NVIDIA/apex


Logging your hardware, with logs, charts and alerts - in style

TLDR - we have been looking for THE software to do this easily, with charts / alerts / easy install.

We found Prometheus. Configuring alerts was a bit of a problem, but enjoy:

- github.com/snakers4/gpu-box-setup#prometheus
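For a sense of what Prometheus actually scrapes, here is a minimal Python sketch. It converts the output of `nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu --format=csv,noheader,nounits` into the Prometheus text exposition format. The metric names are hypothetical, and this is not necessarily how the linked setup collects GPU stats. A sample string stands in for real `nvidia-smi` output, so the sketch runs without a GPU.

```python
def nvidia_smi_to_prometheus(csv_text: str) -> str:
    """Render per-GPU temperature and utilization as Prometheus gauges."""
    temp_lines = [
        "# HELP gpu_temperature_celsius GPU core temperature.",
        "# TYPE gpu_temperature_celsius gauge",
    ]
    util_lines = [
        "# HELP gpu_utilization_percent GPU utilization.",
        "# TYPE gpu_utilization_percent gauge",
    ]
    for row in csv_text.strip().splitlines():
        index, temp, util = (field.strip() for field in row.split(","))
        temp_lines.append(f'gpu_temperature_celsius{{gpu="{index}"}} {temp}')
        util_lines.append(f'gpu_utilization_percent{{gpu="{index}"}} {util}')
    # Samples of one metric family are kept together, as the format expects.
    return "\n".join(temp_lines + util_lines) + "\n"

sample = "0, 65, 97\n1, 71, 100\n"  # illustrative readings, not real data
print(nvidia_smi_to_prometheus(sample), end="")
```

Written to a `.prom` file, output like this can be picked up by node_exporter's textfile collector. Alert rules can then fire on, for example, a temperature threshold.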

#deep_learning

#hardware

snakers4/gpu-box-setup



snakers4 (Alexander), May 02, 05:41

Poor man's computing cluster

So, when I last checked, Amazon's p3.8xlarge instances (4x Tesla V100) cost around US$12 per hour (unless you reserve them for a year). A tower supercomputer from Nvidia (the DGX Station) probably costs US$40-50k or more (it was announced at around US$69k).

It is easy to crunch the numbers and see that 1 month of renting such a machine would cost at least US$8-10k. There is also the additional cost / problem of actually storing your large datasets. When I last used Amazon, their cheap storage was sloooooow, and fast storage was prohibitively expensive.
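Here is a quick sanity check of the rental arithmetic. It assumes the ~US$12/hour on-demand price quoted above, running 24/7, and it ignores storage and data-transfer costs:

```python
HOURS_PER_DAY = 24

def rental_cost(hourly_usd: float, days: float) -> float:
    """Cost of keeping an on-demand instance running non-stop."""
    return hourly_usd * HOURS_PER_DAY * days

hourly = 12.0  # approximate on-demand price for a 4x V100 instance
print(f"30 days:  ${rental_cost(hourly, 30):,.0f}")   # $8,640
print(f"365 days: ${rental_cost(hourly, 365):,.0f}")  # $105,120
```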

So, why am I saying this?

Let's assume (according to my miner friends' experience) that consumer Nvidia GPUs can work 2-3 years non-stop given proper cooling and care (test before buying!). Let's also assume that 4x Tesla V100 is roughly equivalent to 7-8x 1080 Ti.

Yeah, I know you will point out at least one reason why this does not hold, but for practical purposes it is fine (yes, I know that Teslas have some cool features like NVLink).

Now let me drop the bomb: modern professional motherboards often boast 2-3 Ethernet ports, and sometimes you can even get 2x 10 Gbit/s ports (!!!).

This means that you can actually connect at least 2 machines (or maybe even daisy-chain more?) into a computing cluster.
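Why do the 10 Gbit/s ports matter? In data-parallel training, the machines exchange gradients on every step. Below is a back-of-the-envelope Python sketch. It assumes a hypothetical 25M-parameter model with fp32 gradients and a 2-node ring all-reduce, where each node sends roughly one full copy of the gradients per step. It ignores latency, protocol overhead and overlap with compute.

```python
def gradient_transfer_seconds(num_params: int, link_gbit_s: float,
                              bytes_per_param: int = 4) -> float:
    """Lower-bound time to push one full gradient copy over a link."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / (link_gbit_s * 1e9)

params = 25_000_000  # hypothetical model size
for gbit in (1, 10):
    ms = gradient_transfer_seconds(params, gbit) * 1000
    print(f"{gbit:>2} Gbit/s: ~{ms:.0f} ms per step")
# Prints roughly 800 ms for 1 Gbit/s and 80 ms for 10 Gbit/s.
```

Setting `bytes_per_param=2` (fp16 gradients) halves the estimate.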

Now let's crunch the numbers

According to quotes I have collected over the years, you can build a cluster roughly equivalent to Amazon's p3.8xlarge for US$10k (but with storage!) using used GPUs (miners are selling them like crazy now). If you also buy second-hand drives, motherboards and CPUs, you can lower the cost to US$5k or less.

So, a US$10k cluster that would serve you for at least a year (if you test everything properly and take care of it) is roughly equivalent to:

- 20-25% of a DGX desktop;

- 1 month of renting on Amazon;

Assuming that all the hardware will just break in a year:

- It is 4-5x cheaper than buying from Nvidia;

- It is 10x cheaper than renting;

If you buy everything used, then it is 10x and 20x cheaper!
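The multipliers above can be reproduced in a few lines of Python using the post's own assumptions: a US$10k cluster (or US$5k with mostly used parts), US$40-50k for the DGX desktop, and a full year of renting at ~US$12/hour. Storage, electricity and your time are ignored.

```python
def times_cheaper(alternative_usd: float, cluster_usd: float) -> float:
    """How many clusters the alternative would pay for."""
    return alternative_usd / cluster_usd

rent_one_year = 12.0 * 24 * 365  # ~US$105k at ~US$12/hour, non-stop
for cluster in (10_000, 5_000):  # mostly new-ish parts vs. mostly used parts
    print(f"cluster ${cluster:,}: "
          f"{times_cheaper(40_000, cluster):.0f}-{times_cheaper(50_000, cluster):.0f}x vs DGX, "
          f"{times_cheaper(rent_one_year, cluster):.1f}x vs renting")
# cluster $10,000: 4-5x vs DGX, 10.5x vs renting
# cluster $5,000: 8-10x vs DGX, 21.0x vs renting
```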

I would buy that for a dollar!

Of course, you have to invest your free time.

See my calculations here:

bit.ly/spark00001

#deep_learning

#hardware

computing_cluster (spreadsheet preview)

Columns: Server; Part; Approx quote (RUR); Quote date; Price, USD; Comment
RUR/USD: 65 (Yes, I know that you should have historical exchange rates)
1; Thermaltake Core X9 Black; 12,220; 11/22/2018; 188
1; Gigabyte X399 AORUS XTREME, Socket TR4, AMD X399, 8xDDR-4, 7.1CH, 2x1000 Mbit/s, 10000 Mbit/s, Wi-Fi, Bluetooth, U...


snakers4 (Alexander), January 12, 2018

Now I know how to make my remote learning rig perfect - add a hardware reboot watchdog

aminux.wordpress.com/2018/01/12/usb-watchdogs-opendev-vs-bitroleum/

#hardware

USB WatchDogs: a short review

Today I will tell you about a couple of devices designed to reboot hung computers in an emergency: hardware watchdogs. If you are interested in what they are and how to connect them, read on…
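The idea behind a hardware watchdog fits in a few lines. Below is a toy Python model, not the protocol of any of the reviewed devices. The host must "pet" the timer regularly. If it goes silent for longer than the timeout, the device side decides to reset. The class name, the 30-second timeout and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable

class WatchdogTimer:
    """Toy watchdog: reset is due if the host has not pet it within timeout_s."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_pet = clock()

    def pet(self) -> None:
        # Called periodically by a healthy host (e.g. from a daemon).
        self.last_pet = self.clock()

    def should_reset(self) -> bool:
        # Device side: a hung host stops petting, so the deadline passes.
        return self.clock() - self.last_pet > self.timeout_s

fake_now = [0.0]  # fake clock so the example runs instantly
wd = WatchdogTimer(timeout_s=30.0, clock=lambda: fake_now[0])
fake_now[0] = 20.0
wd.pet()
fake_now[0] = 45.0
print(wd.should_reset())  # False: last pet was 25 s ago
fake_now[0] = 60.0
print(wd.should_reset())  # True: host silent for 40 s
```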


snakers4 (Alexander), December 27, 2017

If you are currently building hardware for neural networks, for yourself or for your company, you may find interesting not only the article on the GPU limbo, but also this article on Intel's answer to AMD's new lineups:

- 3dnews.ru/954174

- timdettmers.com/2017/12/21/deep-learning-hardware-limbo/#more-627

Of course, the CPU is not the bottleneck, but it is still interesting to see how competition affects the market.

#hardware

Core i9-7900X CPU review: a harbinger of the core war

Enthusiasts ready to spend $1,000 will be able to buy the new ten-core Skylake-X HEDT processor, the Core i9-7900X, in just a week. Ahead of the start of sales, we have tested the new chip in detail and will now try to explain what, apart from the price, may make it more interesting than Intel's previous ten-core processor, the Core i7-6950X.


snakers4 (Alexander), November 12, 2017

On upgrading to an NVMe drive

- aminux.wordpress.com/2017/11/11/plextor-nvme-ssd-home/

#linux

#hardware

NVMe at home, nya

Hi everyone! Regular readers of my blog have probably noticed that I can be quite conservative about upgrading computer hardware, and replacing a system that is almost 10 years old…


snakers4 (Alexander), September 08, 2017

If you are looking for powerful GPU-accelerated servers in the cloud, here is an option

- www.servers.com/prisma_cloud

I tried Floydhub myself, but I really did not like it. On the other hand, the price tag here bites much harder than on Amazon.

#hardware

Prisma Cloud - a perfect solution for neural networks hosting | Products & services | Servers.com

The most productive GPU servers for deep learning and neural networks

