Spark in me - Internet, data science, math, deep learning, philo

snakers4 @ telegram, 1812 members, 1759 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Posts by tag «hardware»:

snakers4 (Alexander), July 12, 05:25

Installing apex ... in style )

Sometimes you just need to try fp16 training (GANs, large networks, rare cases).

There is no better way to do this than to use Nvidia's APEX library.

Luckily - they have very nice examples:


Well ... it installs on a clean machine, but I want my environment to always work with it)

So, I ploughed through all the conda / environment setup mumbo-jumbo and created a version of our deep-learning / ds dockerfile, but now installing from the pytorch image (pytorch GPU / CUDA / CUDNN + APEX).


It was kind of painful, because PyTorch images already contain conda / pip, which was not apparent at first and caused all sorts of problems with my miniconda installation.

So use it and please report if it is still buggy.




A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch - NVIDIA/apex

Logging your hardware, with logs, charts and alerts - in style

TLDR - we have been looking for THE software to do this easily, with charts / alerts / easy install.

We found prometheus. Configuring alerts was a bit of a problem, but enjoy:





snakers4/gpu-box-setup (GitHub repository)

snakers4 (Alexander), May 02, 05:41

Poor man's computing cluster

So, when I last checked, Amazon's p3.4xlarge instances cost around US$12 per hour (unless you reserve them for a year). A tower supercomputer from Nvidia costs probably US$40-50k or more (it was announced at around US$69k).

It is not difficult to crunch the numbers and see that 1 month of renting such a machine would cost at least US$8-10k. There will also be the additional cost / problem of actually storing your large datasets. When I last used Amazon - their cheap storage was sloooooow, and fast storage was prohibitively expensive.

So, why am I saying this?
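As a rough check of that monthly figure, here is a minimal Python sketch. The US$12/hour rate is the one quoted above; the 30-day month, the 24/7 utilisation and the helper name `monthly_rent` are assumptions made for illustration.

```python
# Rate quoted in the post; month length and utilisation are assumptions.
HOURLY_RATE_USD = 12.0
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def monthly_rent(hourly_rate, utilisation=1.0):
    """Cost of one month of renting, given the share of hours actually used."""
    return hourly_rate * HOURS_PER_DAY * DAYS_PER_MONTH * utilisation


full_time = monthly_rent(HOURLY_RATE_USD)
print(f"24/7 for a month: US${full_time:,.0f}")  # US$8,640
```

So the low end of the US$8-10k range is simply the hourly rate running non-stop; storage and traffic would come on top of that.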

Let's assume (according to my miner friends' experience) - that consumer Nvidia GPUs can work 2-3 years non-stop given proper cooling and care (test before buying!). Also let's assume that 4xTesla V100 is roughly the same as 7-8 * 1080Ti.

Yeah, I know that you will point out at least one reason why this does not hold, but for practical purposes this is fine (yes, I know that Teslas have some cool features like Nvlink).

Now here is the kicker - modern professional motherboards often boast 2-3 Ethernet ports. And sometimes you can even get 2x10Gbit/s ports (!!!).

This means that you can actually connect at least 2 machines (or maybe even daisy-chain more?) into a computing cluster.

Now let's crunch the numbers

According to quotes I have collected over the years, you can build a cluster roughly equivalent to Amazon's p3.4xlarge for US$10k (but with storage!) using used GPUs (miners are selling them like crazy now). If you also buy drives, motherboards and CPUs second-hand, you can lower the cost to US$5k or less.

So, a US$10k cluster that would serve you for at least one year (if you test everything properly and take care of it) costs roughly the same as:

- 20-25% of DGX desktop;

- 1 month of renting on Amazon;

Assuming that all the hardware will just break in a year:

- It is 4-5x cheaper than buying from Nvidia;

- It is 10x cheaper than renting;

If you buy everything used, then it is 10x and 20x cheaper!
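A quick sanity check of these ratios, as a minimal Python sketch. All prices are the rough figures quoted in this post, not measured ones; the constant and helper names are made up for illustration, and the one-year lifetime is the same pessimistic assumption as above.

```python
# Rough figures from the post (USD); none of them are measured prices.
CLUSTER_USD = 10_000                  # self-built cluster with used GPUs
CLUSTER_ALL_USED_USD = 5_000          # everything bought second-hand
DGX_USD = (40_000, 50_000)            # low / high guess for the Nvidia tower
RENT_PER_MONTH_USD = (8_000, 10_000)  # low / high monthly rent on Amazon


def yearly(monthly_range):
    """Turn a (low, high) monthly cost into a (low, high) yearly cost."""
    low, high = monthly_range
    return low * 12, high * 12


def times_cheaper(alternative_range, own_cost):
    """How many times cheaper own_cost is than the (low, high) alternative."""
    low, high = alternative_range
    return low / own_cost, high / own_cost


for label, cost in [("used GPUs", CLUSTER_USD), ("all used", CLUSTER_ALL_USED_USD)]:
    dgx = times_cheaper(DGX_USD, cost)
    rent = times_cheaper(yearly(RENT_PER_MONTH_USD), cost)
    print(f"{label}: {dgx[0]:.0f}-{dgx[1]:.0f}x vs Nvidia, "
          f"{rent[0]:.0f}-{rent[1]:.0f}x vs a year of renting")
```

If the hardware survives longer than a year, the renting ratio grows with every extra month, while the comparison with buying from Nvidia stays the same.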

I would buy that for a dollar!

Ofc you have to invest your free time.

See my calculations here:




config

RUR/USD: 65 (Yes, I know that you should have historical exchange rates)

Server | Part | Approx quote | Quote date | Price, USD | Comment
1 | Thermaltake Core X9 Black | 12,220 | 11/22/2018 | 188 |
1 | Gigabyte X399 AORUS XTREME, Socket TR4, AMD X399, 8xDDR-4, 7.1CH, 2x1000 Mbit/s, 10000 Mbit/s, Wi-Fi, Bluetooth, U...

snakers4 (Alexander), January 12, 2018

Now I know how to make my remote learning rig perfect - add a hardware reboot watchdog


USB WatchDogs: a short review

Today I will talk about a couple of devices designed for emergency rebooting of frozen computers: hardware watchdogs. If you are curious what they are and how to connect them, read on…

snakers4 (Alexander), December 27, 2017

If you are currently building hardware for neural networks, for yourself or for your company, then you may be interested not only in the article about GPU Limbo, but also in this article about Intel's response to AMD's new product lines



Obviously the CPU is not the bottleneck, but it is still interesting to see how competition affects the market.


Core i9-7900X processor review: a harbinger of the core wars

Enthusiasts willing to spend $1,000 will be able to buy the new ten-core Skylake-X HEDT processor, the Core i9-7900X, in just a week. Ahead of the sales launch we tested the newcomer thoroughly, and will now try to explain how, price aside, it may be more interesting than Intel's previous ten-core processor, the Core i7-6950X

snakers4 (Alexander), November 12, 2017

On upgrading to an NVMe drive




NVMe at home, nya

Hi everyone! Regular readers of my blog have probably noticed that I can be rather conservative about computer hardware upgrades, and replacing an almost 10-year-old system…

snakers4 (Alexander), September 08, 2017

If you are looking for powerful GPU-accelerated servers in the cloud, here is an option


I tried Floydhub myself, but I really did not like it. On the other hand, the pricing here bites much harder than on Amazon.


Prisma Cloud - a perfect solution for neural networks hosting | Products & services

The most productive GPU servers for deep learning and neural networks
