March 04, 08:46

Tracking your hardware ... for data science

For a long time I thought that if you really want to track all your servers' metrics, you need Zabbix (which is very complicated).

A friend recommended an amazing tool to me:

- prometheus.io/docs/guides/node-exporter/

It installs and runs literally in minutes.
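
Once the exporter is running, a quick way to confirm it works is to read its metrics endpoint. The sketch below is only an illustration, not part of the linked guide: it assumes the exporter listens on its default address (http://localhost:9100/metrics) and that samples carry no trailing timestamps. The helper names `parse_metrics` and `fetch_metrics` and the values in `SAMPLE` are made up for the example.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}.

    Assumption: samples have no trailing timestamp, so the value is
    always the last space-separated token on the line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Everything before the last space is the series name plus labels.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Fetch and parse metrics from a running exporter (assumed default URL)."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Made-up sample in the exposition format, so the parser can be tried offline.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE)["node_load1"])
```

On a machine where the exporter is actually running, `fetch_metrics()["node_load1"]` should return the current 1-minute load average.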

If you want it to auto-start properly, there are even (somewhat older) Ubuntu packages and systemd examples:

- github.com/prometheus/node_exporter/tree/master/examples/systemd

There are also Dockerized metric exporters for GPUs from Nvidia:

- github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm

Prometheus also has extensive alerting features, but they are hard to get started with, since there is no minimal example:

- prometheus.io/docs/alerting/overview/

- github.com/prometheus/docs/issues/581

#linux
