Tracking your hardware ... for data science
For a long time I thought that if you really wanted to track all your servers' metrics, you needed Zabbix (which is very complicated).
A friend recommended an amazing tool to me.
It installs and runs literally in minutes.
If you want to auto-start it properly, there are even Ubuntu packages (a bit dated) and systemd examples.
There are also Dockerized metric exporters for GPUs by Nvidia.
It also has extensive alerting features, but they are hard to get started with, since there is no minimal example.
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
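
To make the "dimensional data model" part less abstract, here is a toy sketch in plain Python. It is not the tool's API: `Series`, `select`, the metric name `gpu_temperature_celsius` and the sample values are all made up for illustration. The idea it shows is that every time series is identified by a metric name plus a set of key/value labels, and a query picks series by matching on those labels.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """One time series: a metric name plus labels (hypothetical toy type)."""
    name: str
    labels: dict
    samples: list = field(default_factory=list)  # list of (timestamp, value)


def select(series_list, name, **matchers):
    """Return every series with this metric name whose labels match exactly."""
    return [
        s for s in series_list
        if s.name == name
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


# Made-up data: one metric, three series that differ only in their labels.
store = [
    Series("gpu_temperature_celsius", {"host": "server1", "gpu": "0"}, [(0, 61.0), (15, 63.0)]),
    Series("gpu_temperature_celsius", {"host": "server1", "gpu": "1"}, [(0, 70.0), (15, 72.0)]),
    Series("gpu_temperature_celsius", {"host": "server2", "gpu": "0"}, [(0, 55.0), (15, 54.0)]),
]

# "Give me the GPU temperature on server1", sliced by the gpu label.
on_server1 = select(store, "gpu_temperature_celsius", host="server1")
latest = {s.labels["gpu"]: s.samples[-1][1] for s in on_server1}
print(latest)
```

The point of the design is that you do not need a separate metric per server or per GPU: you keep one metric and slice it by labels at query time.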