I’ve been helping to troubleshoot why a commodity server (with no iDRAC/iLO IPMI) is powering off randomly. One of the hypotheses is temperature.
This is a very simple script that prints the temperature of the HDDs and the CPU and appends it to log files.
First you need to install hddtemp and lm-sensors:
sudo apt install hddtemp lm-sensors
Then this is the one-line script, which you should run as root:
while true; do date | tee -a /var/log/hddtemp.log; hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd | tee -a /var/log/hddtemp.log; date | tee -a /var/log/cputemp.log; sensors | tee -a /var/log/cputemp.log; sleep 2; done
Feel free to change sleep 2 to the number of seconds you want to wait between readings, for example sleep 10.
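If the one-liner gets hard to edit, the same loop can be written out as a small script. This is only a sketch: the names INTERVAL, DISKS, log_reading and monitor are my own illustrative choices, not anything hddtemp or lm-sensors require, and the disk list is simply the one used in the one-liner.

```shell
#!/bin/sh
# Sketch of the one-liner as a script. INTERVAL, DISKS, log_reading and
# monitor are illustrative names, not part of hddtemp or lm-sensors.

INTERVAL=${INTERVAL:-2}                                 # seconds between readings
DISKS=${DISKS:-"/dev/sda /dev/sdb /dev/sdc /dev/sdd"}   # disks to query

# Print a timestamp and the output of a command to the screen,
# appending both to the given log file.
log_reading() {
    logfile=$1
    shift
    date | tee -a "$logfile"
    "$@" | tee -a "$logfile"
}

# Loop until interrupted with CTRL + C.
monitor() {
    while true; do
        # $DISKS is left unquoted on purpose so it splits into one argument per disk
        log_reading /var/log/hddtemp.log hddtemp $DISKS
        log_reading /var/log/cputemp.log sensors
        sleep "$INTERVAL"
    done
}
```

To use it, save it to a file, add a final line that calls monitor, and run it as root. The interval can then be changed without editing the file, for example by setting INTERVAL=10 in the environment before running it.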
Press CTRL + C to interrupt the script at any time.
You can run this inside a screen session and leave it running in the background.
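As a rough sketch of the screen part, assuming you have saved the loop as an executable script: the path /root/templog.sh and the session name templog are hypothetical, so adjust them to your setup. The two helper functions are only there to keep the commands together.

```shell
# Assumption: the monitoring loop is saved as an executable script at this
# hypothetical path, and "templog" is a made-up session name.
SCRIPT=${SCRIPT:-/root/templog.sh}
SESSION=${SESSION:-templog}

start_templog() {
    # -d -m starts the session already detached; -S gives it a name
    screen -dmS "$SESSION" "$SCRIPT"
}

attach_templog() {
    # Reattach to watch the output; detach again with CTRL + A, then D
    screen -r "$SESSION"
}
```

Run start_templog as root to start logging, and attach_templog whenever you want to look at the live output. screen -ls lists the sessions that are still running.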
Note that I use the tee command, so the output is printed to the screen and also appended to the log file.
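A tiny illustration of what tee -a does, using a temporary file instead of the real logs (the file and the sample text are made up for the example). Each line is shown on the screen and added to the end of the file; without -a, tee would overwrite the file on every call.

```shell
# Temporary file standing in for /var/log/hddtemp.log
log=$(mktemp)

# Each line is printed to the screen and appended to the file
echo "first reading"  | tee -a "$log"
echo "second reading" | tee -a "$log"

# The file now holds both lines, in order
cat "$log"
```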