This can be useful for:
- troubleshooting hardware/overheating issues
- testing new hardware changes, etc.
In this case I wanted to stress test my Dell PE 2950 after modding it to reduce the fan noise.
Details
Model - Dell PE 2950
OS - ESXi 5.5
The ESXi health monitor doesn't give me much information, and I can't run any packages from the ESXi shell.
I needed an effective way to test whether reducing the fan speeds has any negative effects on the system under heavy load.
I used CentOS 6.5 installed on an old 8 GB USB thumb drive (not a live Linux image: the drive stays plugged into a USB port on the server, so getting it up and running is just a matter of changing the boot priority in the BIOS).
Sadly there aren't a lot of options for Linux, so in the end I ran Prime95 (mprime) along with a few monitoring tools to perform the task.
Packages and programs used
Installing and using the relevant packages
lm_sensors
Reads sensor data such as the temperatures of the CPU cores and RAM modules
sudo yum install lm_sensors
It is recommended to run the following command after installation:
sudo sensors-detect   # carefully follow the prompts to configure the package
To start monitoring temperatures:
sensors
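If you want a single number to keep an eye on instead of the full sensors listing, a small helper can pull out the hottest core. This is only a sketch: max_core_temp is a hypothetical name (not part of lm_sensors), and it assumes the coretemp driver prints lines like "Core 0:       +45.0°C  (high = ..., crit = ...)".

```shell
# Hypothetical helper (not part of lm_sensors): read `sensors` output on
# stdin and print the highest "Core N:" temperature as a bare number.
# Assumes coretemp lines like:  Core 0:       +45.0°C  (high = +76.0°C, ...)
max_core_temp() {
  awk '
    /^Core [0-9]+:/ {
      t = $3                    # third field, e.g. "+45.0°C"
      gsub(/[^0-9.]/, "", t)    # keep only digits and the decimal point
      t += 0
      if (!seen || t > max) max = t
      seen = 1
    }
    END { if (seen) print max }'
}
```

Usage: sensors | max_core_temp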
OpenIPMI-tools
sudo yum install OpenIPMI OpenIPMI-tools
sudo chkconfig ipmi on
sudo service ipmi start
# Usage examples
# Check the BMC firmware version
ipmitool mc info
# Show sensor output
ipmitool sdr list
ipmitool sdr type list
ipmitool sdr type Temperature
ipmitool sdr type Fan
ipmitool sdr type 'Power Supply'
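When you are checking what a fan mod did, it helps to strip the sdr tables down to just the sensor names and numbers. The sketch below assumes the usual pipe-separated row layout "name | id | status | entity | reading" (for example "FAN 1 RPM | 30h | ok | 7.1 | 4575 RPM"); the exact rows on your box may differ, and sdr_readings is a hypothetical helper name.

```shell
# Hypothetical helper: print "name<TAB>reading" for each numeric row of
# `ipmitool sdr type ...` output read from stdin. Assumes rows look like
#   FAN 1 RPM        | 30h | ok  |  7.1 | 4575 RPM
# Rows without a numeric reading (e.g. "Disabled", "No Reading") are skipped.
sdr_readings() {
  awk -F'|' '
    NF >= 5 {
      name = $1;  gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the sensor name
      value = $5; sub(/^[ \t]+/, "", value)          # drop leading spaces
      if (value ~ /^-?[0-9]+(\.[0-9]+)?/) {
        split(value, parts, " ")                     # "4575 RPM" -> "4575"
        printf "%s\t%s\n", name, parts[1]
      }
    }'
}
```

Usage: sudo ipmitool sdr type Fan | sdr_readings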
Prime95 for Linux (mprime)
This will max out all the cores on both sockets and test a lot of RAM. This tool is one of my absolute favorites when overclocking my gaming rig.
# Download the package
wget http://mersenneforum.org/gimps/mprime2511.tar.gz
# Extract
tar zxvf mprime2511.tar.gz
# Run the program
./mprime
Follow the on-screen prompts to start the test. I used the “Blend” test.
Test environment
Room ambient temperature: 78°F (AC was switched off)
Windows open for good airflow
TEST
I ran the Prime95 Blend test for 60 minutes; it maxed out all 8 cores and pushed the system to its limit. If the system gets through this without a hiccup, it should hold up under normal use with no problems.
I used the following commands in multiple SSH sessions:
Monitor CPU usage
top
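top is fine for eyeballing, but if you want one figure that confirms the cores really are pegged, you can compute overall CPU utilisation yourself from two snapshots of the aggregate "cpu" line in /proc/stat (the same counters top reads). A minimal sketch, assuming the field order documented in proc(5); cpu_busy_pct is a hypothetical helper name.

```shell
# Hypothetical helper: overall CPU busy percentage between two snapshots of
# the aggregate "cpu" line in /proc/stat (fields per proc(5): user nice
# system idle iowait irq softirq steal guest guest_nice).
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle = $5 + $6                    # idle + iowait
      total = 0
      # user..steal only; guest time is already included in user
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      if (NR == 1) { idle0 = idle; total0 = total }
      else         { idle1 = idle; total1 = total }
    }
    END {
      dt = total1 - total0
      if (dt > 0) printf "%.1f\n", 100 * (dt - (idle1 - idle0)) / dt
    }'
}
```

Usage: a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$a" "$b". With mprime running on all 8 cores you would expect a value close to 100.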
Check how long the process has been running
watch ps -p <pid> -o etime=   # replace <pid> with the PID of the mprime process shown in top
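ps prints etime as [[dd-]hh:]mm:ss, which is awkward to compare against a 60-minute target. A small conversion helper turns it into plain seconds; etime_to_seconds is a hypothetical name, not part of procps.

```shell
# Hypothetical helper: convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
etime_to_seconds() {
  printf '%s\n' "$1" | awk '{
    gsub(/^[ \t]+|[ \t]+$/, "")          # ps right-aligns etime, so trim it
    days = 0
    if (index($0, "-")) {                # optional "dd-" prefix
      split($0, d, "-"); days = d[1]; $0 = d[2]
    }
    n = split($0, p, ":")                # mm:ss or hh:mm:ss
    secs = 0
    for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
    print days * 86400 + secs
  }'
}
```

Usage: etime_to_seconds "$(ps -p "$(pgrep -o mprime)" -o etime=)", where pgrep -o picks the oldest process named mprime. A result of 3600 or more means the 60-minute mark has passed.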
Monitor the temperatures continuously
watch sensors
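watch only shows the current reading, so nothing is left to review once the run is over. A minimal sketch for keeping a timestamped record instead: log_sensors and the SENSORS_CMD override are hypothetical names made up for illustration, and the defaults (one sample every 10 seconds, 360 samples) are simply chosen to cover a 60-minute run.

```shell
# Hypothetical helper: append timestamped `sensors` output to a file.
#   log_sensors FILE [INTERVAL_SECONDS] [SAMPLES]
# Defaults: one sample every 10 s, 360 samples (about 60 minutes).
# SENSORS_CMD is a made-up override (default: sensors), handy for testing.
log_sensors() {
  local file=$1 interval=${2:-10} samples=${3:-360} n=0
  while [ "$n" -lt "$samples" ]; do
    {
      date '+%F %T'              # timestamp for this sample
      "${SENSORS_CMD:-sensors}"  # the readings themselves
      echo                       # blank line between samples
    } >> "$file"
    n=$((n + 1))
    sleep "$interval"
  done
}
```

Usage: start log_sensors ~/stress-temps.log in its own SSH session (or in the background with &) before kicking off the Blend test, then look through the file afterwards, for example with grep Core ~/stress-temps.log.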
Results
I didn't collect any logs; instead I watched the SSH sessions for any issues. I'm happy to say that the CPUs and the RAM modules held up pretty well under a lot of stress with the fan mod.
The maximum temperature recorded was 87°, and it came down after the fans spooled up.
So there you go. Leave a comment and let us know if you can add or correct anything, and I will update the post.