How to Test Windows Server Crashes, Hangs, and Kernel Memory Leaks
An often overlooked responsibility of any good systems administrator is testing as many aspects as possible of a production environment before there is an issue. These tests help ensure that everything is functioning as expected.
Here is a common list of tasks that should be tested on a regular basis:
- Restores – To ensure that your backups are working properly, periodic restores should be performed before they are needed
- Disaster Recovery Plan – Every key website or application should have a disaster recovery plan. This plan should be tested occasionally with production traffic to ensure that it will work effectively when needed
- High Availability Fail-overs – If you have a key website or application, you should have one or more devices along the path that are in High Availability (HA) mode. Testing this HA setup should be done regularly to confirm that HA works as expected
While those tasks are common sense, there is another scenario that should be tested at least once after your initial server setup: server crashes. Server builds occasionally change how Windows handles a crash and where the memory page files are located. Any of those changes can prevent Windows from creating dump files as expected.
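As a rough illustration of what a build might have changed, the sketch below reads the crash settings Windows stores under the `CrashControl` registry key and summarizes them. The helper name `review_crash_control` and the particular checks are assumptions made for this example, not an official tool; the registry read only works on Windows, while the review logic can be run anywhere.

```python
import sys

# Documented meanings of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def read_crash_control():
    """Read a few CrashControl values from the registry (Windows only)."""
    import winreg  # standard library, but only available on Windows

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "AutoReboot", "Overwrite"):
            try:
                values[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value is not set on this build
    return values


def review_crash_control(values):
    """Return human-readable findings (hypothetical helper for this sketch)."""
    findings = []
    mode = values.get("CrashDumpEnabled")
    if mode is None:
        findings.append("CrashDumpEnabled is not set")
    elif mode == 0:
        findings.append("crash dumps are disabled")
    elif mode not in DUMP_TYPES:
        findings.append(f"unrecognized CrashDumpEnabled value: {mode}")
    else:
        findings.append(f"dump type: {DUMP_TYPES[mode]}")
    # Full, kernel, and automatic dumps are written to the DumpFile path.
    if mode in (1, 2, 7) and not values.get("DumpFile"):
        findings.append("no DumpFile path configured")
    return findings


if __name__ == "__main__" and sys.platform == "win32":
    for line in review_crash_control(read_crash_control()):
        print(line)
```

A check like this only confirms the configuration; it does not prove a dump will actually be written, which is why a real test crash is still worthwhile.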
Thankfully, Sysinternals has created a great utility, NotMyFault, to help test numerous crashes, hangs, and kernel memory leaks. Installation is straightforward, and this small package is very portable.
By simply running the appropriate executable, you are presented with a small interface that offers many choices to purposefully crash or hang a server. You should test a variety of scenarios on the server to confirm that it will generate the appropriate dump files needed to find the root cause of a server crash.
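After a test crash and reboot, one simple way to confirm that a dump was actually produced is to look for a dump file that is newer than the time the test started. The sketch below assumes the dump path is the common default `%SystemRoot%\MEMORY.DMP`; your server's configured path may differ, and the function name `recent_dumps` is made up for this example.

```python
import os
import time


def recent_dumps(paths, since_epoch):
    """Return (path, size_bytes) for each file modified at or after since_epoch."""
    found = []
    for path in paths:
        # Expands %SystemRoot% style variables when run on Windows.
        expanded = os.path.expandvars(path)
        if os.path.isfile(expanded):
            info = os.stat(expanded)
            if info.st_mtime >= since_epoch:
                found.append((expanded, info.st_size))
    return found


if __name__ == "__main__":
    # Assumed default location; adjust to match the server's DumpFile setting.
    candidates = [r"%SystemRoot%\MEMORY.DMP"]
    one_hour_ago = time.time() - 3600
    dumps = recent_dumps(candidates, one_hour_ago)
    if not dumps:
        print("No dump file written in the last hour")
    for path, size in dumps:
        print(f"{path}: {size} bytes")
```

An empty result after a deliberate crash is a strong hint that the dump or page file settings need another look.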
Hopefully your servers never crash, but if one ever does, you want as much data as possible to track down the real issue and prevent it from happening again.