Power spike brings business to a halt
- Written by Neville Matthews
- Published: 15 February 2013
Power spike causes server to fail, and why you MUST use a UPS for critical equipment!
On 20th December 2012 at about 5 PM I was called by a client whose main server had failed. For the client this was a disaster because, pretty much, no server means no business. The server would partially power on but could not see any hard drives, so it wouldn't start up.
Inside the machine we could see that an electrolytic capacitor had blown. Useful as this was, it didn't really help, because the client had to have a running system by 7 AM the next day.
Luckily, depending on how you view it, this same server had failed about three years previously. On that occasion we turned a workstation into a temporary server. That failure happened on a weekend: a 500GB hard drive was purchased from Maplin on the Sunday and the temporary server was in place for start of business at 7 AM Monday. This time it was a Thursday, so we did not have the luxury of a whole Sunday (the client is open 6 days a week). That hard drive had been put away for safe keeping, and with the records that MCS keep we were able to determine which PC we had used last time, which saved a great deal of time in the recovery.
"Tape Backup Saves Business" should really be the title of this article, because the server's hard drives were not accessible: we did not have an identical system to read the RAID set. The client takes a backup to hard disk at end of business every day; however, this failure happened before that time, so there was no end-of-day backup to restore from. At about 1 o'clock in the morning I took the previous night's full system tape backup to our offices and loaded it into one of our machines. I was able to recover the data to a hard drive for copying to the temporary server. This is a very nerve-racking time, because it is not until you actually need the data back from tape that you find out whether the tapes you have been loading so diligently actually have any data on them.
The server was taken to a board-level repairer because the machine is out of warranty and parts are no longer kept by the manufacturer, plus I couldn't find a replacement on eBay. I delivered the server to the repairer and collected it personally, for speed and security. This is the crux of this article: whilst I was talking to the engineer who had actually fixed the machine (I wanted to see it boot!), he said that the most likely cause of the part failure (and yes, it was a single electrolytic capacitor that had failed) was a power surge or spike.
I hadn't really thought about this before, since power is mostly reliable and clients have always been resistant to buying a UPS. Now I know why you MUST run not just any old UPS but a true online UPS for your server and other critical equipment. This is the KEY THING: kit that is on all the time (real 24x7) is, when you think about it, obviously going to be the kit that takes every fluctuation in the power.
A true online UPS delivers clean power from its batteries all the time, guaranteeing solid, stable power.
UPS - Uninterruptible Power Supply