We had a hard drive failure at our data center yesterday - here's what we did about it.
Miami's a beautiful place. There's sun and sea and dolphins, and Disney World is only a few hours up the road. It's also where we have our data center: a rack of hard drives nestling among hundreds of others, all cared for by our hosting provider.
Usually all of these drives function perfectly well, but occasionally, as with any piece of technology, things go wrong. Fortunately we get notified when that happens, and yesterday our hosts told us that one of our drives had suddenly and unexpectedly failed and that they'd be replacing it. The failure caused severe degradation of performance across the entire site and all our tools.
Once the drive had been replaced, it took a few hours for it to catch up with the other drives in the array (half a billion keywords don't synchronize in just a few minutes ...), and some of our Keywords Tool, Link Builder and API customers were affected while it did. We had quite a few messages through our support network, which fortunately we were able to respond to quickly, and a few Twitter posts, which were also acknowledged quickly.
The drive had caught up with the rest of the system by around 01:30 GMT, and things got back to normal so that users could get back to the serious business of keyword research.
We've also added some time to the Keywords Tool accounts of everyone who contacted us about this - if you were affected but didn't get in touch, please let us know and we can do the same for you.
Looking ahead, we're improving our failsafes so that if something like this comes up again we can deal with it more quickly.
What do you do when things go wrong for you?