When Things Go Wrong…

January 31st, 2009

Do you have a backup and recovery plan in place for your data, one you can turn to when the inevitable (catastrophic system failure and possible data loss) happens? Do you have a revision control system in place for your source code, one you can turn to when you break something in your system?

Yesterday morning, the web-based bookmarking service Ma.gnolia suffered a catastrophic failure. It will be at least a matter of days before the service returns, and users’ bookmarks may be lost, either partially or completely. Michael Calore writes at Wired’s Epicenter blog:

In light of today’s outage, many are questioning the reliability of web apps and web-based storage in general. Twitter in particular is full of users venting their suspicions.

“Cloud computing becomes fog when it goes down,” says Todd Spragins in a Twitter post.

Another common thread: People are talking about bailing on Ma.gnolia in favor of competitor Delicious.

All of this is more ammunition for the critics (warning: NSFW) of “the cloud”, meaning web-based software and data storage.

This morning, the mighty Google screwed up and, well, “broke the Interwebs”.  For a “brief” period (less than an hour), every single result returned from Google’s search was flagged as potentially “harmful”, meaning the linked site was “known” to install malware upon visiting.  Google VP Marissa Mayer writes:

What happened? Very simply, human error. … We maintain a list of such sites through both manual and automated methods … We periodically update that list and released one such update to the site this morning. Unfortunately (and here’s the human error), the URL of ‘/’ was mistakenly checked in as a value to the file[,] and ‘/’ expands to all URLs. Fortunately, our on-call site reliability team found the problem quickly and reverted the file.
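To see how a single ‘/’ entry could end up flagging everything, here is a minimal sketch in Python. It assumes the list is matched by simple path prefix, which is only a guess at the mechanism; the function name, the list entries, and the sample paths are all made up for illustration and are not Google’s actual code or data.

```python
def is_flagged(url_path, blocklist):
    """Return True if url_path starts with any entry in blocklist.

    Assumption: entries are matched as plain path prefixes. This is a
    hypothetical matcher, not how Google's real list is evaluated.
    """
    return any(url_path.startswith(entry) for entry in blocklist)


# Hypothetical "good" list: only specific bad paths are listed.
good_list = ["/malware-downloads/", "/phish/login"]

# The same list with a stray "/" checked in by mistake.
bad_list = good_list + ["/"]

# Made-up paths standing in for search results.
paths = ["/", "/news/today", "/phish/login.html"]

flagged_good = [p for p in paths if is_flagged(p, good_list)]
flagged_bad = [p for p in paths if is_flagged(p, bad_list)]

print("flagged with good list:", flagged_good)
print("flagged with bad list: ", flagged_bad)
```

Under that assumption, every path begins with ‘/’, so one stray entry is enough to match them all. A sanity check that rejects overly broad entries before a list goes live would be one way to guard against this kind of mistake.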

If you click on a link that’s been marked by Google in this way, you don’t go directly to the linked page.  You’re taken to a page from Google asking, and I’m paraphrasing here, “Are you sure you want to visit this page? If so, you do so at your own risk.”  You can then click through to the actual page if you wish, against the advice of Google and their partner in this flagging endeavor, StopBadware.org (who Google initially suggested was at fault for this SNAFU — they’re not).

Needless to say, this brought pretty much all web browsing that starts from Google search results to a halt.  Of course, as Leo Laporte brought up on his radio show today, whether Google should be flagging sites and setting up roadblocks to visiting them, based on a computer- and human-generated blacklist, without your permission, is another question entirely.  Leo thinks you should be able to turn off this flagging feature in Google’s settings, and that Google shouldn’t be the supreme arbiter of what we get to see on the web.

Ms. Mayer writes that the problem was taken care of “quickly”, and Leo disputes this as well (“An hour?! That’s an eternity on the Internet!” — again, paraphrasing), but this at least illustrates that they have a plan in place for rolling back to earlier revisions of critical components of their system, as quickly as possible, in the event of a breakage.

Do you?
