Do We Have To Sneak Audit Site Hosts Now?

For retail IT directors, the end of American Eagle Outfitters' 8-day E-Commerce collapse just marks the start of a new fear: that they'll have to begin dispatching staffers to do sneak inspections of outsourcers. Will they need to burn precious staff time in unannounced audits, looking over the shoulders of service providers to make sure those techs are doing their jobs?

Will they eventually have to turn to a whole new class of outsourcers who do nothing but check up on the big outsourced teams? And who will watch those watchers?

Or is this an overreaction to a disastrous but highly unusual event? A wholesale failure like the American Eagle debacle is big news because it's so rare. Datacenter disasters happen. There's no way to bring the risk to zero. And beyond making sure good practices are being followed, returns diminish quickly--you can suddenly find you're spending a lot of money to prevent something that almost never happens.

Or will the American Eagle fiasco wake up vendors to make retail IT directors' worries unnecessary? OK, that one we can answer: No. Like most IT disasters, this event is not so much an alarm for the industry. It's more like a snooze alarm: a little noise, then back to sleep.

But as readers commented on Friday's story about Oracle's and IBM's role in the blowup, something radical has to at least be considered. It has to happen, if for no other reason than to be able to say alternatives were debated when the COO asks how the company can be guaranteed it won't go dark for eight days.

(Our PCI Columnist, Walt Conway, penned a related column about the PCI implications of data backups.)

One executive with a vendor in the backup space agreed that something like unannounced on-site inspections needs to happen. But that exec, who quickly decided to seek anonymity, stepped back from his suggestion when asked how his company would logistically handle such unannounced sneak inspections.

It's an attractive idea--almost as attractive as outsourcing is.

The problem, of course, is a very practical one: You outsource to save time and money. You want to offload work that someone else can do with more expertise or less expense. And you'd really like not to have to worry about that work getting done--and done correctly. In the best of all worlds, you want to outsource it and forget it.

But that's not possible. That's a little too much trust to place in someone who doesn't exactly have your best interests at heart. True, an outsourcer wants to keep your business. That means keeping you happy as a customer. But when you send operations outside your IT shop, they'll be performed by employees you didn't hire on equipment you don't maintain, and they'll be supervised by managers whose bonuses depend on keeping costs down.

Controlling costs is a priority for you, too. But it's the priority for them. Their jobs don't depend on your business' success. Yours does.

That's why you can't afford to trust too much in an outsourcer's plans and promises. You have to verify, too.

After all, what happened to American Eagle should have been impossible. The retailer and IBM, its hosting provider, had the right plan for dealing with an outage. Disk drive fails? Storage array recovers automatically. Second drive in the array fails? A quick restore brings the data back. Complete failure at the main datacenter? Switch to the backup site. The plan should have been bulletproof. Instead, it was a failure at every level, and American Eagle was crippled for more than a week.

Should American Eagle have sent a steady stream of IT employees to constantly check on IBM's hosting work--or, better still, contracted an elite SWAT team to spy on Big Blue? That team probably would have spotted the faulty backups and the fact that they weren't being routinely verified. It might have noted more subtle problems, like an aging storage array or a datacenter running just a bit too warm.

That approach would also have cost a small fortune, wiping out the cost advantages of outsourcing. It probably would have made American Eagle the kind of customer that just isn't worth keeping. It would have been overkill.

But overkill isn't necessary. Just a little more attention to how an outsourcer is doing--at only a little greater cost--can work wonders.

That means monitoring service levels to confirm they meet the SLA. It means making sure you have a clause in your contract that lets you audit, so you can confirm that backups are being made and they work, that datacenter operations appear to be crisp and professional, and that important plans (such as American Eagle's planned-but-not-ready backup site) aren't shelved. And it means actually doing those audits: not constantly, but often enough to make sure you're getting what you've paid for--and often enough to make an impression.
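For a sense of what "monitoring service levels" can look like in practice, here is a minimal sketch of the arithmetic. The 99.9 percent target and the one-year measurement window are assumptions made purely for illustration; nothing reported about American Eagle's contract specifies them. The function names are hypothetical, too.

```python
from datetime import timedelta


def availability(period: timedelta, outages: list[timedelta]) -> float:
    """Fraction of the measurement period during which the service was up."""
    downtime = sum(outages, timedelta())
    return 1.0 - downtime / period


def allowed_downtime(period: timedelta, sla_target: float) -> timedelta:
    """Maximum total downtime the SLA tolerates over the period."""
    return period * (1.0 - sla_target)


def meets_sla(period: timedelta, outages: list[timedelta], sla_target: float) -> bool:
    """True if measured availability is at or above the SLA target."""
    return availability(period, outages) >= sla_target


if __name__ == "__main__":
    year = timedelta(days=365)           # assumed measurement window
    target = 0.999                       # assumed SLA target, not from the article
    outages = [timedelta(days=8)]        # a single eight-day outage

    print(f"availability:     {availability(year, outages):.4%}")
    print(f"allowed downtime: {allowed_downtime(year, target)}")
    print(f"meets SLA:        {meets_sla(year, outages, target)}")
```

Under those assumed numbers, a year-long window at 99.9 percent leaves room for less than nine hours of downtime, so a single eight-day outage overshoots the allowance many times over. The point of running a check like this on your own measurements, rather than relying on the provider's report, is that it costs almost nothing and tells you when an audit is overdue.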

Such attention does more than just identify disasters-in-the-making. It also gooses providers just enough to remind them they can't cut too many corners or become complacent. It shows them you're serious and can't be taken for granted, but without implying that the outsourcer is incompetent and completely untrustworthy.

Remember, you don't really want to find a mess when you check on an outsourcer. You want to find that everything is working the way it should. That may mean a little more expense and a little less trust than you'd like.

But in the wake of American Eagle's crash, it's a tradeoff you can't afford not to make.