Down For 8 Days: American Eagle's Site Disaster

In one of the longest site outages ever for a multi-billion-dollar retailer, Tuesday (July 27) saw the apparent end of more than a week of Web problems and days of an outright crashed site for Pittsburgh-based clothing chain American Eagle Outfitters, which outsources much of its Web operations to IBM. The site crashed last Monday (July 19) and stayed dark until Friday (July 23), when it limped along with various parts not functioning until Tuesday afternoon (July 27).

The site's problems, though, shed light on an interesting strategy. During the many days of complete Web site death, the $2.7 billion apparel chain's mobile site was still up. But it apparently was not able to perform purchases. Officials at American Eagle Outfitters, IBM and Usablenet—which handles the chain's mobile site—wouldn't comment on the mobile site's functionality during the crash.


But this raises the question: Should retailers look to their mobile sites as emergency backups for their Web sites? Should pages indicating that a site is down automatically include a link to the site's mobile version?

Mobile sites, of course, work just as well on desktop machines as they do on phones. American Eagle Outfitters, which has the admirably short URL of ae.com, also exists as a mobile site.

Before we dive into that mobile-as-site-backup issue, let's look at exactly what happened with American Eagle's site. None of the players involved would get specific as to what was wrong with the site, other than to say that there was no upgrade going on at the time and that the site experienced "a hardware issue."

A server failure almost certainly would not have caused this problem; redundant servers would likely have kicked in while the defective machine was replaced with a new server and a backup was restored. That process would have taken a few hours, not almost eight days.

This delay suggests some sort of storage problem. Say the storage array begins to fail. OK, no problem, we'll just find the bad drive and replace it. Whoops, looks like something has corrupted multiple drives. (That could happen if power gets flaky inside the array.) Now we have a catastrophic failure of the storage array. No problem, we'll just fix the hardware and restore.

Whoops, new problem: Turns out this problem has been going on for a while. The last set of backups is corrupted. So is the set of backups before that. Sorting through to reconstruct good data is going to take time.
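To make that scenario concrete, here is a minimal sketch of hunting for the most recent usable backup. It assumes each backup set was stored alongside a SHA-256 checksum recorded at write time. The BackupSet type and both function names are hypothetical, and nothing here reflects how American Eagle or IBM actually manage backups.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupSet:
    label: str            # hypothetical name, e.g. the date the backup was taken
    data: bytes           # backup contents as read back from storage
    expected_sha256: str  # checksum recorded when the backup was written


def is_intact(backup: BackupSet) -> bool:
    """Return True if the stored bytes still match the recorded checksum."""
    return hashlib.sha256(backup.data).hexdigest() == backup.expected_sha256


def latest_good_backup(backups: List[BackupSet]) -> Optional[BackupSet]:
    """Walk backups from newest to oldest and return the first intact one."""
    for backup in reversed(backups):  # list is assumed ordered oldest to newest
        if is_intact(backup):
            return backup
    return None  # every backup set failed verification
```

Running the same check routinely after each backup is written, rather than during a crisis, is the kind of verification that would surface corruption early.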

Alternatively: All recent backup sets are toast. Maybe nobody was verifying that the data was actually being written. However, all the transactions are being logged. No problem, then: All it takes is a lot of time and special expertise to essentially rerun all the recent transactions (since the last good backup) into an empty database, merge the new stuff with the old stuff and then load it all back into the replacement hardware.
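As a rough illustration of that replay step, the sketch below rebuilds current state from the last good backup plus a transaction log. It assumes, purely for illustration, that each logged transaction carries a sequence number and simply sets a key to a new value. A real database recovery is far more involved, and the names used here are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# A logged transaction is modeled here as (sequence_number, key, new_value).
Transaction = Tuple[int, str, int]


def replay(snapshot: Dict[str, int], snapshot_seq: int,
           log: Iterable[Transaction]) -> Dict[str, int]:
    """Rebuild current state from the last good backup plus the log."""
    state = dict(snapshot)  # copy so the restored backup is left untouched
    for seq, key, value in sorted(log):  # apply in sequence order
        if seq <= snapshot_seq:
            continue  # already reflected in the backup
        state[key] = value
    return state
```

The time-consuming part in practice is less the loop than establishing which backup is trustworthy and which log entries postdate it.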

By the way, it seems that American Eagle was recently searching for a "Manager - Business Continuity & Disaster Recovery". The job was still an active posting on May 25 but has since been filled. Not a moment too soon, eh? (Thanks, Google cache!)

Enough speculation and job postings. Let's look at the timeline. On Monday (July 19) in the early evening (New York time), the site fully crashed, according to Jani Strand, the chain's VP/corporate communications. She said it was "a hardware issue with the host of the site," which was, and presumably still is, IBM. "IBM is our partner and we're working with them to solve this. We're both very disappointed."

When asked about details of what brought the site down, Strand said "we're not going into that level of detail."

After the site crashed on Monday, a screen message told visitors: "Sorry. We need a few minutes to re-organize our closet. We promise to be back in a bit with even more" and then added the logos of the chain's three key brands: American Eagle Outfitters, Aerie and 77kids.

(Given that the site wouldn't be fully back up for eight days, the site must use the same definition for "a few minutes" that my 12-year-old daughter uses when she offers a time estimate for when she'll wash the dishes.)

That sign was up through Wednesday (July 21). On Thursday (July 22), the sign changed to "We're making updates to our sites. Free Shipping on us when we're back, thru July 25." Note: The site didn't fully return until July 27.

Back to the timeline: The site came back up late Thursday morning (July 22). Strand put the time at "about 11:15 AM" and added that the site at that time had "a little bit of limited functionality. Some of the saved data hasn't been restored, but it is shoppable now."

By Friday morning (July 23), the site's message had changed again, offering some details on the limited functionality. "We're still working through some issues, but you're able to shop! Everything should be completely fixed very soon. Thanks for hanging in there. Stuff we're still working on: Order tracking, Registered Information Functionality, Wish List, Order History."

Screens through Tuesday (July 27) thanked consumers for "hanging in there while we work through some site issues." On Tuesday afternoon, the warning messages came down and IBM said the site "is now fully operational."

Although this outage was much worse than retailers typically suffer through, even a several-hour outage poses the potential for losing customers. As such, does it make sense to look at your mobile site as an emergency backup to your Web site?

A few issues need to be considered before you make that leap: Many mobile sites leverage the content and the database of the main site. That would mean a mobile site would only be helpful if the main databases of the site—pricing, inventory, order placement, etc.—are still functioning. Payment is often independent, so there's a fine chance that system may survive. If it's merely a hosting server that has crashed, the mobile site could be a powerful backup.
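That dependency test can be sketched in a few lines. The example assumes a hypothetical health report mapping each backend system to whether it is responding. The system names follow the examples in the paragraph above, and the function name is invented for illustration.

```python
from typing import Dict, Set

# Hypothetical dependency list; the article names pricing, inventory and
# order placement as the main-site databases a mobile site often relies on.
MOBILE_DEPENDENCIES: Set[str] = {"pricing", "inventory", "orders", "payment"}


def mobile_can_cover(health: Dict[str, bool]) -> bool:
    """True only if every system the mobile site depends on is up.

    `health` maps a system name to whether it is currently responding.
    The main Web hosting tier is deliberately absent from the dependency
    set: losing it alone should not disqualify the mobile site.
    """
    return all(health.get(name, False) for name in MOBILE_DEPENDENCIES)
```

A system missing from the report is treated as down, which errs on the side of not redirecting shoppers to a mobile site that cannot complete an order.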

In American Eagle's case, this approach would not have worked because of the infrastructure they chose. But if the idea of using mobile as a backup appeals to an IT team, then this should be added to the criteria of a mobile vendor, platform and architecture choice.

The American Eagle Usablenet site was designed as a proxy between the mobile device and the American Eagle Web site, which made mobile backup impossible. "The search page on the mobile device can be served by Usablenet only if the search page on the main web site is working," said one official who was involved in the arrangement and is fond of her anonymity.

Usablenet and other proxy services would not support a mobile failover strategy, but some of the other major mobile outfits would. "If (American Eagle) was using a host like Digby, which is architected to run an application framework independent from the company’s main site, and relies only on an inventory data feed and some backend web services, then they could have operated the mobile site autonomously albeit with some limited functionality," said the source, who works in IT and is not involved with Digby.
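The difference between the two architectures can be shown with a toy model. This is a sketch under stated assumptions, not a description of how Usablenet or Digby actually work: both classes, the feed format and the crashed_main_site stand-in are hypothetical.

```python
from typing import Callable, Dict, List


class MainSiteDown(Exception):
    """Raised by the stand-in main-site function when the site is unavailable."""


class ProxyMobileSite:
    """Forwards each search to the main site, so it fails when the site does."""

    def __init__(self, main_site_search: Callable[[str], List[str]]):
        self.main_site_search = main_site_search

    def search(self, term: str) -> List[str]:
        return self.main_site_search(term)


class AutonomousMobileSite:
    """Searches its own copy of an inventory feed, independent of the main site."""

    def __init__(self, inventory_feed: Dict[str, str]):
        self.inventory = dict(inventory_feed)  # sku -> product name

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return sorted(name for name in self.inventory.values()
                      if term in name.lower())


def crashed_main_site(term: str) -> List[str]:
    """Stand-in for a main site that is not responding."""
    raise MainSiteDown("main site is not responding")
```

With the main site down, a search through ProxyMobileSite(crashed_main_site) raises MainSiteDown, while the autonomous version keeps answering from its last feed, albeit with data that may be stale.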

"The underlying message is that if autonomy from mobile to main site is a need, the former should be architected for that purpose and, if a hosting provider is used, then it is critical to choose a host that utilizes a framework that makes this possible. Some do, but Usablenet by definition will not," said the IT person.

Before deciding on architecture, a company has to decide if using mobile as a Web site backup is even an idea that works for them. This approach might require the mobile site to house duplicate, mirrored versions of the key databases, explicitly so that the mobile site would be more likely to survive a main Web site outage.

Even if it does survive, there are other concerns. Despite ongoing signs that mobile is growing with staggering speed, some mobile sites are not given the infrastructure to support enterprise-level Web traffic. Unless the sites are redesigned to handle much more bandwidth, pointing your Web traffic to your existing mobile site may cause a mobile crash.

Another concern, albeit much less significant, is design. Mobile sites are deliberately kept utilitarian. That look may not appeal to some marketers, who could argue that "no site" is better than a "bland site." Perhaps, but we're guessing the CFO may disagree.
