Who cares about Big Data? You should. All of a sudden, Web logs that were kept simply for troubleshooting purposes can now be mined to determine valuable information about customers' preferences.
Logs that are created by physical machines can now be analyzed en masse to look for information to help advance a business. Data from social networks can now be mined for customer sentiment. These problems were too big and too complex before. But now, answers are within reach.
The Big Data space is filled with so many posers, fakers and wannabes it's ridiculous. Everybody is trying to catch the Big Data wave by getting their name attached to this hot new trend. Let's start by all getting on the same page in terms of what I consider the definition of Big Data:
When someone in an organization gets the idea that they would like to pull useful information out of a bunch of data that is so large and complex that the CIO says, "Well, how the hell are we going to do that?" That is Big Data.
What's new is that a bunch of really smart people have solved the two biggest challenges when it comes to analyzing data of large size and complexity. First, they have removed the need for giant, expensive, specialized hardware platforms (instead using a large number of small "commodity servers" or even cloud servers). And second, they have also removed the need to structure the data in a given format prior to running analysis.
These two technological advancements (and the dozens of other underlying technologies that support them) have unleashed a tremendous number of possibilities when it comes to gaining insight from information that previously would have required millions of dollars' worth of hardware and a staff of data experts to process. In fact, with open-source software and Amazon EC2 virtual hardware, you can now process a job on 1,000 servers that, even if it lasted for two hours, would cost less than $200.
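As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. The only inputs taken from the text are the 1,000 servers, the two hours and the $200 ceiling; the function names and the $0.08 example rate are assumptions for illustration, not quoted Amazon prices.

```python
def max_hourly_rate(budget_usd, servers, hours):
    """Highest per-server hourly price that keeps the whole job within budget."""
    return budget_usd / (servers * hours)


def job_cost(servers, hours, rate_usd_per_hour):
    """Total cost of renting `servers` machines for `hours` at a flat hourly rate."""
    return servers * hours * rate_usd_per_hour


# Figures from the article: 1,000 servers, 2 hours, $200 ceiling.
break_even_rate = max_hourly_rate(budget_usd=200, servers=1000, hours=2)

# Hypothetical rate, chosen only to illustrate the calculation.
example_cost = job_cost(servers=1000, hours=2, rate_usd_per_hour=0.08)

print(f"Break-even rate: ${break_even_rate:.2f} per server-hour")
print(f"Cost at the assumed $0.08 rate: ${example_cost:.2f}")
```

In other words, the claim holds as long as each server costs less than about ten cents an hour.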
Being in the Big Data space, I try to read as much as I can that is published on the topic, and I find myself pleasantly surprised when I finish an article that doesn't leave me wanting to get those 3 to 5 minutes of my life back.
Unlike the "cloud" hype-cycle, where every company on the planet decided to start pitching that it is, in fact, "in the cloud" (including helping the poor housewife catch her missed episode of TV), this insanity seems to be brought on by writers and analysts who are covering the space but do not have the slightest idea what they are writing about. Cramming buzzwords into a story may help with search engine rank, but it sure does give me a headache.
Regardless of the software platform, the database and the data analysis components that may win or lose in this land grab, I see Big Data going through a three-phase evolution:
- Decision Support. The first phase of Big Data will come as companies implement systems that create dashboards and reports that help them run their businesses. Big Data insights will help executives determine how best to steer their companies. This might be in the form of determining new product launches, existing product enhancements, potential market opportunities, etc.
- Marketing Optimization. The second phase of Big Data will be using data from purchase transactions, shopping transactions (online) and social networks to optimize marketing messages that retailers provide to their customers. In this phase, Big Data is used as a way to align the marketing organization with the customer base to a previously unseen level.
- Customer Optimization. The final phase of Big Data for retailers will be in creating systems that make traditional marketing a thing of the past. In this phase, retailers will use the data available to create personalized one-on-one marketing and sales messages for each of their clients. These algorithms will be self-tuning and will adjust to changing market and customer dynamics, optimizing customer acquisition and retention spending.
The biggest challenge is that Big Data is not a standalone project; it is a means to an end. If a retailer is trying to understand its customers' sentiment, that sounds like it might be a Big Data project. If that same retailer is trying to create a marketing optimization platform, that sounds like it might be a Big Data project, too. And if the retailer wants to determine which potential market opportunity the business should go after, that also sounds like it might be a Big Data project. Companies do not implement Big Data just so they can say that they have. (Let's face it, a lot of data warehouse and data mart projects were done for simply that reason.)
If you really want to get an idea about what Big Data is and where you should go from here, the first thing that I would recommend is to go to YouTube and search for "Hadoop," which is the leading open-source project for processing big data. I would then check out all of the great videos and training available at www.cloudera.com, one of the first movers in the Hadoop platform and consulting space. Do a Google search on "MapReduce" (the core processing component of Hadoop) and see how it enables you to process unstructured data. Also search for "NoSQL" and column-oriented database technologies.
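To give a feel for the MapReduce idea before you go searching, here is a minimal single-process sketch in plain Python. It is not Hadoop's API; the function names, the log format and the sample lines are all made up for illustration. The point is only the shape of the computation: a map step that turns each raw line into key/value pairs, a grouping step, and a reduce step that combines the values for each key.

```python
from collections import defaultdict


def map_phase(line):
    """Emit one (page, 1) pair for each log line that requests a product page."""
    parts = line.split()
    # No schema is imposed up front; lines that do not fit are simply skipped.
    if len(parts) >= 2 and parts[1].startswith("/product/"):
        yield parts[1], 1


def shuffle(pairs):
    """Group all emitted values by key (a real cluster does this across machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine every value seen for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical Web log lines: client address, requested path, status code.
SAMPLE_LOG = [
    "10.0.0.1 /product/shoes 200",
    "10.0.0.2 /product/hats 200",
    "10.0.0.1 /product/shoes 200",
    "garbage",
    "10.0.0.3 /cart 200",
]

page_views = run_job(SAMPLE_LOG)
print(page_views)
```

Because each line is mapped independently and each key is reduced independently, the same job can be split across many commodity servers, which is the part Hadoop handles for you.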
There are so many technical differences between how these platforms operate versus traditional data warehousing projects that you really have to spend the time to learn the underlying technologies. (It is pretty wild when you understand the difference between thinking about data in columns rather than in rows.) It will easily take you weeks, if not months, of focused effort to get through all the noise to a place where you can truly understand how these environments operate. Only then will you understand the potential of these platforms and what opportunities your business can explore.
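The rows-versus-columns point can be sketched in a few lines of Python. This is a toy in-memory illustration with invented sample records, not how any particular database stores data on disk. The same three purchases are held once as a list of rows and once as a set of columns, and the same question (total sales) is answered against each layout.

```python
# Hypothetical purchase records, stored row by row (one dict per purchase).
rows = [
    {"customer": "alice", "region": "east", "amount": 20},
    {"customer": "bob", "region": "west", "amount": 35},
    {"customer": "carol", "region": "east", "amount": 15},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" values are read; other fields are untouched.
total_from_columns = sum(columns["amount"])

print(columns["amount"])
print(total_from_rows, total_from_columns)
```

Both layouts give the same answer. The difference is how much data has to be read to get it, which starts to matter when a table has hundreds of fields and billions of records.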
What do you think? If you disagree (or even, heaven forbid, agree), please comment below or send me a private message. Or check out the Twitter discussion on @todd_michaud.