E-commerce companies rely on Big Data to glean valuable, real-time insights that drive smarter, more profitable business decisions. Problems can arise, however, when Big Data infrastructures become riddled with bottlenecks and don’t perform optimally, causing critical intelligence to be delayed or unavailable.
Some e-commerce companies have attempted to “scale away” these inefficiencies by adding hardware. Yet in addition to the up-front system costs, bigger clusters require more expensive manpower and can make performance optimization even harder by introducing more complexity and more points of failure.
In this article, I’ll examine exactly what “Big Data” means; how e-commerce companies use Big Data; why scalability is not always the answer to infrastructure performance issues; and how application performance management techniques help to ensure timely and cost efficient execution of jobs and simplify IT operations.
What Exactly Is ‘Big Data’?
Webopedia defines Big Data as “a massive volume of both unstructured and structured data so large that it’s difficult to process using traditional database and software techniques.” The “unstructured” part of that definition encompasses things like email, video, tweets and Facebook “likes” — data that doesn’t reside in a database that’s accessible to merchants, but is nonetheless very useful.
Structured data, on the other hand, generally refers to databases where specific information is stored based on a methodology of columns and rows. For e-commerce merchants, this could be customer data like name, address and ZIP code.
How Do E-Commerce Companies Use Big Data?
Both large and small e-commerce companies are driving favorable business outcomes from Big Data in a variety of ways. Consider the following scenarios:
Personalization: Consumers shop with the same retailer via different channels, and data from these multiple touch points can be processed in real-time to offer the shopper a personalized experience, including special content and promotions.
Dynamic Pricing: E-commerce companies need dynamic pricing if their products are to compete on price with other sites. This requires taking data from multiple sources — such as competitor pricing, product sales, regional preferences and customer actions — to determine the right price to close the sale.
Customer Service: Excellent customer service is critical to the success of an e-commerce site. If a customer has complained via the contact form on your online store and also tweeted about it, having that background on hand when they call customer service makes them feel heard and valued.
Supply Chain Visibility: Customers expect to know the exact availability, status and location of their orders. This can get complicated for e-commerce companies if multiple third parties, such as warehousing and transportation providers, are involved in the supply chain. E-commerce companies must be able to quickly gather information from multiple parties on multiple products in order to accurately convey expected delivery timetables to customers.
Predictive Analytics: Big Data can help e-commerce companies identify events before they occur. This is called “predictive analytics.” A good example is identifying sales patterns from previous timeframes to better predict and manage inventory needs and avoid stockouts of key products in the next go-around.
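As a toy illustration of the predictive-analytics scenario above, the sketch below forecasts next-period demand per product with a simple moving average and flags items whose stock may run out. The product names, sales histories and stock levels are hypothetical; real forecasting models are far more sophisticated.

```python
# Minimal sketch: forecast next-period sales per product with a simple
# moving average, and flag items whose forecast demand exceeds stock.
# All product names and figures are hypothetical.

def moving_average_forecast(sales_history, window=3):
    """Forecast next-period sales as the mean of the last `window` periods."""
    recent = sales_history[-window:]
    return sum(recent) / len(recent)

def stockout_risks(sales_by_product, stock_on_hand, window=3):
    """Return (product, forecast) pairs where forecast demand exceeds stock."""
    at_risk = []
    for product, history in sales_by_product.items():
        forecast = moving_average_forecast(history, window)
        if forecast > stock_on_hand.get(product, 0):
            at_risk.append((product, round(forecast, 1)))
    return at_risk

if __name__ == "__main__":
    sales = {
        "sneakers": [120, 140, 160, 180],   # trending up
        "umbrella": [30, 25, 20, 15],       # trending down
    }
    stock = {"sneakers": 150, "umbrella": 100}
    print(stockout_risks(sales, stock))  # → [('sneakers', 160.0)]
```

Even a crude model like this captures the core idea: use patterns from previous timeframes to act before the stockout happens, rather than analyzing it afterward.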
The common thread running through these use cases is the need for intelligence and information in as close to real-time as possible. In many cases, intelligence gleaned from Big Data is used to support real-time interactions with customers. For e-commerce companies, the turnaround time for Big Data-driven information requests needs to be measured in minutes or even seconds rather than days.
Scalability Is Not Always the Answer
Given the sheer volume of information to deal with, companies often focus heavily on the scalability of their Big Data environments. Yet while Big Data technologies scale and more hardware can often be added, that doesn’t necessarily mean users are getting the performance they need to deliver up-to-the-minute information.
As a result, they may be missing out on critical opportunities for value creation — for example, pricing a product most effectively during a fleeting, high-potential sales period.
By performance, I mean maximizing the efficiency and speed of your existing resources. Never before have organizations had to deal with so much data, generated from so many different sources, at such a dizzying pace. The velocity of data, or the rate at which information is “gulped” in Big Data systems, is mind-boggling.
The fact is that Big Data applications and environments suffer from many of the performance challenges and bottlenecks that plague current distributed applications, putting ROI from Big Data projects at risk.
Adding more servers does not always adequately address the challenge; in fact, it may only exacerbate problems. In many cases, it’s not a lack of resources making a job slow but an organization’s inability to quickly and easily identify and fix performance bottlenecks. This is where application performance management techniques help.
APM: Unlocking the Potential of Big Data
Now is the time for IT executives to demonstrate to business leaders that Big Data remains relevant and delivers real business value, rather than simply eating away at IT departments and budgets.
APM solutions have traditionally been used in production environments to proactively identify poorly performing applications, and their underlying root causes, before those applications stand in the way of delivering critical intelligence in near real time.
Today, APM solutions are being applied to Big Data environments as well. They can trace individual transactions across complex IT infrastructures and quickly identify hotspots, making Big Data jobs more efficient and ensuring that user requests are answered in a timely manner.
Poorly performing Big Data applications and environments can affect revenues and customer satisfaction, and they might delay or halt business analytics. The hard part is pinpointing the root cause: is the issue with a particular Hadoop MapReduce job? Data access or distribution? Bad code? Server or hardware? Network? Or the application itself?
As noted above, trying to scale away the problem by adding more nodes, clusters or hardware can be both expensive and futile. Ensuring the performance and availability of Big Data applications to meet business demands and satisfy end users is precisely the niche that APM fills.
APM approaches offer deep insight into these systems: optimizing computation and data distribution across nodes, ensuring efficient job execution, identifying I/O bottlenecks, and tuning CPU and memory consumption across thousands of nodes.
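The hotspot identification described above can be approximated even without a commercial APM product. The sketch below (stage names and workloads are hypothetical) times each stage of a data-processing job and reports the slowest one; a real APM tool traces far more context across distributed systems.

```python
# Minimal sketch of per-stage timing for a data job: wrap each stage,
# record its wall-clock duration, and report the hotspot.
# Stage names and workloads are hypothetical.

import time
from contextlib import contextmanager

timings = {}

@contextmanager
def traced(stage):
    """Record the wall-clock duration of a named job stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

def run_job():
    with traced("extract"):
        data = list(range(10_000))
    with traced("transform"):
        data = [x * x for x in data]
    with traced("load"):
        total = sum(data)
    return total

if __name__ == "__main__":
    run_job()
    for stage, secs in timings.items():
        print(f"{stage:10s} {secs:.4f}s")
    print("hotspot:", max(timings, key=timings.get))
```

The point is not the timing code itself but the workflow it enables: instead of adding hardware and hoping, you measure, find the slow stage, and fix that.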
The Bottom Line
Most e-commerce companies do not rely on Big Data purely for post-mortem analytics, when it’s too late to make a difference in a particular sale. Rather, they rely on Big Data to create value in real time and to correct course quickly.
Scalability, while useful in some instances, is not always the silver bullet for getting the best performance and most timely information out of your existing Big Data infrastructure. Rather, the right performance analysis and optimization techniques, enabled through APM, can empower e-commerce companies to produce better results quickly and efficiently with the environments they have today.