
Online Retailers’ Critical Task: Cleaning Up Product Data


Poor-quality product data routinely has severe implications for retailers. If left unresolved, bad data hinders the effectiveness of business operations, product search and discovery, customer satisfaction, and sales.

Bad product data, often hiding in plain sight, can critically impact retailers’ bottom lines. According to information technology firm Gartner, poor data quality costs organizations an average of US$12.9 million annually. Over the long term, it compounds the immediate impact on revenue. Besides increasing the complexity of data ecosystems, bad data leads to poor decision-making.

To make the impact of bad data on retailers more visible, SaaS-based e-commerce search and product discovery platform GroupBy hosted a webinar in September with Google Cloud partner Sada and e-commerce firm Rethink Retail. Titled “Bad Data, Big Trouble: How to Turn the Corner on Poor-Quality Product Data,” the event explored how businesses can use AI to enrich data, improve search relevancy and product discovery, boost customer satisfaction, reduce operational expenses, and increase revenue.

The key to this level of success is rooted in analyzing product data quality and identifying areas for improvement. Best practices include establishing a standard data collection model, conducting regular reviews, and implementing AI-powered solutions to automate cleaning, standardizing, and optimizing product data at speed and scale.

Thus, AI-powered data enrichment can improve operational efficiency, fuel growth, and enhance brand reputation. According to Arvin Natarajan, GroupBy’s director of products, poor-quality product data plagues nearly every retailer today, impacting every application that relies on data to perform.

“Long-term, insufficient data negatively affects the customer experience and, ultimately, your bottom line,” he said.

Sophisticated generative AI models trained on GroupBy’s proprietary global taxonomy library can identify common data issues and revolutionize product data attribution and management, he offered.

Leveraging AI in Cloud-Based Product Discovery

GroupBy’s e-commerce search and product discovery platform, powered by Google Cloud Vertex AI, offers retailers and wholesalers unique access to Google Cloud’s next-generation search engine. Designed for e-commerce, the platform uses AI and machine learning to process 1.8 trillion events and gather 85 billion new events daily from Google’s entire product suite.

With access to this data, GroupBy delivers digital experiences with a deep understanding of user intent. Natarajan noted that its partnership with Google ensures that customers benefit from any future AI innovations Google develops.

Incomplete, inaccurate, and inconsistent product data can hinder search and discovery, leading to lost revenue and reduced customer loyalty. Natarajan highlighted the importance of AI in data enrichment, citing a 20% increase in e-commerce sales after optimizing product catalog data for search and discovery.

Exposing Revenue Loss From Faulty Data

Technology, or the failure to use it correctly, can make it difficult for retailers to recognize that bad data exists. Drawing on his earlier days working at eBay, Rethink’s E-commerce Strategist Vinny O’Brien described how faulty indexing caused an ongoing loss of revenue from suddenly invisible product listings.

It took working with a partner to uncover that eBay failed to normalize any product data. So, if someone searched for a Nike shoe, for instance, but the product data lacked a capital N in the formatting when the product was uploaded, that product disappeared after the first phase of the search.
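The capitalization failure O’Brien describes can be illustrated with a minimal sketch. It assumes a simple in-memory token index, and the function names `normalize`, `build_index`, and `search` are hypothetical. It is not a description of eBay’s or GroupBy’s actual systems. The idea is that both the product titles and the shopper’s query pass through the same normalization step before matching, so a listing uploaded as “nike” is still found when someone searches for “Nike.”

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, fold case, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def build_index(products):
    """Map each normalized title token to the set of product IDs containing it."""
    index = {}
    for product_id, title in products:
        for token in normalize(title).split():
            index.setdefault(token, set()).add(product_id)
    return index


def search(index, query):
    """Return the IDs of products whose titles contain every query token."""
    tokens = normalize(query).split()
    if not tokens:
        return set()
    matches = [index.get(token, set()) for token in tokens]
    return set.intersection(*matches)


# Hypothetical catalog: the second listing was uploaded without a capital N.
catalog = [
    ("sku-1", "Nike Air Zoom"),
    ("sku-2", "nike  air zoom"),
    ("sku-3", "Adidas Runner"),
]
index = build_index(catalog)
nike_results = search(index, "Nike")
```

Because titles and queries share one normalization function, both Nike listings are returned regardless of how they were capitalized at upload time. A production search engine would do far more (stemming, synonyms, ranking), but the principle of normalizing on both sides of the match is the same.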

That failure was not limited to just this one product entry. It was a systemically recurring result for other retailers on the platform.

“So you just disappeared. You lost about 30% of your search volume. When we eventually fixed the problem, which was not an easy job at a company of that size, we were recovering revenue at a rate of about 20% to 25% for organizations, particularly ones that had large catalogs, because we got a lot of long, long tail search and so on. But it is a significantly impactful area,” he detailed.

Challenges of Addressing Bad Data in Isolation

According to Joyce Mueller, director of retail solutions at Sada, the bad data problem is more of an unexpected consequence than a deliberate effort to deprioritize product data. It is a long-standing problem.

Bad data results from incomplete, inaccurate, or missing fields. Perhaps the wrong data specifications are supplied, or inconsistency is at play across SKUs, she suggested. Lacking clean data pipelines to bring it all together, we end up with data that is not necessarily as complete as we would like it to be, Mueller continued.

“Mostly, this has been a problem for back-end systems. But now, having product data that isn’t complete, accurate, well described, or in a good style and character actually causes problems for digital shoppers. It makes your product less discoverable,” she warned.

The Elusive Goal of Standardizing Data

Applying a one-size-fits-all standards method is a losing battle. Earlier efforts failed to achieve universal success.

O’Brien noted that around 2010, all the major e-commerce retail platforms pushed marketers to comply with a standard data set for every product to make them visible. Adopting that premise was only a good strategy up to a point.

“I think managing the scale of data is the challenge when you have those large companies make that kind of mandate,” he offered. “It needs to be accepted by everybody, and everybody has to conform.”

The scale of that management plus data governance is huge, he added. Various industries come into play, whether business-to-business or business-to-consumer. Within these verticals, there might be food-grade applications or medical-type products, he said, pointing to further complications in compliance.

“Different types of industries also have nuances of their own. Managing all of that at scale is tremendously difficult,” O’Brien argued.

Bridging the Data Management Gap

Natarajan added that when talking to retailers or distributors at conferences, he sees a gap between manufacturers and retailers. In the end, it is a hole that retailers must also manage, so a lot of nuance has to be navigated.

“There are a lot of challenges to manage this type of data at scale, which I think is probably the reason why we have not seen a level of standardization in product data extended to all the different industries, all the different verticals, and retailers of every size,” he reasoned.

Sada’s Mueller said she wasn’t aware of any retail sub-vertical handling it well. But she sees digital natives handling it better simply because they are newer.

“When we think of traditional retailers, they have long-standing systems that do not necessarily talk to each other. It is harder for someone more of an incumbent to fix these sorts of problems and to form and fashion themselves in a way that adopts the new technology. They have a bigger legacy with more technical debt,” she observed.

Some industries may have a better chance of managing their data because the products are less complex. According to Natarajan, some of these categories require less product attribution than more technically complex products, such as machines and engines.

“You have this difference in types of products that will lead to better data governance, just because it is easier to manage some of these less complex products,” he said.

AI Solutions for Data Enrichment

The panel of experts discussed steps distributors and retailers can take to help overcome the bad data problem.

  • Conduct an audit of product data, starting with the most critical categories.
  • Implement AI-powered data enrichment and cleaning solutions to improve product data quality.
  • Measure the impact of data quality improvements on metrics like revenue, customer satisfaction, and returns.
  • Establish a data governance process to ensure consistent and accurate product data going forward.
  • Explore free trials of AI-powered data enrichment tools to assess the impact on the product catalog.
  • Identify a champion within the organization, potentially from the product merchandising team, to drive the data enrichment initiative.
  • Modernize data pipelines and consolidate product data into a centralized, cloud-based system to enable more advanced analytics and automation.
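The first step in the list above, auditing product data, can be sketched roughly as follows. This is a minimal illustration rather than any vendor’s tooling: the `REQUIRED_FIELDS` tuple, the record layout, and the `audit` function are all assumptions made for the example. It counts missing or blank required fields and flags brand names that appear with inconsistent capitalization across SKUs, two of the problems the panel described.

```python
from collections import Counter

# Hypothetical set of fields every product record is expected to carry.
REQUIRED_FIELDS = ("title", "brand", "category", "description")


def audit(products):
    """Summarize missing required fields and inconsistently written brand names."""
    missing = Counter()
    brand_variants = {}
    for product in products:
        for field in REQUIRED_FIELDS:
            value = product.get(field)
            # Treat absent, None, and whitespace-only values as missing.
            if value is None or not str(value).strip():
                missing[field] += 1
        brand = str(product.get("brand") or "").strip()
        if brand:
            brand_variants.setdefault(brand.casefold(), set()).add(brand)
    inconsistent = {
        key: sorted(variants)
        for key, variants in brand_variants.items()
        if len(variants) > 1
    }
    return {
        "total": len(products),
        "missing": dict(missing),
        "inconsistent_brands": inconsistent,
    }


# Hypothetical sample records for illustration only.
sample = [
    {"sku": "A1", "title": "Air Zoom", "brand": "Nike",
     "category": "Shoes", "description": "Running shoe"},
    {"sku": "A2", "title": "Air Max", "brand": "nike",
     "category": "", "description": None},
    {"sku": "A3", "title": "  ", "brand": "Adidas", "category": "Shoes"},
]
report = audit(sample)
```

Running such a report on the most critical categories first gives a baseline against which later improvements in revenue, customer satisfaction, and returns can be measured.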
Jack M. Germain

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open-source technologies. He is an esteemed reviewer of Linux distros and other open-source software. In addition, Jack extensively covers business technology and privacy issues, as well as developments in e-commerce and consumer electronics.
