With AI agents getting all of the attention, it's worth noting that multiple companies are pairing generative AI and off-the-shelf models with their own first-party data.
Constellation Research CEO Ray Wang has noted that we're entering an age of data scarcity. All of the data has been used up to train foundational models, and now enterprises will begin to realize they don't have enough data to make AI work for them.
Nevertheless, enterprises are taking their own secret sauce--first-party data--and combining it with AI models. The early results are promising, but you're also going to see mergers that are driven in part by first-party data. Rocket's acquisition of Redfin is as much about data and an integrated AI experience as it is about synergies.
For what it's worth ($1.75 billion), Rocket paid $437.5 million per petabyte of data for Redfin. The 14 petabytes of data with the combined models will certainly drive model accuracy.
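The per-petabyte figure is simple division, and running it in reverse shows how much Redfin data it implies. Here is a minimal sketch in Python that uses only the two dollar figures quoted in this article. The variable names are illustrative, and the resulting petabyte count is derived from those figures rather than reported directly here.

```python
# Figures quoted in the article.
deal_price_usd = 1.75e9            # Rocket's purchase price for Redfin
price_per_petabyte_usd = 437.5e6   # the article's per-petabyte figure

# Derived value: an inference from the two figures above, not a reported number.
implied_redfin_petabytes = deal_price_usd / price_per_petabyte_usd

print(f"Implied Redfin data volume: {implied_redfin_petabytes:g} PB")
```

On these numbers the deal implies roughly 4 petabytes of Redfin data, so the 14-petabyte figure presumably describes the combined data set rather than Redfin's alone.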
Even if enterprises aren't that far along on the AI journey, they're still setting themselves up with solid data strategies. Without a data strategy, there isn't an AI strategy.
Here are some vignettes highlighting how enterprises are getting serious about the data flywheel.
Pinterest: 'If you have unique signal the AI gets better and better'
About 18 months ago, Pinterest moved to a GPU-based stack for its advertising system. It has seen a 10% lift in the relevancy of its recommendations, with further gains since.
CEO Bill Ready said the company has increased its context window 30-fold and now gets more data from signals across the Pinterest network, which essentially serves as a window-shopping platform for Gen Z. "The clicks that are happening on our platform, increasing that 30-fold gives us even better insight into users' tastes and preferences and that led to a 250 basis point plus lift in saves, 150 basis point plus lift in clicks," explained Ready.
For Ready, Pinterest's success is largely due to its unique data set and the signals it provides:
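For readers less used to basis points, the lifts Ready cites convert to percentage points by dividing by 100. A small sketch of that conversion follows. The function name is hypothetical, and because Ready said "plus," the quoted figures are best read as lower bounds. The article does not say whether the lifts are relative or absolute, so the code converts units only.

```python
def basis_points_to_percentage_points(bps: float) -> float:
    """Convert basis points to percentage points (100 bps = 1 percentage point)."""
    return bps / 100


# Lower-bound lifts quoted by Ready, in basis points.
lifts_bps = {"saves": 250, "clicks": 150}

for metric, bps in lifts_bps.items():
    points = basis_points_to_percentage_points(bps)
    print(f"{metric}: {bps}+ bps = {points:.1f}+ percentage points")
```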
"We are doing things that are unique to our platform around computer vision, but we're also leveraging things off the shelf. And we find that even if we take large language models off the shelf and train them against our unique signal we're able to see several hundred basis points lift beyond what those models could have done on their own, which just gets to the value of our unique signal.
The real question is who has unique feedback loops in their business to go train the AI to do something unique."
For Pinterest, unique is about developing a "taste graph" that provides signals about trends, and an experience that's "an oasis away from the toxicity experienced elsewhere."
Pinterest has leveraged AI automation for its advertiser platform and to refine the customer journey. The big takeaway from Pinterest's data set and AI training is that the job is never done. "The things we've done aren't one and done. They have a compounding effect on the AI as we have that unique curation behavior and unique signal where we understand people's preferences," said Ready. "The AI then helps us make better and better recommendations. That brings more shopping behavior. And the AI will continue to get better. But as the AI continues to get better, our unique signal will make us more and more differentiated."
In the fourth quarter, Pinterest saw global monthly active users hit an all-time high of 553 million. Pinterest reported 2024 net income of $1.86 billion on revenue of $3.65 billion, up 19% from a year ago.
John Wiley & Sons: Learning content to train models
John Wiley & Sons is best known for its business publishing, but its repository of content is being used to train targeted large language models.
Matt Kissner, President and CEO, said on John Wiley & Sons' third quarter earnings call that its data and content are key for training industry-specific models.
The company is expanding its licensing efforts for AI. "Our content serves as a foundation for training large-language models and bringing to market vertical-specific LLMs," said Kissner.
John Wiley & Sons said it is using backlist research content, meaning published material older than three years, to train models. "As we take on new AI-specific initiatives, our guiding principles remain straightforward. We recognize our responsibility to engage with AI developers to secure scientific accuracy and deliver optimal Learning outcomes," said Kissner. "These models require training on trusted, authoritative content, such as Wiley’s, while protecting the rights of authors and other copyright holders, a fundamental responsibility we embrace as a knowledge company."
Kissner added that R&D-intensive corporations are using Wiley's AI-powered content to speed up product development, identify breakthroughs and accelerate internal cycle times. "We are an early beneficiary in AI development, evolving alongside our corporate partners. We continue to explore various content opportunities for training, inference and application with an encouraging pipeline," said Kissner.
Rocket buys Redfin to create data, AI flywheel to drive CX
Rocket Companies is buying Redfin in a move that can reinvent the real estate purchase funnel by bulking up the first party data used to train AI models.
Under the all-stock deal, Rocket will acquire Redfin for $12.50 a share, or $1.75 billion. The plan is that Rocket will benefit from Redfin's nearly 50 million monthly visitors. Those leads at the top of the funnel can feed Rocket's mortgage business as well as other consumer services.
On the surface, Rocket and Redfin can combine forces and lower costs of real estate purchases. But you may want to look at the Rocket-Redfin deal through a data aggregation lens.
We've previously chronicled Rocket's master plan and AI efforts. The company is built on first-party data and ongoing purchase signals that feed its AI models.
- Rocket Companies’ genAI strategy: Playing both the short and the long game
- Rocket Companies’ strategy: Generative AI transformation in turbulent market
- Enterprises leading with AI plan next genAI, agentic AI phases
To look at Rocket's acquisition of Redfin in a unique way, consider that the company is paying $437.5 million per petabyte of consumer data. In a statement, Rocket CEO Varun Krishna said the purchase of Redfin is about creating a unified search and financing process in real estate.
"The companies with the most data will win, and no industry is safe from the disruption or the opportunity that AI creates," said Krishna. "As commoditization and disintermediation accelerate, access to scaled proprietary data is what separates industry leaders from the rest."
Reckitt Benckiser Group: Years of research and testing data now useful
Reckitt Benckiser Group, a consumer goods company, is leveraging generative AI to sift through years of research and testing data. "We're using our new tools to sift through years of past research and testing data, which is resulting in new product concepts that we're assessing, each grounded in science and consumer insights to create great products for the future," said CEO Kris Licht.
The company, which is a collection of brands including Durex, Lysol and Finish, is also using Google API data to map affluence and new markets for distribution.
First-party data is also being leveraged to improve execution and create a feedback loop. Licht noted how data and technology are being used in self-care products. "We're particularly well placed to meet the growing demand for self-care as health care systems come under increasing pressure. We're using data and technology to further improve our in-market execution. In the past 12 months, our R&D teams have been using proprietary Gen AI tools that take real-time consumer feedback into account country-by-country to better understand the success of our new product launches and apply these learnings to other markets," said Licht.
Dick's Sporting Goods: Youth sports database
Dick's Sporting Goods has transformed into an omnichannel retailer, redesigned stores to focus on experiences by sport and leveraged its loyalty program to target consumers.
Now it is looking to create a retail media network.
CEO Lauren Hobart said Dick’s Media Network is leveraging the company's Scorecard loyalty program and database, which is the best data set in youth sports.
"While it’s in the early stages, we are very pleased with initial interest in the platform and believe Dick’s Media Network will become a driver of long term sales growth and an important driver of long term gross margin expansion as we scale and optimize the network," said Hobart.
The company's fourth quarter results were strong, but the outlook was mixed. Hobart said the outlook reflects "much uncertainty in the world today," but Dick's is "not seeing a weaker consumer now."
"Our consumer has proven that in times of stress and uncertainty, that they are leaning into outdoors, being outside, going for a run or a walk, going to watch team sports," she said. "It’s become much more of a necessity than a discretionary item, and it makes sense because it is a way for people to find calm in an otherwise uncertain time frame."
Dick's also runs GameChanger, a platform for engaging with athletes. The platform has an average of 1.8 million daily active users. In 2025, GameChanger is expected to have revenue of about $150 million.
For now, Dick's technology efforts are focused on its app, boosting efficiency for online fulfillment and marketing to its customer base. Over time, these touch points are likely to fuel AI efforts.