This post first appeared in the Constellation Insight newsletter, which features bespoke content weekly.
Generative AI workloads have been dominated by Nvidia, a massive cloud buildout and compute that comes at a premium. I'm willing to bet that in a year, we'll be talking about distributed compute for model training and more workloads on edge devices ranging from servers to PCs to even smartphones.
On earnings conference calls, generative AI is still a common theme, but there's a subtle shift toward commoditization of the compute behind large language model (LLM) training and a hybrid approach that leverages devices being built with generative AI-capable processors.
"There's a market shift towards local inferencing. It's a nod to both the necessity of data privacy and an answer to cloud-based inference cost," said Intel CEO Pat Gelsinger on the company's third quarter earnings conference call.
Here's a quick tour of what's bubbling up for generative AI powered by local compute.
- Samsung VP of Mobile Experience Daniel Araujo, speaking on Samsung's earnings call, said "smartphones are poised to be some of the most important access points for AI." He added that "we want to establish a new standard for experiences that mobile devices can provide through hybrid AI, which encompasses both on device and server-based AI solutions."
- Google's Pixel 8 Pro was touted as being capable of running the company's LLM on device.
- Akamai is building an infrastructure-as-a-service network that revolves around edge computing nodes.
- Apple's latest MacBook Pro launch, which includes new M3 processors, touted faster model training as a use case. PC makers are already prepping PCs with higher specs (and prices) to manage local generative AI workloads.
- Amazon's Echo device launch was more about LLMs than devices, but the hardware can handle processing locally in many cases.
Simply put, you won't be hurting for local compute resources for generative AI use cases.
Amazon CEO Andy Jassy said:
"In these early days of generative AI, companies are still learning which models they want to use, which models they use for what purposes and which model sizes they should use to get the latency and cost characteristics they desire. In our opinion, the only certainty is that there will continue to be a high rate of change."
Indeed, the change coming for generative AI is going to revolve around local compute that's distributed.
Here's why I think we may get to distributed model training sooner than the industry currently thinks:
- Enterprises are building out generative AI infrastructure that often revolves around Nvidia, which needs competition but right now has an open field and the margins to prove it.
- The generative AI price tag is tolerated today because the low-hanging productivity gains are still being harvested. If you can improve software development productivity by 50%, who is going to sweat the compute costs? Pick your use case and the returns are there at the moment.
- But those easy returns are likely to disappear in the next 12 months. There will still be returns on investment, but compute costs will begin to matter.
- Companies will also gravitate to smaller models designed for specific use cases. These models, by the way, will need less compute.
- Good-enough processors and accelerators will be used to train LLMs, especially for cases where a fast turnaround isn't required. Expect AWS' Inferentia and Trainium chips, as well as AMD GPUs, to garner workloads. Intel, which is looking to cover the spectrum of AI use cases, stands to benefit too.
- The good-enough model training approach is also likely to extend to edge devices. For privacy and lower costs, smartphones, PCs and other edge hardware will be equipped and ready to shoulder local compute.
Ultimately, I wouldn't be surprised if we get to a peer-to-peer or Hadoop/MapReduce-ish approach to generative AI compute.
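To make that idea concrete, here's a minimal, hypothetical sketch in Python of what a MapReduce-style training step could look like: each simulated "edge device" computes gradients on its own data shard (the map phase) and a coordinator averages them into a single model update (the reduce phase). The toy model, data and device count are invented for illustration; real distributed training would add compression, fault tolerance and privacy machinery on top.

```python
# Hypothetical MapReduce-style training step across simulated edge devices.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = X @ w, trained with mean-squared error.
true_w = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)

def make_shard(n=256):
    """Pretend each shard lives on a different edge device (phone, PC, edge node)."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

shards = [make_shard() for _ in range(8)]  # 8 simulated devices

def local_gradient(w, shard):
    """Map phase: one device computes the MSE gradient on its local data only."""
    X, y = shard
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

for step in range(200):
    # Map: every device works independently on its own shard.
    grads = [local_gradient(w, shard) for shard in shards]
    # Reduce: a coordinator averages the gradients and updates the shared model.
    w -= 0.05 * np.mean(grads, axis=0)

print("learned weights:", np.round(w, 3))  # should approach [2.0, -1.0, 0.5]
```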