Databricks infused its Lakehouse Platform with generative AI capabilities, adding tools that let customers leverage large language models (LLMs), federate and govern data, and tap knowledge engines that learn corporate cultures.
The company outlined the news at its Data + AI Summit. Databricks' news follows announcements from Snowflake and MongoDB designed to land more workloads and enable customers to leverage generative AI. Leading up to the conference, Databricks announced the acquisition of MosaicML and the launch of Lakehouse Apps. The big takeaway is that data platforms and generative AI capabilities are converging.
Constellation Research analyst Doug Henschen summed up the data platform game: "All three (Snowflake, Databricks and MongoDB) want customers to do as much as possible on their platforms so they are invading each other’s turf. But their original (and still predominant) dance partners are data warehouse for Snowflake, data science for Databricks and developers for MongoDB."
Here's a roundup of Databricks' enhancements to its platform.
- The company said it will make it easier to deploy and manage LLMs with Lakehouse AI additions that offer monitoring and governance for LLM development. Databricks is adding Vector Search, a collection of open-source models, LLM-optimized Model Serving, MLflow 2.5 with LLM tools and Lakehouse Monitoring.
- Lakehouse AI will unify the AI lifecycle from data collection and preparation to model development. Databricks Vector Search will manage and automatically create vectors in Unity Catalog. Databricks AutoML will feature a low-code approach to fine-tuning LLMs. And Databricks will curate a list of open-source models in its marketplace.
- Databricks outlined MLflow 2.5, a new release of the Linux Foundation open-source project MLflow. Updates include MLflow AI Gateway, which allows developers to swap out backend models and switch between LLM providers, and MLflow Prompt Tools, a no-code set of visual tools.
- Databricks Lakehouse Monitoring will monitor and manage data and AI assets within Lakehouse.
- LakehouseIQ adds a natural language interface to the Lakehouse Platform. LakehouseIQ uses generative AI to understand company-specific jargon, data usage and organizational structure to answer questions within the context of a business. The goal is to democratize data analytics across a corporation. According to Databricks, LakehouseIQ will learn from signals embedded in corporate data including schemas, documents, queries, popularity, lineage, notebooks and dashboards.
- Lakehouse Federation in Unity Catalog will include query federation across data assets and platforms outside of Databricks. Databricks is also offering governance outside of its platform via Unity Catalog.
- Delta Lake 3.0, the latest contribution to the Linux Foundation's Delta Lake project, will add Universal Format (UniForm), which will allow data stored in Delta to be read as if it were Apache Iceberg or Apache Hudi.
Doug Henschen's take:
- Databricks is, first and foremost, a platform for data scientists and it’s used by many of its 10,000+ customers as a platform for a significant chunk, if not a majority, of their data. Databricks is doing everything it can do to enable those customers to innovate with their data using AI, ML, and analytics, and it’s doing a great job of it.
- Databricks has spent the last three years building up the warehouse side of its Lakehouse platform, but this year the generative AI tsunami has rightly refocused Databricks on what has always been its greatest strength: data science including ML and AI. It’s very clear to me that Databricks customers are building AI models with Databricks today and there’s a deep well of capabilities that are generally available, along with emerging capabilities that are just becoming GA or are in public preview and very close to becoming GA.
- What was very clear during today’s Databricks keynote is how far along Databricks customers such as JPMorgan Chase, JetBlue Airways and Rivian are in building innovative ML, AI and even generative AI capabilities using Databricks. A bunch of new enablers were announced today, several of which are already in public preview and are expected to go GA this year.