Executive Summary
This report examines Databricks’ progress toward delivering a “lakehouse” platform that can serve both the data engineering and data science needs associated with data lakes and the business intelligence (BI) and analytics requirements associated with data warehouses. Databricks’ capabilities as a multicloud data lake platform are well known, so this report focuses mainly on the data warehousing capabilities and improvements detailed at the company’s 2022 Data+AI Summit, held in San Francisco June 27–30. The report also discusses the importance of handling and analyzing streaming data, which Databricks has addressed with Project Lightspeed, an effort to make Spark Structured Streaming faster and simpler. Finally, the report describes the expanding use of the Databricks lakehouse for extract, transform, load (ETL) and data warehousing needs by two customers: cryptocurrency financial services firm Coinbase and pharmaceutical giant GSK.
This report provides an overview and analysis of Databricks’ progress toward delivering a single platform capable of serving the breadth of data engineering, data science, BI/analytics, and streaming needs. C-suite leaders, information technology (IT) executives, and line-of-business executives addressing analytical and next-generation application needs should use this report to better understand how Databricks is serving these broad requirements on one platform, without the need to move and replicate data among multiple platforms.