Standing on the shoulders of the lakehouse trend, AWS takes data platform and data workload unification a step further than its rivals.

AWS re:Invent saw plenty of data- and database-related announcements, but the obvious highlight was the new Amazon SageMaker. The announcement was the climax of CEO Matt Garman’s opening keynote. Swami Sivasubramanian, AWS VP of AI and Data, highlighted it at the beginning of his keynote and again during an extended demo presentation. Finally, Garman returned to the new SageMaker, early and prominently, during an extended question-and-answer session with analysts.

SageMaker is an old product name at AWS, but the new SageMaker is an integrated studio experience unifying all data and all data workloads on one platform. AWS executives say the new Amazon SageMaker experience will meet customers where they are, meaning they’ll keep using the services they’re used to -- Redshift, EMR, Glue, Athena, Bedrock, MSK, QuickSight, OpenSearch, and more -- and they won’t have to rip and replace anything. The ML and AI model-development environment formerly known as SageMaker has been renamed SageMaker AI, which becomes another of the component services accessible through the new, unified Amazon SageMaker.

Users can keep accessing all of the services they’re used to directly, AWS executives stressed, but the new Amazon SageMaker delivers a unifying studio “experience.” Through it, all data constituents will be able to access what are said to be improved, streamlined and better-integrated versions of the SQL editors, notebooks and other tools and user interfaces that they’re used to. What’s more, all data and related assets will be accessible through new Git-based project repositories. Data discovery, collaboration and governance are supported by the new SageMaker Catalog, which is an evolution of Amazon DataZone.

The new center of data management for Amazon SageMaker is SageMaker Lakehouse, an Iceberg-compliant store built on Amazon S3. SageMaker Lakehouses will be fueled by “Zero-ETL” integrations with Amazon Aurora, Amazon RDS for MySQL, Amazon DynamoDB, and Amazon Redshift, among other sources sure to follow.

By providing a unified studio environment and unified access to data, Amazon SageMaker will foster collaboration among project participants, whether they’re engineers, analysts or scientists. Each Amazon SageMaker UI, whether it’s a SQL editor, notebook or catalog, will offer a GenAI-based Amazon Q assistant to answer natural language questions, generate code, recommend data and deliver insights.

The new Amazon SageMaker is NOT one big all-or-nothing bundle that you’ll have to buy. The costs of using the newly unified platform will simply be the passthrough costs of the individual constituent services that organizations are already using, whether it’s Redshift, EMR, Glue, SageMaker AI, Bedrock, Kinesis, QuickSight, or whatever. There will be no changes to the current pricing schemes of these services, executives confirmed. The only really new cost is the new studio and project experience, which will consume a small amount of compute, said to be akin to the compute cost of the old SageMaker Studio experience.

Constellation’s Analysis

I’m very impressed by the Amazon SageMaker announcement. It brings together the vendor’s core data management, analytics and AI capabilities without forcing changes or presenting any big asks or cost increases to customers. It takes the data cataloging, collaboration and governance capabilities that AWS was already developing (with Amazon DataZone) and seamlessly weaves them into the experience through SageMaker Catalog and project workspaces. The customer’s existing Redshift warehouses and lakes can all be exposed through SageMaker Lakehouse via a simple registration process, which ups AWS’s Iceberg standard compliance game.

Competitively speaking, Amazon SageMaker stands on the shoulders of Databricks’ pioneering, single-platform lakehouse vision, and it goes further than either Microsoft Fabric or Google Cloud’s BigQuery/Vertex AI/Dataplex combination to consolidate and unify all data and all data workloads in one environment. There are still many moving parts and complex choices and tradeoffs to be considered -- like choosing S3 standard storage versus Iceberg-standard S3 Table Buckets versus Redshift Managed Storage under the hood. What’s more, although many services are already integrated into Amazon SageMaker, the studio experience itself and some integrations, like those to streaming services, QuickSight and OpenSearch, won’t see general availability until 2025 (though the studio experience is expected in early Q1). Nonetheless, it’s a solid vision that truly does meet customers where they are while promoting collaboration and higher-level uses of data, like AI and GenAI.
