AWS SageMaker Evolves into Unified Data and AI Hub

December 4, 2024

At the annual re:Invent 2024 conference, Amazon Web Services (AWS) unveiled the next generation of its cloud-based machine learning (ML) development platform, SageMaker. The platform has been transformed into a unified hub, enabling enterprises to consolidate all their data assets from various data lakes and sources. It also integrates a comprehensive set of AWS ecosystem analytics and formerly separate ML tools.

Put simply, SageMaker is no longer just a platform for building AI and machine learning apps. Now, it also allows you to link your data and extract analytics from it.

This strategic move is a response to the growing trend of analytics and AI convergence, where enterprises are increasingly using their data in interconnected ways. This ranges from powering historical analytics to enabling ML model training and generative AI applications for various use cases.

Microsoft has been particularly proactive in integrating all of its data offerings within its Fabric product. Just last month, it announced that more of its operational databases would be integrated natively. Native access to data makes AI app development easier for customers, since applications can reach the data they need faster and more efficiently. Microsoft has been leading the way in this area, and now Amazon is hot on its heels.

“Many of our customers already use combinations of our purpose-built analytics and ML tools, such as Amazon SageMaker, Amazon EMR, Amazon Redshift, Amazon S3 data lakes and AWS Glue. The next generation of SageMaker integrates these capabilities, along with some exciting new features, to provide customers with all the tools they need for data processing, SQL analytics, ML model development and training, and generative AI, all within SageMaker,” said Swami Sivasubramanian, the vice president of Data and AI at AWS.

The Heart of the Matter: SageMaker Unified Studio and Lakehouse

Amazon SageMaker has always been a vital tool for developers and data scientists, offering a fully managed service to deploy production-grade ML models.

The platform’s integrated development environment, SageMaker Studio, provides teams with a single, web-based visual interface to carry out all machine learning development steps, from data preparation and model building to training, tuning, and deployment. 

However, as enterprise needs continue to evolve, AWS recognized that limiting SageMaker to just ML deployment was not sufficient. Enterprises also require purpose-built analytics services (supporting workloads like SQL analytics, search analytics, big data processing, and streaming analytics) in conjunction with existing SageMaker ML capabilities. They also need easy access to all their data to drive insights and create new experiences for their downstream users.

Introducing Two New Capabilities: SageMaker Lakehouse and Unified Studio

To address this need, AWS has now enhanced SageMaker with two key capabilities: Amazon SageMaker Lakehouse and Unified Studio.

The lakehouse offering provides unified access to all the data stored in data lakes built on top of Amazon Simple Storage Service (S3), Redshift data warehouses and other federated data sources. This breaks down data silos and makes the data easy to query, regardless of where it is originally stored.
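To make the idea of unified access concrete, here is a minimal sketch in Python. It does not use SageMaker Lakehouse or any AWS API. As an assumption for illustration only, it uses SQLite from the standard library: two separate in-memory databases stand in for a data lake and a data warehouse, and a single connection queries across both. The table names, column names and sample rows are hypothetical.

```python
import sqlite3


def build_unified_connection() -> sqlite3.Connection:
    """Create one connection that can see two independent stores."""
    # Stand-in for a data lake: the connection's main in-memory database.
    conn = sqlite3.connect(":memory:")
    # Stand-in for a data warehouse: a second, separate in-memory database
    # attached to the same connection under the name "warehouse".
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

    # Hypothetical tables and rows, invented for this example.
    conn.execute("CREATE TABLE main.click_events (customer_id INTEGER, page TEXT)")
    conn.execute("CREATE TABLE warehouse.customers (customer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.click_events VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    conn.executemany(
        "INSERT INTO warehouse.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn


def clicks_per_customer(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Join data from both stores in a single query."""
    return conn.execute(
        """
        SELECT c.name, COUNT(*) AS clicks
        FROM main.click_events AS e
        JOIN warehouse.customers AS c ON c.customer_id = e.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, clicks in clicks_per_customer(build_unified_connection()):
        print(name, clicks)
```

The point of the sketch is only that the query does not need to know which store holds which table. A real lakehouse adds cataloging, governance and scale that this toy example leaves out.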

“Today, more than one million data lakes are built on Amazon Simple Storage Service… allowing customers to centralize their data assets and derive value with AWS analytics, AI, and ML tools… Customers may have data spread across multiple data lakes, as well as a data warehouse, and would benefit from a simple way to unify all of this data,” the company noted in a press release.

Once all the data is unified with the lakehouse offering, enterprises can access it and put it to work with the other key capability — SageMaker Unified Studio. 

At its core, the studio acts as a unified environment that brings together all existing AI and analytics capabilities from Amazon’s standalone studios, query editors, and visual tools – spanning Amazon Bedrock, Amazon EMR, Amazon Redshift, AWS Glue and the existing SageMaker Studio.

This eliminates the time-consuming hassle of using separate tools in isolation and gives users one place to leverage these capabilities to discover and prepare their data, author queries or code, process the data and build ML models. They can even pull up Amazon Q Developer assistant and ask it to handle tasks like data integration, discovery, coding or SQL generation — all within the same environment.

In essence, users get a single platform with all their data and all their analytics and ML tools to power downstream applications. This ranges from data engineering, SQL analytics and ad-hoc querying to data science, ML and generative AI.

Integrating Bedrock in SageMaker

With Bedrock capabilities in SageMaker Unified Studio, for example, users can connect their preferred high-performing foundation models and tools like Agents, Guardrails and Knowledge Bases with their lakehouse data assets. This allows them to quickly build and deploy generative AI applications.

Once the projects are executed, the lakehouse and studio offerings also enable teams to publish and share their data, models, applications and other artifacts with their team members, while maintaining consistent access policies through a single permission model with granular security controls. This makes resources easier to discover and reuse, preventing duplication of effort.

Compatibility with Open Standards

Importantly, SageMaker Lakehouse is compatible with Apache Iceberg, meaning it will also work with familiar AI and ML tools and query engines compatible with the Apache Iceberg open standard. Plus, it includes zero-ETL integrations for Amazon Aurora MySQL and PostgreSQL, Amazon RDS for MySQL, Amazon DynamoDB with Amazon Redshift as well as SaaS applications like Zendesk and SAP.

“SageMaker offerings underscore AWS’ strategy of exposing its advanced, comprehensive capabilities in a governed and unified way, so it is quick to build, test and consume ML and AI workloads. AWS pioneered the term Zero-ETL, and it has now become a standard in the industry. It is exciting to see that Zero-ETL has gone beyond databases and into apps. With governance control and support for both structured and unstructured data, data scientists can now easily build ML applications,” industry analyst Sanjeev Mohan told VentureBeat.

The New SageMaker is Now Available

The new SageMaker is available for AWS customers starting today. However, the Unified Studio is still in the preview phase. AWS has not shared a specific timeline but noted that it expects the studio to become generally available soon. 

Companies like Roche and NatWest Group will be among the first users of the new capabilities, with the latter anticipating that Unified Studio will result in a 50% reduction in the time required for its data users to access analytics and AI capabilities. Roche, meanwhile, expects a 40% reduction in data processing time with SageMaker Lakehouse.

AWS re:Invent runs from December 2 to 6, 2024.

Jared Cohen

Jared studied Psychology at UCLA, focusing on the effects of fandom culture on mental health. His intriguing takes on fandom psychology and his reviews on self-help books designed for geeks make him a unique contributor to Hypernova.
