Cloudera is continuing to refine its Hadoop-based big data analytics platform with Cloudera Enterprise 5.5, including a beta version of a tool that analyzes and optimizes Hadoop workloads. Here are the details from the announcement:
Within any business, there can be countless workloads running at any given time, across multiple systems, that change based on time of day and business need. Across many of these workloads, similar pain points have emerged, such as breakdowns in ETL pipelines, long wait times for BI reports, increasing system pressure from ad hoc queries, and unnecessary query complexity, all of which result in lost time and money.
Cloudera Navigator Optimizer instantly analyzes existing workloads, providing visibility into which ones are the most critical, which data is accessed most, and how it is being used. It then automatically turns this information into a full optimization strategy for fast success with Hadoop.
The 5.5 release also includes a range of other improvements to Cloudera's platform, including support for additional data types along with new features for security and data stewardship.
Inside Navigator Optimizer
The most significant component of Cloudera Enterprise 5.5 is clearly Navigator Optimizer, which is based on technology Cloudera gained through its acquisition of Xplain.io earlier this year. Cloudera expects it to become generally available sometime next year, but has not given a firm date.
Navigator Optimizer targets a very real need for enterprise IT shops, says Constellation Research VP and principal analyst Doug Henschen.
"Lots of companies are trying to move workloads into Hadoop simply because the costs are much lower," he says. "The legacy SQL code out there, much of it created years ago by people who may have left the company, is a real problem that bogs down data warehouse environments and leads to excessive costs and overbuying of warehouse capacity. The point of the optimizer is to analyze the SQL, spot redundancies and provide tools that make it easier to move workloads over to a Cloudera-based data hub."
Avoiding A Culture Clash
Legacy code has its defenders, though: the people who manage it are sometimes more interested in preserving the status quo. "This tool would likely come into play when there's a clear mandate from the CIO and/or a central architecture team to solve data warehouse bottleneck problems, and to do so by shifting workloads onto Hadoop," Henschen says. "Otherwise, DBAs and data-integration professionals probably wouldn't be too keen to help migrate workloads they manage over to a platform that others manage."
Cloudera says Optimizer's tools are understandable to both DBAs and Hadoop administrators.
Alternative Optimization
Cloudera isn't necessarily breaking new ground with Navigator Optimizer. After all, database, ETL (extract, transform and load), and BI (business intelligence) vendors offer SQL optimization tools of their own. "These could be used to streamline operations on the original platform, the data warehouse, or to migrate certain workloads over to Hadoop," Henschen says.
Indeed, traditional database vendors have felt increased pressure to provide more optimization tools as Hadoop grows more popular for certain workloads.
The Bottom Line
"Five years ago, companies had no choice but to expand data warehouse capacity at high cost as data volumes grew," Henschen says. "Now there are also options to diagnose and clean the old SQL code, then move the right workloads to the right platforms. Data warehouses are still right for mission-critical workloads with tight SLAs, but many other types can be moved to Hadoop-based data hubs," he adds.