Success Depends on Adequate Governance, Cataloging and Hands-On Access to Data in Hadoop
This report explores key trends in the popularization of Hadoop and strategies being employed to manage data lakes and make them more accessible. Companies have embraced the concept of the data lake or data hub that spans analytical and data-driven application needs. But gaps remain in the maturity and capability of the Hadoop stack, leaving organizations to struggle with how to ingest, cleanse, transform, discover, enrich, blend, catalog and govern data in data lakes.
If the data lake concept is to succeed, Constellation Research believes organizations need three key capabilities:
1. Data management and governance
2. Data cataloging and metadata management
3. Self-service discovery and data preparation
This report examines these three capabilities and the mix of tools available from Hadoop distributions and from next-generation and incumbent data management vendors. In particular, it looks at categories of tools and suites that not only abstract the complexities of Hadoop but also open up access to enterprise IT and business professionals. These users expect unified, enterprise-grade interfaces, self-service data discovery and prep capabilities, and prebuilt integrations to popular business intelligence and analytics tools.