One of the more important big data and data science conferences of early 2016, Spark Summit East, is underway this week in New York. The event is devoted to Spark, the open source big data processing engine, which rode a rising wave of hype and support from vendors during 2015.
Constellation Research VP and principal analyst Doug Henschen is in attendance and will be publishing his on-the-scene coverage later this week. In advance of the event, Doug provided some insights on what he's expecting from the conference.
CR Insights: What's the big news expected at the event?
Henschen: We're sure to hear more about the recent Apache Spark 1.6 release, with highlights including faster streaming, automatic memory management, improved Spark SQL performance and extended machine learning capabilities. We’ve seen huge uptake and support for Spark over the last year, with many commercial software vendors pledging support for and/or integration with Spark in some way.
Data-integration vendors, for example, are taking advantage of the fast, in-memory processing capabilities of the Spark Core to speed data-processing and data-transformation work. Other commercial vendors are tapping Spark Streaming for stream processing or Spark SQL for fast, ad hoc analysis against big data sets in Hadoop.
CR Insights: What are the most important Spark topics set for discussion?
Henschen: I’m most looking forward to hearing more about Spark Streaming and improvements in Spark’s machine learning library. To date, Spark Core and Spark SQL seem to be the most mature and widely embraced aspects of the Spark ecosystem. Critics, mostly competitors of Spark Streaming, like to pigeonhole it as a microbatch system that won’t stand up to the low-latency demands of streaming scenarios. Spark advocates say performance differences with rival streaming options, such as Apache Flink and Apache Apex, are minimal.
I’m looking forward to hearing more about Spark Streaming performance from early adopters such as Shopify and The Weather Company and from community members such as Confluent. I’ll take real-world evidence of success over theoretical discussions any day.
CR Insights: How should customers view the current market landscape for Spark?
Henschen: As for all those commercial vendors touting Spark support and integration, rest assured that they’re seeking to couch their own offerings as additive or complementary to the benefits of Spark. There’s nothing wrong with commercial vendors seeking to offer better tools and more value on top of Spark, but make sure you clearly understand the commercial-support options and the depth and breadth of the development community behind any product or service that is not a part of the Spark ecosystem.
The first question you have to ask when considering any technology is, what value will it bring to my organization? But my next questions would be, which organizations are using it and how many, who is contributing to the advancement of the technology, and what are my support options? The bigger the community, the better.