Posted by : Ft Jitendra
Friday, 21 August 2015

A data-processing solution from Mesosphere leverages Spark, Kafka, and Cassandra -- but eschews Hadoop -- for enterprise-level real-time big-data needs
Mention big-data tools like Spark and Kafka to most
enterprise users, and the other big-data tool that comes to mind
is Hadoop. But does Hadoop need to be part of the picture?
Mesosphere,
the corporate backer of the Apache Mesos cluster-management project, is
ginning up a big-data stack that eschews Hadoop but embraces Spark (along
with Kafka, Cassandra, and the Akka event framework) for real-time
processing.
Mesosphere Infinity
is "a turnkey, full-stack offering optimized for big data and IoT," and
its main aim is to give businesses an easily deployed stack for
real-time data work. It also stands as a recent example of how many of
the technologies reflexively associated with the Hadoop stack don't
require Hadoop to be useful.
Look, ma, no Hadoop
Matt
Trifiro, chief marketing officer for Mesosphere, explained in a phone
conversation how Infinity is managed by another Mesosphere creation: Mesosphere DCOS,
which allows entire data centers full of applications to be stood up
easily. Infinity, in turn, is for managing a relatively narrow range of
applications: Spark for data processing; Kafka for real-time data
ingestion; and another Apache Software Foundation project, Cassandra, for data
storage.
While
Infinity "doesn't exclude Hadoop," said Trifiro, "it doesn't require
it, either. You can use [Hadoop's] HDFS as a persistent data store, and
you may have Hadoop processing over data pushed into Cassandra, but in
terms of real-time acquisition, you need a specialized stack."
Sparks of inspiration
Spark has drawn attention
of late from a roster of A-list technology firms interested in both
investing in the project and leveraging it for heavy-duty business
analytics work. Still, like many other open source data tools, Spark is
by itself far more "project" than "product" -- it isn't a trivial effort
to use in an enterprise environment.
Trifiro
claims the Infinity stack, Spark included, "was built from
observation of what people were putting into production." Businesses
were attempting to put together Spark and Kafka stacks for real-time
analysis, said Trifiro, because "the demand for processing real-time
data by non-Web companies is relatively new, and there's immense
pressure on IT teams to do this." Standing up an entire stack of this kind has
"historically required a lot of expertise," and Infinity is meant to
require minimal work to get up and running.
Mesosphere
plans to make Infinity's stack even easier to consume by offering it via
existing cloud services. Right now, though, the only named partner for
cloud-based enterprise distribution is Cisco, the same company that
worked hand-in-hand with Mesosphere to build Infinity.
One
possible analogy is with running applications in containers, versus
using virtualization and OpenStack. Containers offer a potentially more
precise solution to the problems of running applications at scale than
VMs did. Likewise, Spark alone, as opposed to Spark plus Hadoop, might
present a better fit for the data-processing problems faced by
enterprises -- as long as deployment and management of a Spark stack
doesn't put them back at square one.