Continuing its campaign to evangelize the virtues of the SQL query language while also embracing alternatives, MemSQL, Inc. is rolling out a connector for the popular Apache Spark framework that it says enables rapid and seamless data transfer between the platforms.
The MemSQL Spark Connector combines the in-memory processing and distributed architectures of both MemSQL and Spark for high-performance parallel throughput. MemSQL’s special sauce is that it enables both heavy-duty transaction processing and real-time analysis to be performed on unstructured data streams in the same environment, without the hassle of shuffling data back and forth. The company claims that its in-memory architecture provides unparalleled performance while preserving compatibility with the 30-year-old SQL language. The company clearly doesn’t want to be pigeonholed in the SQL niche, however. This announcement comes just a few weeks after MemSQL added open source connectors to external data sources like Hadoop and Amazon S3.
MemSQL executives pointed to similarities between the architecture of their namesake database and the Spark framework, including memory-optimized processing and a distributed architecture. Pairing MemSQL’s in-memory database for data caching with Spark’s memory-optimized processing makes the combination “the fastest way to operationalize anything you do with Spark today,” said Eric Frenkiel, CEO of MemSQL. The company also said it’s the simplest way to integrate Spark with an operational, rather than an analytical, database.
The connector is ideal for scenarios in which data stored in a production MemSQL database can be manipulated by Spark analytics and the results saved back to the production data store without a clunky extract/transform/load (ETL) procedure. For example, a marketer interested in the behavior of customers at the outer edges of a bell curve “may build a Spark model to extract the edge, analyze it, put it in MemSQL for persistence and then give it to the data analytics team to better understand outliers,” Frenkiel said.
Customers can also store data in the Hadoop Distributed File System (HDFS) and move it to MemSQL for production or Spark for analysis. “There’s never a single solution,” Frenkiel said. “It’s important to support both traditional and new tools.”
The connector also acknowledges the growing dominance of Spark as a flexible analytics engine and likely replacement for Hadoop’s native MapReduce programming model. As noted by SiliconANGLE last week, some people now believe that Big Data buying decisions will soon be influenced more by the choice of Spark than Hadoop.
MemSQL is hedging its bets. “The prevailing wisdom is that MapReduce has a limited life and Spark is widely viewed as the next iteration,” said Gary Orenstein, chief marketing officer of MemSQL.