Skip to main content

Posts

Showing posts from May, 2014

Ad-hoc analysis over Cassandra data with Facebook Presto

A few days ago I attended in Moscow Cassandra meet up with my presentation, from one of the participant, I heard about Facebook project presto for fast data analysis. I was very curious and hurry up to hands on it. From Presto Site "Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions". Historically Cassandra was lack of interactive Ad-hoc query, even it's doesn't support any aggregate function in CQL. For this reason, whenever we proposed our customers to utilize Cassandra as a database, they were always confused. However, for analysis data over Cassandra we have the following frameworks: 1) Hadoop Map Reduce 2) Spark and Shark Also a few commercial projects like impala. But Hadoop Map Reduce is definitely slow to use as Ad-Hoc queries. Spark is very fast with its RDD data models, but it also needs a few exercises to run q