Product Overview
SNAP is a product for slice-and-dice (OLAP-style) analysis on very large datasets directly on data lakes.
- It can replace traditional OLAP cubes to overcome issues with scale and performance.
- It can be an alternative to tools like Hive and Impala, as well as analytic databases like Greenplum, Vertica, and Redshift, when these are used as a data mart for interactive BI queries.
- It can be an alternative to building extracts (Tableau extracts, or custom extracts of data for performance).
- It can be used when you want fast query performance without pre-aggregating the data.
- It is Spark native and can be used for any Spark analysis workload, including machine learning.
- It works well with notebook-style data analysis using Jupyter, nteract, and more, combining SQL with Python.
SNAP is not an OLTP database and is not intended for transactions. It is also not intended as a reporting tool when the need is to run batch SQL that produces large outputs for offline analysis. SNAP is designed to work with BI tools like Oracle Analytics Cloud and Tableau.
SNAP is used by Fortune 10 companies with extensive data warehousing needs and multi-table joins, as well as in high-velocity data use cases such as ad tech and IoT analytics.
A Qube is a logical abstraction in SNAP. It consists of the following:
- A join graph, represented by the star schema definition of the datasets involved in the Qube (facts and dimensions)
- An OLAP index
- A logical model of dimensions and metrics (columns from the tables in the star schema)
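The parts of a Qube listed above can be pictured as a small data structure. The sketch below is only an illustration: the class names `Join` and `Qube`, their fields, and the `sales` example tables are hypothetical and are not SNAP's actual API or DDL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Join:
    """One edge of the star-schema join graph: fact column -> dimension key."""
    fact_column: str
    dimension_table: str
    dimension_key: str


@dataclass
class Qube:
    """Hypothetical container for the parts of a Qube (not SNAP's real API)."""
    name: str
    fact_table: str
    joins: List[Join] = field(default_factory=list)       # join graph
    dimensions: List[str] = field(default_factory=list)   # logical dimensions
    metrics: List[str] = field(default_factory=list)      # logical metrics
    olap_index: Optional[str] = None                      # name of the backing OLAP index

    def tables(self) -> List[str]:
        """All tables in the star schema, fact table first."""
        return [self.fact_table] + [j.dimension_table for j in self.joins]


# Example star schema: one fact table joined to two dimension tables.
sales = Qube(
    name="sales_qube",
    fact_table="sales",
    joins=[Join("product_id", "product", "id"), Join("store_id", "store", "id")],
    dimensions=["product.category", "store.region"],
    metrics=["sales.amount"],
    olap_index="sales_index",
)
```

Here `sales.tables()` lists the fact table followed by each dimension table reachable through the join graph.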
An OLAP index is the physical representation of elements of the logical Qube. It is stored as compressed columnar data for the columns in the Qube, together with indexes on the dimensions of the Qube.
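To make "compressed columnar data plus indexes on dimensions" concrete, here is a toy sketch that uses dictionary encoding for a dimension column and an inverted index from each value to its row ids. This is a generic technique chosen for illustration; SNAP's real file format and index layout are not described in this document, and the function names are hypothetical.

```python
from collections import defaultdict


def build_dimension_index(values):
    """Dictionary-encode a dimension column and build an inverted index.

    Returns (dictionary, codes, postings):
      dictionary: sorted distinct values
      codes:      one small integer per row (the encoded column)
      postings:   code -> list of row ids holding that value
    """
    dictionary = sorted(set(values))
    code_of = {value: i for i, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    postings = defaultdict(list)
    for row_id, code in enumerate(codes):
        postings[code].append(row_id)
    return dictionary, codes, dict(postings)


def sum_where(metric, dictionary, postings, wanted):
    """Sum a metric column over the rows whose dimension equals `wanted`."""
    if wanted not in dictionary:
        return 0
    code = dictionary.index(wanted)
    # Only the matching rows are touched; the rest of the column is skipped.
    return sum(metric[row_id] for row_id in postings[code])


# Two columns of a tiny fact table, stored column by column.
region = ["east", "west", "east", "north", "west", "east"]
amount = [10, 20, 30, 40, 50, 60]

dictionary, codes, postings = build_dimension_index(region)
east_total = sum_where(amount, dictionary, postings, "east")
```

A filter such as `region = 'east'` is answered from the index by reading only the matching row ids of the metric column, which is the general reason dimension indexes speed up slice-and-dice queries.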
SNAP is fast because:
- It has built-in optimizations:
  - Query planning
  - Query rewrites
  - Join optimization
  - Join elimination
  - Eager aggregations
  - Dimension context propagation
- It uses an in-memory index for fast queries.
- It stores its data in a highly compressed columnar format with indexes on dimensions.
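As one example of the optimizations named above, eager aggregation pre-aggregates the fact table on its join key before joining to a dimension, so the join processes one row per key instead of one row per fact. The sketch below shows the idea in plain Python under assumed data (`sales`, `store_region` are made up); it is not how SNAP implements the rewrite.

```python
from collections import defaultdict

# Fact rows: (store_id, amount). Dimension: store_id -> region.
sales = [(1, 10.0), (2, 5.0), (1, 7.5), (3, 2.5), (2, 1.0)]
store_region = {1: "east", 2: "west", 3: "east"}


def join_then_aggregate(facts, dim):
    """Baseline plan: look up the dimension for every fact row, then sum."""
    totals = defaultdict(float)
    for store_id, amount in facts:
        totals[dim[store_id]] += amount
    return dict(totals)


def aggregate_then_join(facts, dim):
    """Eager aggregation: sum per join key first, then join the subtotals."""
    per_store = defaultdict(float)
    for store_id, amount in facts:
        per_store[store_id] += amount
    totals = defaultdict(float)
    for store_id, subtotal in per_store.items():  # one lookup per distinct key
        totals[dim[store_id]] += subtotal
    return dict(totals)


baseline = join_then_aggregate(sales, store_region)
eager = aggregate_then_join(sales, store_region)
```

Both plans return the same totals per region; the eager plan performs three dimension lookups (one per store) instead of five (one per sale), and the gap grows with the size of the fact table.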
SNAP is built on Apache Spark. It differs from the many products that “integrate” with Spark: SNAP is embedded within Spark, so a SNAP deployment is a Spark deployment.
- Start a Spark cluster on Oracle Big Data Cloud Compute Edition
SNAP is different from Spark even though it shares the Spark runtime.
SNAP uses in-memory indexes, its own file format, and its own query optimizations. When running SNAP, all queries go through the SNAP Thrift server and are managed and optimized by the SNAP engine. The SNAP Thrift server is a thin layer on top of the Spark Thrift server, so any BI tool that can connect to the Spark Thrift server can connect to SNAP.
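Because the Spark Thrift server speaks the HiveServer2 protocol, a generic HiveServer2 client should in principle be able to reach SNAP the same way a BI tool does. The sketch below assumes the third-party PyHive package, a hypothetical host name, and port 10000 (the usual HiveServer2 default); the actual host, port, and authentication settings of a SNAP deployment are not given in this document and may differ.

```python
def jdbc_url(host, port=10000, database="default"):
    """HiveServer2-style JDBC URL, the form used by Spark Thrift server clients."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def run_query(host, sql, port=10000, username=None, database="default"):
    """Run one SQL statement over the Thrift endpoint and return all rows.

    Assumes the PyHive package is installed and the server accepts the
    default authentication mode; both are assumptions, not SNAP requirements.
    """
    from pyhive import hive  # third-party; imported lazily so jdbc_url works without it

    conn = hive.connect(host=host, port=port, username=username, database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical host name, for illustration only.
url = jdbc_url("snap-host.example.com")
```

A BI tool would typically be pointed at the URL built by `jdbc_url`, while a notebook could call `run_query("snap-host.example.com", "SELECT 1")` against a running server.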