Product Overview

Overview

Concepts and Use cases

What problems does SNAP solve?

SNAP is a product for slice-and-dice (OLAP-style) analysis on very large datasets, directly on data lakes.

- It can replace traditional OLAP cubes to overcome issues with scale and performance.
- It can be an alternative to tools like Hive and Impala, as well as analytic databases like Greenplum, Vertica, and Redshift, when these tools are used as a data mart behind BI tools for interactive queries.
- It can be an alternative to building extracts (Tableau extracts, or custom extracts of data for performance).
- It can be used when you want fast query performance without pre-aggregating the data.
- It is Spark native and can be used for any Spark analysis workload, including machine learning.
- It works well with notebook-style data analysis using Jupyter, nteract, and more, combining SQL with Python.

What is SNAP not intended for?

SNAP is not an OLTP database and is not intended for transactions. It is also not intended as a reporting tool when the need is to run batch SQL that produces large volumes of output to be analyzed offline. SNAP is designed to work with BI tools like Oracle Analytics Cloud and Tableau.

Who uses SNAP?

SNAP is used by Fortune 10 companies with extensive data warehousing needs and multi-table joins, as well as in high-velocity data use cases such as ad tech and IoT analytics.

Key concepts

Qube (logical data model and metadata)

A Qube is a logical abstraction in SNAP.

It consists of the following:

- A join graph, represented by the star schema definition of the datasets involved in the Qube (facts and dimensions)
- An OLAP index
- A logical model of dimensions and metrics (columns from the tables in the star schema)
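The parts of a Qube listed above can be pictured as a small data structure. The following is only an illustrative sketch, not SNAP's actual metadata format: the class names `Join` and `Qube`, the field names, and the `sales` example tables are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Join:
    """One edge of the star-schema join graph (hypothetical representation)."""
    fact_column: str  # foreign-key column on the fact table
    dim_table: str    # dimension table being joined
    dim_key: str      # key column on the dimension table


@dataclass
class Qube:
    """Hypothetical logical model: a fact table, its joins, and exposed columns."""
    name: str
    fact_table: str
    joins: List[Join] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)  # columns to slice and dice by
    metrics: List[str] = field(default_factory=list)     # columns to aggregate

    def tables(self) -> List[str]:
        """Return the fact table followed by every joined dimension table."""
        return [self.fact_table] + [j.dim_table for j in self.joins]


# Example star schema: one fact table with two dimension tables.
sales = Qube(
    name="sales_qube",
    fact_table="sales",
    joins=[Join("product_id", "product", "id"), Join("store_id", "store", "id")],
    dimensions=["product.category", "store.region", "sales.order_date"],
    metrics=["sales.quantity", "sales.revenue"],
)
print(sales.tables())
```

The join graph and the dimension and metric lists are purely logical here; the OLAP index described next is the physical counterpart.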

What is an OLAP Index?

An OLAP index is the physical representation of the elements of the logical Qube.

It is stored as compressed columnar data for the columns in the Qube, together with indexes on the Qube's dimensions.
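To make "columnar data plus indexes on dimensions" concrete, here is a minimal sketch of one common way to do it: dictionary-encode a dimension column and keep an inverted index from each value to the rows that contain it. This is a generic technique chosen for illustration; SNAP's real file format is not described here, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_dimension_index(values):
    """Dictionary-encode a dimension column and build an inverted index.

    Returns (dictionary, codes, postings):
      dictionary: distinct value -> small integer code
      codes:      the column stored as integer codes (the compact form)
      postings:   code -> list of row ids holding that value
    """
    dictionary = {}
    codes = []
    postings = defaultdict(list)
    for row_id, value in enumerate(values):
        # New values get the next free code; known values reuse theirs.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
        postings[code].append(row_id)
    return dictionary, codes, dict(postings)


# Two columns of a tiny table, stored column by column.
region = ["east", "west", "east", "north", "west", "east"]
revenue = [10.0, 20.0, 5.0, 7.5, 2.5, 4.0]

dictionary, codes, postings = build_dimension_index(region)


def sum_metric_where(value, metric):
    """Sum a metric over rows matching a dimension value, using only the index."""
    code = dictionary.get(value)
    if code is None:
        return 0.0
    return sum(metric[row_id] for row_id in postings[code])


print(sum_metric_where("east", revenue))
```

A filter on a dimension touches only the matching row ids rather than scanning the whole column, which is the general reason dimension indexes help slice-and-dice queries.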

Why is it fast?

SNAP is fast because:

- It has built-in optimizations:
  - Query planning
  - Query rewrites
  - Join optimization
  - Join elimination
  - Eager aggregations
  - Dimension context propagation
- It uses an in-memory index for fast queries.
- It stores its data in a highly compressed columnar format with indexes on dimensions.
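As an illustration of one optimization named above, eager aggregation, the sketch below shows the general idea in plain Python: aggregate the fact rows by join key first, then join the much smaller partial result to the dimension. It is not SNAP's planner; the table contents and function names are hypothetical.

```python
from collections import defaultdict

# Fact rows: (store_id, revenue). Dimension: store_id -> region.
fact = [(1, 10.0), (2, 20.0), (1, 5.0), (3, 7.5), (2, 2.5)]
store_region = {1: "east", 2: "west", 3: "east"}


def join_then_aggregate(fact_rows, dim):
    """Baseline plan: look up the dimension for every fact row, then aggregate."""
    totals = defaultdict(float)
    for store_id, revenue in fact_rows:
        totals[dim[store_id]] += revenue
    return dict(totals)


def aggregate_then_join(fact_rows, dim):
    """Eager aggregation: pre-aggregate by join key, then join the partial sums."""
    partial = defaultdict(float)
    for store_id, revenue in fact_rows:
        partial[store_id] += revenue
    totals = defaultdict(float)
    # Only one dimension lookup per distinct store, not per fact row.
    for store_id, subtotal in partial.items():
        totals[dim[store_id]] += subtotal
    return dict(totals)


print(join_then_aggregate(fact, store_region))
print(aggregate_then_join(fact, store_region))
```

Both plans return the same totals per region; the second performs the join on one row per distinct key, which matters when the fact table is very large.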

What does it take to deploy SNAP?

SNAP is built on Apache Spark. It differs from many products that "integrate" with Spark: SNAP is embedded within Spark, so deploying SNAP is a Spark deployment.

- Start a Spark cluster on Oracle Big Data Cloud Compute Edition

How is SNAP different from Spark?

SNAP is different from Spark even though it shares Spark's runtime.

SNAP uses in-memory indexes, its own file format, and its own query optimizations. When running SNAP, all queries go through the SNAP Thrift server and are managed and optimized by the SNAP engine. The SNAP Thrift server is a thin layer on top of the Spark Thrift server, so any BI tool that can connect to the Spark Thrift server can connect to SNAP.
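Because the Spark Thrift server speaks the HiveServer2 protocol, a client that works with it should in principle work here too. The sketch below assumes the third-party PyHive package is installed and that the SNAP Thrift server is reachable at a host and port you supply; the helper name `run_query`, the host `snap-host.example.com`, the table `sales_qube`, and the default port of 10000 (the usual HiveServer2 default) are all assumptions for illustration.

```python
def run_query(host, sql, port=10000, username=None):
    """Run one SQL statement against a HiveServer2-compatible Thrift endpoint.

    Assumes PyHive is installed (pip install "pyhive[hive]"). The import is
    kept inside the function so this module loads even without PyHive.
    """
    from pyhive import hive  # third-party client, not part of SNAP

    conn = hive.Connection(host=host, port=port, username=username)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical usage; host and table names are placeholders:
# rows = run_query(
#     "snap-host.example.com",
#     "SELECT region, SUM(revenue) FROM sales_qube GROUP BY region",
# )
```

BI tools take the same route through their Spark or Hive connectors, so no SNAP-specific driver is implied by this sketch.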
