This repository was archived by the owner on Mar 10, 2025. It is now read-only.

Query Test Runs

Denny Lee edited this page Mar 11, 2017 · 6 revisions

Below are the results of query test runs using the different Spark-to-DocumentDB connector methods.

Performance: Single Spark VM

Below are the results of connecting Spark to DocumentDB via pyDocumentDB with the following configuration:

  • Single-VM Spark cluster (one master, one worker) on an Azure DS11 v2 VM (14 GB RAM, 2 cores) running Ubuntu 16.04 LTS and Spark 2.1.
  • DocumentDB single-partition collection configured for 10,000 RU/s (request units per second)
  • airport.codes has 512 documents
  • DepartureDelays.flights has 1.05M documents (single collection)
  • DepartureDelays.flights (pColl) has 1.39M documents (partitioned collection)

The queries were:

  • Q1: SELECT c.City FROM c WHERE c.State='WA'
  • Q2: SELECT TOP 100 c.date, c.delay, c.distance, c.origin, c.destination FROM c
  • Q3: SELECT c.date, c.delay, c.distance, c.origin, c.destination FROM c WHERE c.origin = 'SEA'
  • Q4: SELECT c.date, c.delay, c.distance, c.origin, c.destination FROM c
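
The page does not show how the first and second response times were captured. A minimal sketch of one way to time a query twice is given below. The helper names `time_query` and `make_documentdb_query` are hypothetical, as are the endpoint, key, and collection-link arguments; the pyDocumentDB calls (`DocumentClient`, `QueryDocuments`, the `enableCrossPartitionQuery` option) are assumed to match the version used for these tests.

```python
from datetime import timedelta
from time import perf_counter


def time_query(run_query, repeats=2):
    """Run `run_query` `repeats` times; return (row_count, [timedelta, ...]).

    `run_query` is any zero-argument callable that returns an iterable of rows.
    """
    timings = []
    row_count = 0
    for _ in range(repeats):
        start = perf_counter()
        rows = list(run_query())  # materialize so fetch time is included
        timings.append(timedelta(seconds=perf_counter() - start))
        row_count = len(rows)
    return row_count, timings


def make_documentdb_query(host, master_key, coll_link, query):
    """Hypothetical wrapper: build a callable that runs `query` via pyDocumentDB."""
    # Imported lazily so the timing helper works without the package installed.
    import pydocumentdb.document_client as document_client

    client = document_client.DocumentClient(host, {"masterKey": master_key})
    # Assumption: cross-partition queries are needed for the partitioned collection.
    options = {"enableCrossPartitionQuery": True}
    return lambda: client.QueryDocuments(coll_link, query, options)


# Stand-in data source so the sketch runs without a DocumentDB account.
stub_rows = [{"City": "Seattle"}, {"City": "Spokane"}]
count, timings = time_query(lambda: iter(stub_rows))
print(count, [str(t) for t in timings])
```

Against a real account, the stub would be replaced by the callable returned from `make_documentdb_query`, passing one of the queries above as `query`. Printing each `timedelta` with `str` gives the same `h:mm:ss.ffffff` layout used in the results on this page.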

Single Collection

Below are the results from querying a single collection.

| Query | # of rows | Collection | Response Time (First) | Response Time (Second) |
|---|---|---|---|---|
| Q1 | 7 | airport.codes | 0:00:00.225645 | 0:00:00.006784 |
| Q2 | 100 | DepartureDelays.flights | 0:00:00.214985 | 0:00:00.009669 |
| Q3 | 14,808 | DepartureDelays.flights | 0:00:01.498699 | 0:00:01.323917 |
| Q4 | 1,048,575 | DepartureDelays.flights | 0:01:37.518344 | |

Partitioned Collection

Below are the results from querying a partitioned collection (25 partitions).

| Query | # of rows | Collection | Response Time (First) | Response Time (Second) |
|---|---|---|---|---|
| Q2 | 100 | DepartureDelays.flights (pColl) | 0:00:00.774820 | 0:00:00.508290 |
| Q3 | 23,078 | DepartureDelays.flights (pColl) | 0:00:05.146107 | 0:00:03.234670 |
| Q4 | 1,391,578 | DepartureDelays.flights (pColl) | 0:02:36.335267 | |
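
To compare the full-scan (Q4) runs on a common scale, the response times can be converted to rows per second. The sketch below does that arithmetic with the row counts and times reported above; the helper names `parse_elapsed` and `rows_per_second` are hypothetical. It works out to roughly 10,750 rows/s for the single collection and roughly 8,900 rows/s for the partitioned collection. The two collections hold different numbers of documents, so the comparison is only indicative.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'h:mm:ss.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def rows_per_second(rows, elapsed_text):
    """Rows returned divided by the elapsed time in seconds."""
    return rows / parse_elapsed(elapsed_text).total_seconds()


# Row counts and first response times copied from the Q4 results above.
results = {
    "Q4 single collection": (1048575, "0:01:37.518344"),
    "Q4 partitioned collection": (1391578, "0:02:36.335267"),
}

for label, (rows, elapsed) in results.items():
    print(f"{label}: {rows_per_second(rows, elapsed):,.0f} rows/s")
```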
