TPC Express Big Bench (TPCx-BB) is a benchmark developed to objectively compare Big Data Analytics System (BDAS) solutions. In September 2021, SQream’s big data analysts ran an internal field test derived from TPCx-BB to understand how SQream performs in comparison to leading cloud analytics solutions (such as those from Amazon and Google). For more information about TPCx-BB, please see the official TPC website.
Platforms Analyzed
SQream (currently running only on private cloud), Google BigQuery, Amazon Redshift, Snowflake.
Scale Factor
Because SQream was designed to handle large datasets, we ran the benchmark with a scale factor of 30,000, which creates a dataset of ~30TB.
Hardware Used
The main consideration in customizing the hardware stack for each competing vendor was striking the right balance between cost and performance. We took into account each vendor’s recommendation for the size of the chosen dataset (30TB) and maintained an equal number of nodes for all participants.
| Platform | Environment | Configuration | Compute cost ($/hour) | Storage cost ($/TB) |
|---|---|---|---|---|
| Amazon Redshift | AWS | 8x ra3.4xlarge | $26.08 | $24 |
| Snowflake | AWS | Large | $16.00 | $40 (on-demand) |
| SQream | AWS | 8x g4dn.8xlarge | $17.40 | $23 |
| Google BigQuery | GCP | Flat-rate 400 slots | $16.00 | $20 |
| Snowflake | GCP | Large | $16.00 | $46 (on-demand) |
| SQream | GCP | 4x n1-standard-32 (each with 2 additional GPUs) | $16.88 | $20 |
Running the Field Test
After configuring the chosen cloud environment for the field test and generating the 30TB dataset, we were ready to begin. Of the 30 queries included in TPCx-BB, we tested only 18 use cases, reflecting the functionality supported by SQream’s platform as of September 2021. Those queries were 5, 6, 7, 9, 11-17, and 20-26. While running the different use cases, we focused on two metrics for comparison:
Performance:
Ingestion – time elapsed during the process of transporting the data from its source to the DB / DWH.
Query – time elapsed during the process of executing the 18 queries (using concurrent streams, aka ‘Throughput Test’).
Total Time To Insight (TTTI) – Ingestion + Query.
Cost:
Storage – the cost of storing the compressed data on the relevant cloud vendor service ($/TB).
Compute – the cost of resources used to ingest the raw data from its sources and complete the 18 queries ($/Hour).
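The query selection and the metric definitions above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the field test: the names `expand_query_spec` and `Run` are hypothetical, and it assumes that compute cost is the hourly rate multiplied by the TTTI and that storage cost is the $/TB rate multiplied by the compressed data size. The sample inputs are placeholders, not measured results.

```python
from __future__ import annotations

from dataclasses import dataclass


def expand_query_spec(spec: str) -> list[int]:
    """Expand a spec such as "5, 6, 7, 9, 11-17, 20-26" into query numbers."""
    queries: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like "11-17" is inclusive on both ends.
            first, last = (int(x) for x in part.split("-"))
            queries.extend(range(first, last + 1))
        else:
            queries.append(int(part))
    return queries


@dataclass
class Run:
    """One platform run; every value is supplied by the caller."""

    ingestion_hours: float        # time to load the data into the DB / DWH
    query_hours: float            # time to execute the selected queries
    compute_rate_per_hour: float  # $/hour
    storage_rate_per_tb: float    # $/TB
    compressed_tb: float          # size of the stored, compressed data

    @property
    def ttti_hours(self) -> float:
        # Total Time To Insight = Ingestion + Query
        return self.ingestion_hours + self.query_hours

    @property
    def compute_cost(self) -> float:
        # Assumption: compute is billed for the whole TTTI window.
        return self.ttti_hours * self.compute_rate_per_hour

    @property
    def storage_cost(self) -> float:
        # Assumption: storage is billed on the compressed size.
        return self.compressed_tb * self.storage_rate_per_tb


# The query list from the text expands to the 18 tested queries.
QUERIES = expand_query_spec("5, 6, 7, 9, 11-17, 20-26")

# Placeholder inputs for illustration only (NOT field-test results).
example = Run(
    ingestion_hours=2.0,
    query_hours=1.0,
    compute_rate_per_hour=16.00,
    storage_rate_per_tb=20.0,
    compressed_tb=10.0,
)
```

With these placeholder inputs, `example.ttti_hours` is 3.0, `example.compute_cost` is 48.0, and `example.storage_cost` is 200.0.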
The Results
The following chart shows the overall performance of each platform for the given workload, in terms of total time for Ingestion and Query in the TPCx-BB field test:
TPCx-BB 30TB Benchmark – Performance HH:MM (lower is better)
The results revealed several performance differentiators between the competing products. Overall, in both cloud environments, SQream delivered the best TTTI, between 1.5X and 9.5X faster than the other platforms. SQream’s average execution time across the 18 queries (212 seconds on AWS and 197 seconds on GCP) was between 1.7X and 4.6X faster. Even when segmenting the results into more specific use cases or data types, SQream maintained its advantage:
Query time performance (MM:SS) – per data type (lower is better)
Query time performance (MM:SS) – per use case (lower is better)
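For readers who want to reproduce "N times faster" figures from chart values, the following is a minimal sketch of the arithmetic. It assumes the speedup factor is the other platform's elapsed time divided by SQream's elapsed time. The helper names `to_seconds` and `speedup` are hypothetical, and the competitor value in the usage note is a made-up placeholder rather than a reported result.

```python
def to_seconds(clock: str, unit: str = "MM:SS") -> int:
    """Convert a chart label such as "03:32" into seconds.

    unit is "MM:SS" (query charts) or "HH:MM" (overall performance chart).
    """
    major, minor = (int(part) for part in clock.split(":"))
    if unit == "HH:MM":
        return major * 3600 + minor * 60
    if unit == "MM:SS":
        return major * 60 + minor
    raise ValueError(f"unsupported unit: {unit!r}")


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds
```

For example, SQream's 212-second average on AWS corresponds to `to_seconds("03:32")`. Against a hypothetical platform averaging "07:04" (424 seconds), `speedup(424, 212)` gives 2.0.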
Even though machines with GPUs (which SQream uses) usually have a much higher compute cost, SQream’s performance during the field test made it the most cost-effective option:
TPCx-BB 30TB Benchmark – Cost (lower is better)
Learn more about how SQream performed in other benchmarks, such as TPC-H (10TB), TPCx-BB (300TB), and other industry use cases.
The post SQream’s On-Cloud Performance with TPCx-BB 30TB Benchmark appeared first on SQream.