This blog introduces Web3Bench, a hybrid transaction/analytical processing (HTAP) benchmark that addresses earlier limitations. Web3Bench is based on real-world Web3 use cases that utilize HTAP. Our data model is a simplified version of the decentralized blockchain Ethereum. We leverage a sample data set from Ethereum to build a scale factor-based data generator.

The workload in Web3Bench focuses on simple queries representing online processing with a high number of queries per second. We successfully verified Web3Bench on TiDB, an open-source distributed SQL database, as a proof of concept of the feasibility and applicability of the benchmark.

Why Do We Need an HTAP Benchmark?

Benchmarking serves as a yardstick for comparing software products with similar objectives. The database industry has always embraced various benchmarks to measure different systems like online transaction processing (OLTP) and online analytical processing (OLAP). However, HTAP databases are becoming more popular in the database community.

The diagram below compares traditional architectures, which separate OLTP and OLAP into different systems, with HTAP systems, which process both in the same database, possibly with a row store for OLTP and a column store for OLAP.

The rise of HTAP databases also requires appropriate benchmarks for those systems. We believe existing HTAP benchmarks are not representative because they are simply transactional and analytical benchmarks glued together. More recent proposals did not address the online aspect of HTAP and reused previous benchmark data models that do not represent HTAP workloads.

Getting Started with Web3Bench

Built upon the Web3 workload, a prevalent scenario for HTAP and a flourishing domain within decentralized databases, this benchmark distinguishes itself from its counterparts through the following features:

  1. It is based on Web3 business, departing from outdated data models. Web3 needs HTAP databases since it requires (1) streamlined security and identity management with millisecond latency and (2) complex analytics such as predictive analytics and suggesting investment strategies.
  2. It covers HTAP workloads including Online Data Serving, which lies between OLTP and OLAP.

Data Model

One of the main objectives of Web3Bench is to bring a new and representative business problem to HTAP benchmarking and to avoid recycling artificial, outdated data models. For this reason, we chose the data model of the decentralized blockchain Ethereum.

Ethereum's original data is stored in a Merkle tree data structure, which is not relational by nature. The Google BigQuery team built a seven-table relational schema from the Ethereum data model. We further simplified this BigQuery model to only four tables: Transactions, Contracts, Blocks, and Token_transfers. The diagram below shows more details on these tables and how they are related.

Data Generator

We designed and implemented a custom data generator for Web3Bench. The generator produces data for the four tables in the data model with a configurable scale factor. This enables testing various systems that can range from gigabytes (GBs) to petabytes (PBs).

The data generator relies on data demographics derived from the Ethereum sample data set to make the generated data more representative of the real world.

The demographics are as follows:

  1. Frequency (count distinct) of FK values in all the tables.
  2. (from_address, to_address) distinct pairs

The data generator is also multi-threaded so that it can load data into the tables quickly.
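As a rough illustration of the idea (not the actual Web3Bench generator), the sketch below scales a row count by a scale factor and draws foreign-key values according to frequencies observed in a sample. The names `ROWS_PER_SCALE_UNIT`, `sample_fk_counts`, `generate_rows`, and `generate_table` are hypothetical, and the real tool's row counts and distributions may differ.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical base size: rows generated per unit of scale factor.
ROWS_PER_SCALE_UNIT = 1_000

# Hypothetical demographics: how often each from_address appears in a sample.
sample_fk_counts = {"0xaaa": 70, "0xbbb": 20, "0xccc": 10}


def generate_rows(n_rows, seed):
    """Generate n_rows (hash, from_address) tuples following the sample frequencies."""
    rng = random.Random(seed)
    addresses = list(sample_fk_counts)
    weights = list(sample_fk_counts.values())
    picks = rng.choices(addresses, weights=weights, k=n_rows)
    # The seed prefix keeps hashes unique across workers.
    return [(f"0x{seed:04x}{i:012x}", addr) for i, addr in enumerate(picks)]


def generate_table(scale_factor, workers=4):
    """Split the target row count across worker threads and merge the chunks."""
    total = scale_factor * ROWS_PER_SCALE_UNIT
    chunk = total // workers
    sizes = [chunk] * workers
    sizes[-1] += total - chunk * workers  # last worker takes the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(generate_rows, sizes, range(workers))
    return [row for part in chunks for row in part]


rows = generate_table(scale_factor=3)
```

In Python, threads mainly help when each worker spends its time waiting on the database while loading rows; the in-memory generation shown here is only meant to show how the work can be partitioned.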

Read Workload

As discussed earlier, one of our contributions is to cover the full spectrum of HTAP analytics (i.e., real-time and OLAP queries) rather than just two extremes (which was done in an earlier benchmark).

We introduce a new workload category called online serving (which we’ll call O for short). The latency limit in this category is tunable from one to 10 seconds.

We also classified queries by their latency requirements into the following categories:

  1. Latency less than or equal to 0.1 seconds (called T queries).
  2. Response time within one second (called O queries).
  3. Response time within 10 seconds (also called O queries).
  4. No explicit latency requirement (called A queries). Here, throughput is the critical factor that distinguishes systems.
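The categories above can be expressed as a small lookup of latency ceilings. This is a minimal sketch; the labels `O1` and `O10` for the two online-serving tiers and the function name `passes` are made up for illustration and are not taken from the benchmark.

```python
# Latency ceilings in seconds, taken from the categories above.
# A queries have no latency requirement, so they map to None.
LATENCY_LIMITS = {"T": 0.1, "O1": 1.0, "O10": 10.0, "A": None}


def passes(category, latency_seconds):
    """Return True if a request met the latency requirement of its category."""
    limit = LATENCY_LIMITS[category]
    return limit is None or latency_seconds <= limit
```

For example, `passes("T", 0.05)` is true, while `passes("O1", 3.0)` is false because a three-second response exceeds the one-second tier.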

Queries

As mentioned earlier, we classified a set of queries into three categories called O, T, and A. Let us dive more into the queries.

T Queries

These are queries that retrieve a single transaction (also called point queries), since hash is a unique identifier for a particular transaction. The query template “R1” for these queries is:

SELECT to_address, from_address
FROM transactions
WHERE hash = ?

O Queries

These are the queries with latency less than or equal to 1 second. There are two query templates for this class of queries. Both of them retrieve a small list of transactions.

Template 1 “R21”

SELECT *
FROM token_transfers
WHERE from_address = ?
ORDER BY block_number DESC
LIMIT 5

Template 2 “R22”

SELECT count(*)
FROM token_transfers
WHERE token_address = ?
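
The `?` markers in these templates are bind parameters. As a minimal sketch of how R1, R21, and R22 are parameterized, the example below runs them against an in-memory SQLite database. SQLite is only a stand-in (Web3Bench itself targets TiDB), and the reduced schema contains just the columns the templates mention; the sample values are invented.

```python
import sqlite3

# In-memory SQLite is only a stand-in here; Web3Bench itself targets TiDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (hash TEXT PRIMARY KEY, from_address TEXT, to_address TEXT);
CREATE TABLE token_transfers (token_address TEXT, from_address TEXT, block_number INTEGER);
""")
conn.execute("INSERT INTO transactions VALUES ('0x01', '0xaaa', '0xbbb')")
conn.executemany(
    "INSERT INTO token_transfers VALUES (?, ?, ?)",
    [("0xtok", "0xaaa", n) for n in range(1, 8)],
)

# R1: point lookup by the unique transaction hash.
r1 = conn.execute(
    "SELECT to_address, from_address FROM transactions WHERE hash = ?", ("0x01",)
).fetchone()

# R21: the five most recent transfers sent from one address.
r21 = conn.execute(
    "SELECT * FROM token_transfers WHERE from_address = ? "
    "ORDER BY block_number DESC LIMIT 5",
    ("0xaaa",),
).fetchall()

# R22: number of transfers for one token.
r22 = conn.execute(
    "SELECT count(*) FROM token_transfers WHERE token_address = ?", ("0xtok",)
).fetchone()[0]
```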

A further subcategory of O queries has a latency limit of 10 seconds.

Template 3 “R23”

SELECT *
FROM token_transfers 
WHERE token_address = ?
AND block_number <= ?
AND (next_block_number > ? OR next_block_number = ?)
ORDER BY block_number DESC
LIMIT 5

Template 4 “R24”

SELECT count(*)
FROM transactions
WHERE to_address NOT IN (?, ?, ?)

Template 5 “R25”

SELECT count(*) 
FROM token_transfers 
WHERE next_block_number <= block_number
GROUP BY next_block_number

A Queries

This category contains queries with no latency limitations. These are generally complex queries with a lot of computation and data movement.

There are five queries in this category, and you can find their descriptions below. The SQL queries can be found in this repository.

  1. R31: Find transactions where a person is either the sender or the receiver, limited to the most recent timestamp.
  2. R32: Top N transactions based on block timestamp.
  3. R33: Find the number of unique senders (from_address) in transactions.
  4. R34: Find the top N senders (from_address) by total transaction value.
  5. R35: Total count of token transfers for a specific sender and token transfers for recipients who are also senders in other transactions.
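
To give a feel for this class of query, here is a minimal sketch written in the spirit of R33 and R34. It is not the SQL from the repository: the statements are guesses based only on the descriptions above, the column name `value` is an assumption, and SQLite is used purely as a stand-in for TiDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns; `value` is a hypothetical name for the transferred amount.
conn.execute("CREATE TABLE transactions (hash TEXT, from_address TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("0x01", "0xaaa", 5), ("0x02", "0xaaa", 7), ("0x03", "0xbbb", 20), ("0x04", "0xccc", 1)],
)

# In the spirit of R33: number of unique senders.
unique_senders = conn.execute(
    "SELECT count(DISTINCT from_address) FROM transactions"
).fetchone()[0]

# In the spirit of R34: top N senders by total transaction value.
top_senders = conn.execute(
    "SELECT from_address, sum(value) AS total FROM transactions "
    "GROUP BY from_address ORDER BY total DESC LIMIT ?",
    (2,),
).fetchall()
```

Both statements scan and aggregate the whole table, which is why this category is the one that benefits most from a column store.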

Write Workload

Writes (W for short) are an important part of HTAP. Most writes are either single-row inserts or small-batch inserts. The writes include inserts, deletes, and updates. The following is a high-level description of the write operations in Web3Bench.

  • Inserts
    • W11, W12, W13, and W14: Single-point inserts, one for each of the four tables.
    • W2: Small batch inserts (100 rows) for the transactions table.
    • W3: Insert 1000 rows into transactions from a temp table.
  • Updates
    • W4: Single-point update for the transactions table.
    • W5: Join update for the token_transfers table.
  • Deletes
    • W6: Single-record deletes for the transactions table.
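
The single-row and small-batch shapes of these writes can be sketched as follows. This is an illustration only: SQLite stands in for TiDB, the two-column table is a reduced, assumed schema, and the statements are not the benchmark's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (hash TEXT PRIMARY KEY, to_address TEXT)")

# Single-row insert (the W11-W14 style of write).
conn.execute("INSERT INTO transactions VALUES (?, ?)", ("0x0000", "0xaaa"))

# W2 style: one small batch of 100 rows.
batch = [(f"0x{i:04x}", "0xbbb") for i in range(1, 101)]
conn.executemany("INSERT INTO transactions VALUES (?, ?)", batch)

# W4 style: single-row update addressed by the unique hash.
conn.execute("UPDATE transactions SET to_address = ? WHERE hash = ?", ("0xccc", "0x0001"))

# W6 style: single-row delete.
conn.execute("DELETE FROM transactions WHERE hash = ?", ("0x0000",))
conn.commit()

remaining = conn.execute("SELECT count(*) FROM transactions").fetchone()[0]
updated = conn.execute(
    "SELECT to_address FROM transactions WHERE hash = ?", ("0x0001",)
).fetchone()[0]
```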

We developed a driver to submit read (R) and write (W) requests over 60 minutes, varying their frequency. Complex reads (R3 queries) and complex writes (batch inserts, join updates) occur once. These figures are averages from analyzing TiDB’s Web3 customers, but Web3Bench allows researchers to adjust frequencies to their preferences. Request frequencies are shown in the fourth column of the results table below. 70% of the workload comprises read requests (R1-R35), while writes (W11-W6) constitute 30%. R1 queries are the most common, followed by O queries with 1-second and 10-second limits. Single-row inserts dominate writes.

For our workload, we built a driver that (1) submits the above reads and writes concurrently and (2) collects the response time of each request. There is a pass/fail decision based on the latency requirements for T, O, and single-point writes. Additionally, the driver computes simple metrics like average latency per request type and total latencies for the whole workload.
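
The driver's two responsibilities, concurrent submission and per-request timing, can be sketched with a thread pool. This is not the Web3Bench driver itself: the function names `timed` and `run_workload` are hypothetical, and short sleeps stand in for the SQL calls a real driver would issue.

```python
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def timed(request_type, fn):
    """Run one request and return its type together with its response time."""
    start = time.perf_counter()
    fn()
    return request_type, time.perf_counter() - start


def run_workload(requests, workers=8):
    """Submit (request_type, callable) pairs concurrently and summarize latencies."""
    latencies = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed, rtype, fn) for rtype, fn in requests]
        for future in futures:
            rtype, seconds = future.result()
            latencies[rtype].append(seconds)
    # Average latency per request type and total latency for the whole workload.
    average = {rtype: sum(v) / len(v) for rtype, v in latencies.items()}
    total = sum(sum(v) for v in latencies.values())
    return average, total


# Stand-in requests; a real driver would execute SQL against the database here.
requests = [("R1", lambda: time.sleep(0.001))] * 20 + [("W11", lambda: time.sleep(0.002))] * 10
average, total = run_workload(requests)
```

A pass/fail check would then compare each recorded latency with the limit for its category.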

HTAP Benchmark Testing: Experimenting with TiDB

Numerous systems can handle HTAP workloads. However, TiDB stands out as a significant contender, as it can cater to the entire spectrum of T, O, and A use cases. We chose TiDB as the testing platform for Web3Bench due to its open-source nature, simple setup process, and compatibility with MySQL. It’s important to note that the objective of the test isn’t to evaluate TiDB’s performance, but rather to demonstrate the feasibility and relevance of Web3Bench.

We produced a data set with scalefactor = 6000, which translates to around 500 GB of raw data for Web3Bench. We used our index advisor to create appropriate indexes that helped with the performance of some of the queries. We also had TiFlash, the columnar storage component for TiDB, as part of the system configuration to handle the OLAP part of the benchmark.

The table below shows the system configuration we used for the test. We used three TiDB servers, which are stateless nodes responsible for receiving client requests, parsing and optimizing them, and generating execution plans. We also used three TiKV nodes and two TiFlash nodes for the row and column stores, respectively. Note that TiFlash requires more hardware resources since it handles more complex queries than TiKV.

Component | Count | Number of CPUs (per node) | Memory (per node) | Storage (per node)
TiDB | 3 | 16 | 128 GB | /
TiKV | 3 | 16 | 64 GB | 1 TB
TiFlash | 2 | 64 | 128 GB | 1 TB

Exploring HTAP Benchmark Testing

Column 2 in the results table below shows which storage and compute engine is applied to each test component. OLTP mainly uses the row store (TiKV), online serving uses both TiKV and TiFlash, and OLAP mainly relies on TiFlash. Writes trigger updates to TiKV for raw data and indexes and are also propagated to TiFlash.

The third column (Requests) lists the read and write requests described above in the workload description. The request frequency (i.e., how many times a request was executed) is displayed in the fourth column. Overall, read requests (R1 through R35) constitute 70% of the workload vs. 30% for writes (W11 through W6). R1 queries rank as the most common, trailed by O queries limited to 1 second, and then by O queries with a 10-second limit. For writes, single-row inserts, updates, and deletes are the most common.

Measuring HTAP Benchmark Performance

We measured performance by average latency (the fifth column). To illustrate the row store and column store optimizations, we also ran the benchmark without indexes (the sixth column) and without TiFlash (the last column). Overall, the lack of indexes and the lack of column store optimizations caused roughly 10X and 7X regressions, respectively (see the total latencies for each of the three tests). The test with no indexes had no impact on R1, while R21, R22, R23, and R32 degraded. In the test without TiFlash, R24, R25, R31, R33, R34, and R35 degraded.
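
As a quick check of the regression factors, the arithmetic below divides the total read latencies reported in the results table. Using the read totals rather than the grand totals is an assumption on our part; they give ratios consistent with the rounded 10X and 7X figures.

```python
# Total read latencies in seconds, as reported in the results table.
baseline, no_indexes, no_columnar = 12_261, 118_466, 88_707

index_regression = no_indexes / baseline      # slowdown without indexes
columnar_regression = no_columnar / baseline  # slowdown without TiFlash

print(f"no indexes: {index_regression:.1f}x, no columnar: {columnar_regression:.1f}x")
# prints "no indexes: 9.7x, no columnar: 7.2x"
```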

We also ran tests to measure the effect of indexes and TiFlash on writes, where both are expected to incur overhead. The write improvements from removing indexes and TiFlash are far smaller than the benefits these structures bring to reads. In summary, our experiments show that Web3Bench is a good representative benchmark for HTAP, with a clear impact of optimizations in both the row store and the column store across the workloads.

The Results

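Latencies are in seconds.

Workload Component | TiDB Engine | Requests | Request frequency | Average latency | Average latency no indexes | Average latency no columnar
OLTP | TiKV | R1 | 5,760,000 | 0.00061 | 0.00061 | 0.00061
Online | TiKV and TiFlash | R21 | 57,600 | 0.0015 | 0.66 | 0.0015
Online | TiKV and TiFlash | R22 | 57,600 | 0.062 | 0.74 | 0.062
Online | TiKV and TiFlash | R23 | 57,600 | 0.0037 | 0.51 | 0.0037
Online | TiKV and TiFlash | R24 | 1,000 | 3.04 | 3.04 | 65.24
Online | TiKV and TiFlash | R25 | 1,000 | 1.86 | 1.86 | 14.10
OLAP | TiFlash | R31 | 1 | 0.0064 | 0.0064 | 18.70
OLAP | TiFlash | R32 | 1 | 0.16 | 47.40 | 0.16
OLAP | TiFlash | R33 | 1 | 2.99 | 2.99 | 369.30
OLAP | TiFlash | R34 | 1 | 5.81 | 5.81 | 1,358.10
OLAP | TiFlash | R35 | 1 | 12.20 | 12.20 | 249.30
Total | | | 5,934,805 | 12,261 | 118,466 | 88,707
Write | TiKV and TiFlash | W11 | 288,000 | 0.000079 | 0.000075 | 0.000063
Write | TiKV and TiFlash | W12 | 288,000 | 0.000108 | 0.000082 | 0.000065
Write | TiKV and TiFlash | W13 | 288,000 | 0.000108 | 0.000085 | 0.000067
Write | TiKV and TiFlash | W14 | 288,000 | 0.000109 | 0.000089 | 0.000062
Write | TiKV and TiFlash | W2 | 1 | 0.00323 | 0.0028 | 0.00212
Write | TiKV and TiFlash | W3 | 1 | 0.0543 | 0.0458 | 0.038
Write | TiKV and TiFlash | W4 | 288,000 | 0.0028 | 0.0026 | 0.0022
Write | TiKV and TiFlash | W51 | 1 | 15.10 | 14.00 | 12.00
Write | TiKV and TiFlash | W52 | 1 | 1070.40 | 974.00 | 966.20
Write | TiKV and TiFlash | W6 | 288,000 | 0.00071 | 0.00063 | 0.00058
Total | | | 1,728,004 | 2,213 | 2,014 | 1,853
Grand Total | | | | 14,474 | 120,480 | 90,560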

You can find the benchmark here. The repository has clear instructions on prerequisites, TiDB setup, loading data (with desired scale factor), and how to run the benchmark.

Conclusion

This post introduced Web3Bench, an innovative HTAP benchmark inspired by the growing Web3 business paradigm. This benchmark encompasses read operations, ranging from real-time interactions to online serving and batch processing. On the write side, it involves tasks such as inserts, updates, and deletes, each characterized by varying levels of complexity.

Finally, the authors express gratitude to Hanumath Maduri from Workday, Zhongxin Ge, Guoxin Kang, and Ryan Zhang for their major contributions to this benchmark’s design and implementation.

For more details, please check out our white paper, Blitzscaling Your Web3 Business with a HTAP Database, to explore how other Web3 companies have dealt with their modern data challenges.

