Why sysbench‑tpcc results on outdated hardware should not be presented as a valid OLTP vendor comparison

Benchmark results only have meaning when the workload, hardware, and methodology are clearly defined and reproducible. When those elements are unclear or incomplete, the conclusions can easily mislead readers into assuming the results represent something they do not.

That is the core issue with the recent Percona post comparing MySQL, Percona Server, and MariaDB.

This is not about disputing Percona’s numbers. Their results may be valid for their environment. The problem is that the post presents the results in a way that implies a valid OLTP vendor comparison, while the underlying methodology and hardware make such a comparison impossible to support.

1. Outdated hardware creates a misleading impression of modern OLTP behavior

The test system used in the post is several generations behind current server hardware. Modern database engines are optimized for newer CPU architectures, memory bandwidth, storage layers, and NUMA characteristics. Running current releases on old hardware produces behavior that does not reflect how these engines perform today.

Without clearly stating this limitation up front, readers can easily misinterpret the results as representative of modern OLTP performance. They are not.

2. sysbench-tpcc is not a standards-based OLTP benchmark and should not be used for vendor comparison

Sysbench itself is a critical tool in database performance engineering. I have relied on it for more than twenty years for deterministic benchmarking, regression detection, and isolating performance characteristics. Its single-function workloads such as point-select, oltp_ro, and oltp_rw are excellent for valid cross-vendor comparisons when used correctly.

sysbench-tpcc, however, is a different workload with very different properties.

Percona’s own documentation (https://github.com/Percona-Lab/sysbench-tpcc) is explicit about its limitations:

– It is not an implementation of the TPC-C workload.
– It is TPCC-like, using only queries and schemas from the specification.
– It does not respect keying time.
– It runs as a closed-loop contention test on a fixed dataset (illustrated in the sketch below).
– It does not follow multiple TPC-C requirements.
– Percona explicitly warns: do not use sysbench-tpcc to generate TPC-C results for comparing vendors.
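The keying-time and closed-loop points are worth making concrete. The toy calculation below is an illustration only: the 5 ms service time is an assumption, and real TPC-C pacing uses per-transaction distributions rather than the fixed values shown here.

```python
# Toy model of why a closed-loop driver and a TPC-C-style paced terminal put
# very different load on a server. Nothing here measures a real database.

def transactions_per_terminal_per_minute(service_time_s: float,
                                         keying_time_s: float = 0.0,
                                         think_time_s: float = 0.0) -> float:
    """Rate of one simulated terminal issuing transactions back to back.

    A closed-loop driver (keying_time_s = think_time_s = 0) sends the next
    transaction the instant the previous one completes, so its rate is limited
    only by server response time. A paced terminal adds keying and think time,
    which caps the per-terminal rate no matter how fast the server is.
    """
    cycle_s = keying_time_s + service_time_s + think_time_s
    return 60.0 / cycle_s

if __name__ == "__main__":
    service = 0.005  # assumed 5 ms server response time

    closed_loop = transactions_per_terminal_per_minute(service)
    # TPC-C New-Order uses an 18 s keying time and a 12 s mean think time;
    # the exact distributions are simplified away here.
    paced = transactions_per_terminal_per_minute(service,
                                                 keying_time_s=18.0,
                                                 think_time_s=12.0)

    print(f"closed loop: {closed_loop:8.0f} transactions per terminal per minute")
    print(f"paced      : {paced:8.2f} transactions per terminal per minute")
```

The exact numbers do not matter; the shape of the load does. A closed-loop terminal hammers the server thousands of times per minute, while a specification-style terminal issues roughly two, which is why scaling a TPC-C load means scaling warehouses and terminals rather than simply looping faster.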

Beyond those documented limitations, there are additional technical constraints that make sysbench-tpcc unsuitable for OLTP vendor comparison:

• It uses table-sets. Running 10 x 100 warehouse schemas is not equivalent to running 1 x 1000 warehouses, and it does not test scalability in the way TPC-C, DBT-2, or TPROC-C do (see the sketches after this list).
• By default it exercises all warehouses, similar to the HammerDB “use all warehouses” option, and it provides no way to test cached versus scaled workloads.
• It does not measure tpmC. More operations per second does not translate into more completed business transactions.
• It uses client-side SQL rather than stored procedures. As systems scale, the client-server round trips become a bottleneck, distorting results on larger hardware.
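Two of these points are easy to illustrate. First, the table-set layout: the sketch below builds the two schemas being conflated. The table names follow sysbench-tpcc’s per-set numbering only approximately; the authoritative DDL is in the tpcc.lua scripts.

```python
# Illustration only: compare a 10 x 100-warehouse layout with a 1 x 1000 one.

TPCC_TABLES = ["warehouse", "district", "customer", "history",
               "orders", "new_orders", "order_line", "stock", "item"]

def layout(table_sets: int, warehouses_per_set: int) -> dict:
    """Map each created table name to the number of warehouses its rows span."""
    return {f"{name}{set_id}": warehouses_per_set
            for set_id in range(1, table_sets + 1)
            for name in TPCC_TABLES}

ten_by_hundred = layout(table_sets=10, warehouses_per_set=100)    # e.g. --tables=10 --scale=100
one_by_thousand = layout(table_sets=1, warehouses_per_set=1000)   # e.g. --tables=1  --scale=1000

print(len(ten_by_hundred), "tables, each spanning", max(ten_by_hundred.values()), "warehouses")
print(len(one_by_thousand), "tables, each spanning", max(one_by_thousand.values()), "warehouses")

# Both layouts hold 1000 warehouses of data, but in the first one every
# transaction is confined to a 100-warehouse set: index depth, lock footprint,
# and buffer-pool behaviour are those of a 100-warehouse database cloned ten
# times, not of the single 1000-warehouse database that TPC-C, DBT-2, or
# TPROC-C would exercise.
```

Second, the tpmC point, with some back-of-the-envelope arithmetic. The throughput figure is invented; only the approximate transaction-mix percentage comes from the TPC-C specification.

```python
# Illustration only: raw throughput is not tpmC.

reported_transactions_per_second = 10_000   # what a sysbench-style report might show
new_order_share = 0.45                      # New-Order is roughly 45% of the TPC-C mix

new_orders_per_minute = reported_transactions_per_second * new_order_share * 60
print(f"~{new_orders_per_minute:,.0f} New-Order transactions per minute")

# Even that number is not tpmC: a compliant result also requires keying and
# think times, response-time constraints, and a properly scaled configuration.
# Two engines can report similar transactions per second while completing very
# different amounts of New-Order work, and neither figure is comparable to a
# published tpmC result.
```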

These are not flaws in sysbench. They are simply characteristics of this particular workload. sysbench-tpcc is a stress test, not a standards-based OLTP benchmark, and its results cannot be interpreted as vendor-level OLTP conclusions.

3. The blogger’s own framing reinforces the inconsistency

In the same post, the author explains why they normally publish TPC-C results: because TPC-C is closer to real OLTP usage, includes relational structure, foreign keys, constraints, and a balanced read/write mix, and follows the 80 percent rule for realistic coverage. They correctly describe sysbench single-function tests as diagnostic tools, not wide-scenario OLTP workloads.

That framing is accurate, and it highlights the issue.

Even if the author chose to dig into sysbench tests for this post, that does not change the nature of sysbench-tpcc. By the author’s own logic, TPC-C is the realistic OLTP workload. sysbench-tpcc remains a single-function-style stress tool with table-set behavior, no tpmC metric, no scaled vs cached mode, and client-side SQL bottlenecks. It is not a valid basis for cross-vendor OLTP comparison.

If the post had used sysbench point-select, oltp_ro, or oltp_rw, those results would have been valid and comparable. But an I/O-bound sysbench-tpcc run is not representative of OLTP behavior and should not be presented as such.
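For contrast, a comparable single-function run is straightforward to set up and, just as importantly, to document. The sketch below is illustrative only: the hosts, credentials, and parameter values are placeholders, oltp_read_write stands in for whichever bundled workload is chosen, and the option set should be checked against the sysbench version in use.

```python
# Drive the same sysbench single-function workload, with identical and fully
# recorded parameters, against each server under test. Placeholder values only.
import subprocess

SERVERS = {
    "mysql": "10.0.0.11",
    "percona": "10.0.0.12",
    "mariadb": "10.0.0.13",
}

COMMON_ARGS = [
    "--db-driver=mysql",
    "--mysql-user=sbtest", "--mysql-password=sbtest", "--mysql-db=sbtest",
    "--tables=16", "--table_size=1000000",
    "--threads=64", "--time=600", "--report-interval=10",
]

def run(workload: str, host: str, command: str) -> str:
    """Run one sysbench command (prepare, run, or cleanup) against one host."""
    cmd = ["sysbench", workload, f"--mysql-host={host}", *COMMON_ARGS, command]
    print(" ".join(cmd))  # log the exact command line for the write-up
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    for name, host in SERVERS.items():
        run("oltp_read_write", host, "prepare")
        output = run("oltp_read_write", host, "run")
        with open(f"{name}-oltp_read_write.log", "w") as f:
            f.write(output)  # keep the raw output alongside the published numbers
        run("oltp_read_write", host, "cleanup")
```

Publishing the exact command lines and the raw output is what makes such a comparison verifiable by anyone with similar hardware.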

4. Reproducing the results requires digging through scattered details

A benchmark is only credible when others can reproduce it. The blog does not provide a clear, consolidated description of:

• the exact sysbench-tpcc version
• the full command lines
• the dataset size and warehouse count
• the configuration of each server
• the tuning applied
• the environment setup
• the run duration and warmup
• the isolation level and transaction settings
• the storage configuration
• the NUMA layout
• the OS parameters

Instead, these details are scattered, implied, or missing entirely. Reproducing the results requires digging through multiple sections, inferring settings, and guessing about defaults. That lack of clarity makes the results non-reproducible, and therefore not suitable for vendor comparison.
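Collecting those details is not onerous. The sketch below shows one possible shape for a run manifest published alongside the results; every field value here is a placeholder, not a setting recovered from the post.

```python
# One possible shape for a benchmark run manifest. Placeholder values only.
import json
import platform
import subprocess

manifest = {
    "workload": {
        "tool": "sysbench-tpcc",
        "version": subprocess.run(["sysbench", "--version"],
                                  capture_output=True, text=True).stdout.strip(),
        "command_line": "the full command line, verbatim",
        "warehouses": 1000,
        "table_sets": 1,
        "duration_s": 3600,
        "warmup_s": 600,
        "isolation_level": "REPEATABLE READ",
    },
    "server": {
        "engine": "MySQL / Percona Server / MariaDB version string",
        "config": "full my.cnf contents or a link to them",
        "tuning": "anything changed from defaults",
    },
    "host": {
        "os": platform.platform(),
        "cpu": platform.processor(),
        "numa_layout": "output of numactl --hardware",
        "storage": "device model, filesystem, mount options",
        "kernel_params": "relevant sysctl settings",
    },
}

with open("benchmark-manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

With a manifest like this, a reader does not have to infer settings or guess about defaults, and the results can be reproduced or challenged on equal terms.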

Conclusion

The Percona post may contain valid numbers for the specific environment tested, but the way the results are presented is misleading. The combination of outdated hardware, a non-TPC-C workload, and unclear methodology creates the impression of a valid OLTP vendor comparison when the underlying test does not support that conclusion.

Benchmark clarity matters. Reproducibility matters. Methodology matters. Without them, results can be easily misinterpreted, and readers can draw conclusions that the data does not justify.

