the gwenchmarks manifesto
in defense of honest measurement, skeptical inquiry, and fewer benchmarketing crimes
preamble
When in the course of software events it becomes necessary to compare one database, system, or design against another, it is too often the custom of our industry to summon charts without context, numbers without discipline, and conclusions without shame. Thus are benchmarks turned from instruments of illumination into engines of confusion.
Yet a benchmark, rightly constructed, is a thing of considerable virtue. It may reveal the temper of a system under stress, the shape of a workload, the hidden tax of a design decision, or the bargain struck between latency, throughput, and cost. It may instruct the curious, warn the incautious, and provide engineers with grounds more solid than opinion or vibes.
Therefore Gwenchmarks is established for a plain and honorable purpose: to furnish every developer with the tools and habits required to run meaningful benchmarks, to interpret the resulting evidence with care, and to resist the glittering frauds of benchmarketing nonsense.
articles of intent
We hold that a benchmark ought to answer a real question, not merely flatter a favored product. We hold that workloads must resemble the world they claim to represent. We hold that warm caches, peculiar hardware, silent failures, omitted caveats, and selective graphs are not harmless omissions but forms of misdirection.
We reject the doctrine that bigger numbers alone are wisdom. We reject the convenient concealment of setup, variance, limitations, and tradeoffs. We reject the notion that a single heroic chart may stand in place of method.
In their stead, Gwenchmarks shall aspire to clarity over spectacle, repeatable method over theater, and understanding over victory laps. The object is not to crown winners by proclamation, but to help engineers discover what is true, what is uncertain, and what would need to change for the result to differ.
what shall be built here
Here shall be assembled a free and open course on the practice of benchmarking databases: how to design experiments, how to control conditions, how to measure tails as well as averages, how to recognize misleading methodology, and how to publish conclusions without committing statistical vandalism.
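The point about tails deserves a concrete illustration. The sketch below is minimal and assumes nothing about Gwenchmarks tooling: it generates a synthetic latency distribution (an exponential, chosen only because it is skewed) and shows how the mean understates what the slowest requests experience.

```python
import random
import statistics

# Simulated request latencies in milliseconds: mostly fast, with a heavy tail.
# The workload is synthetic, purely to illustrate why averages hide tails.
random.seed(42)
latencies = [random.expovariate(1 / 5) for _ in range(10_000)]

mean = statistics.mean(latencies)

# statistics.quantiles with n=100 returns the 99 cut points between
# percentile buckets: index 49 is the median, index 98 the 99th percentile.
cuts = statistics.quantiles(latencies, n=100)
p50, p99 = cuts[49], cuts[98]

print(f"mean={mean:.1f}ms  p50={p50:.1f}ms  p99={p99:.1f}ms")
```

On a skewed distribution like this one, the p99 runs several times higher than the mean; a report that quotes only the average would conceal exactly the behavior a user under load is most likely to notice.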
We further affirm that not every worthy benchmark must be grand in scale or ruinous in effort. There is great value in the modest experiment, carefully bounded and swiftly executed, which answers a single material question: whether one index serves better than another, whether a schema choice exacts a tax, whether a workload improves or degrades under a proposed change. Such experiments may often be run in an afternoon, and they are no less noble for their brevity.
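Such an afternoon experiment can be sketched in a few lines. The example below uses an in-memory SQLite database; the table, column names, and row count are invented for illustration, and a real experiment would repeat the measurement and report variance, but the shape of the question is exactly the one described above: does an index on this column serve a point lookup better than a full scan?

```python
import sqlite3
import time

# A modest, narrowly bounded experiment: does an index on `email`
# speed up point lookups? (Schema and sizes are illustrative only.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    ((f"user{i}@example.com",) for i in range(200_000)),
)
conn.commit()

def time_lookups(n=200):
    """Time n point lookups by email, returning elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        conn.execute(
            "SELECT id FROM users WHERE email = ?",
            (f"user{i * 997}@example.com",),
        ).fetchone()
    return time.perf_counter() - start

before = time_lookups()                  # full table scan per lookup
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = time_lookups()                   # index lookup per query

print(f"no index: {before:.3f}s   with index: {after:.3f}s")
```

Even this toy run answers a single material question with evidence rather than opinion, which is the whole of its ambition; it claims nothing about other workloads, other scales, or other engines.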
Neither shall we neglect the larger trials: the full suites, the broader workload models, the elaborate preparations by which a system is examined in more complete measure. These have their proper place. Gwenchmarks shall teach developers when a narrow experiment is sufficient, when a more comprehensive benchmark is demanded, and how to interpret each without fear, confusion, or misplaced certainty.
If this work prospers, then fewer developers shall be bullied by glossy dashboards, fewer teams shall purchase on the strength of benchmark theater, and more people shall be equipped to ask the only question that matters: what, precisely, does this result mean?
gwenchmarks principles
- measure what matters
- fit the benchmark to the question
- reveal the setup
- name the tradeoffs
- distrust easy wins
- publish enough detail to be challenged
- never confuse marketing with science