We needed a second VPS provider. Our hosting stack — Docker, Traefik, Umbraco CMS, Gotenberg for PDF generation — runs multiple client sites as isolated containers on a single server. We had been running on a Hetzner Cloud CPX21 out of Ashburn without complaint. But when Raff Technologies showed up offering roughly double the resources at half the price, we did what we do with every vendor claim: we tested it.
What started as a quick sanity check turned into a full-day evaluation that changed our conclusion three times and exposed real flaws in how most people benchmark VPS providers.
This is not a post about which provider is better. It is a guide to benchmarking methodology — what we got wrong, what we learned, and why surface-level tests can lead you to the wrong decision.
The Baseline: Same Script, Two Servers
We wrote a bash script covering the standard toolkit: sysbench for CPU and memory, fio for disk I/O, curl for network throughput, Docker-specific operations, and a sustained load test to catch throttling. Identical script, identical parameters, both servers running Ubuntu 24.04 LTS.
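A trimmed sketch of that harness, not the full script — the function bodies use standard sysbench/fio/curl flags, but the exact parameters and the download URL here are illustrative placeholders:

```shell
#!/usr/bin/env bash
# Minimal sketch of the benchmark harness; parameters are illustrative.
set -u

cpu_test()  { sysbench cpu --cpu-max-prime=20000 --threads="$(nproc)" run; }
mem_test()  { sysbench memory --memory-total-size=10G run; }
disk_test() { fio --name=seqread --rw=read --bs=1M --size=1G --direct=1 \
                  --runtime=60 --time_based --filename=/tmp/fio-testfile; }
net_test()  { curl -o /dev/null -w '%{speed_download}\n' -s "$1"; }

# Guarded so sourcing this file defines the functions without running anything.
if [ "${RUN_BENCH:-0}" = 1 ]; then
  cpu_test; mem_test; disk_test
  net_test "https://example.com/100MB.bin"   # placeholder test file
fi
```

Running the identical file on both servers is the point: any difference in results then comes from the hardware, not the method.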
The servers under test:
| | Raff Technologies | Hetzner Cloud CPX21 |
|---|---|---|
| CPU | 4 vCPU (EPYC 8224P) | 3 vCPU (EPYC Rome) |
| RAM | 8 GB | 3.7 GB |
| Disk | 120 GB NVMe | 75 GB NVMe |
| Monthly Cost | $9.99 | ~$19 |
One operational note before the numbers: Raff's default Ubuntu image was 24.10, which is already end-of-life. The apt repositories were dead, so nothing installed. They confirmed 24.04 LTS was available, we rebuilt, and moved on. If you are evaluating Raff, pick the LTS image.
Lesson 1: The First Run Is Rarely the Whole Story
Round 1 told a clean, simple story:
- CPU: Hetzner faster per core (+7%); Raff faster in aggregate (+28%), thanks to the extra core
- Memory: a wash
- Disk I/O: Hetzner dominated — 2.6x faster sequential reads, 46% more random 4K IOPS
- Network: Hetzner at 8,138 Mbps versus Raff at 754 Mbps — a 10x gap
If we had stopped here, we would have written Raff off as a staging box. We almost did.
Instead, we shared the full report with Raff. Their response changed the trajectory of the evaluation.
Lesson 2: The Best Providers Engage Technically
Within minutes, Raff's support team responded — not with marketing rebuttals, but with server-side storage adjustments. They asked us to re-run.
Sequential read performance jumped from 524 MB/s to 1,631 MB/s, a 211% improvement that flipped the metric from a Hetzner win to a Raff win.
Random 4K IOPS did not change. We said so. They accepted it and pointed us toward a deeper test.
The takeaway for any vendor evaluation: share your data with the provider. The ones worth working with will engage technically. The ones who point you toward an FAQ are telling you something about what support looks like post-purchase.
Lesson 3: Benchmark the Workload You Actually Run
This was the expensive lesson.
Our fio tests ran at iodepth=1 — one I/O request in flight at a time. That measures single-request latency, which is useful for isolated sequential operations. But our production workload is not isolated or sequential. We run multiple Docker containers, each with its own SQLite database, all reading from disk concurrently. That is a parallel I/O workload, and NVMe drives are designed for deep queue parallelism.
Raff suggested we test at iodepth=16 with four parallel jobs. We were skeptical — it felt like moving the goalposts. But the logic was sound: test the pattern your servers actually experience.
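A sketch of the two invocations, assuming a scratch file at /tmp/fio-testfile (our choice); only the concurrency flags differ between them:

```shell
#!/usr/bin/env bash
# Shallow vs. deep queue: identical fio job, different concurrency.
# iodepth=16 keeps 16 requests in flight per job; numjobs=4 runs four
# parallel workers, approximating many containers hitting disk at once.
COMMON=(--rw=randread --bs=4k --direct=1 --size=1G
        --runtime=60 --time_based --filename=/tmp/fio-testfile)
SHALLOW=(--name=shallow --iodepth=1 --numjobs=1 "${COMMON[@]}")
DEEP=(--name=deep --iodepth=16 --numjobs=4 --group_reporting "${COMMON[@]}")

# Run only where fio is installed; this keeps the sketch safe to source.
if command -v fio >/dev/null 2>&1; then
  fio "${SHALLOW[@]}"
  fio "${DEEP[@]}"
fi
```

`--group_reporting` aggregates the four workers into one summary, which makes the deep-queue IOPS number directly comparable to the single-job run.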
We ran the deep queue test on both servers:
| Provider | iodepth=1 (4K Read) | iodepth=16 (4K Read) | Scaling Factor |
|---|---|---|---|
| Raff | 2,694 IOPS | 83,160 IOPS | 31x |
| Hetzner | 3,933 IOPS | 46,676 IOPS | 12x |
At realistic concurrency, Raff delivers 78% more random read IOPS at nearly half the latency (0.76 ms versus 1.37 ms). The server that looked slower in a single-threaded test was dramatically faster under parallel load.
If you run containers or databases and you are benchmarking with iodepth=1, you are measuring the wrong thing.
Lesson 4: Single-Stream Downloads Measure TCP, Not Bandwidth
Our initial network test — a single-stream curl download — showed Hetzner at roughly 10x Raff's throughput. That looked disqualifying.
Raff pointed out the test was bottlenecked by TCP window scaling and geographic distance to the test server, not by actual bandwidth capacity. Multi-stream testing with speedtest-cli told a different story:
| | Raff | Hetzner |
|---|---|---|
| Download | 2,279 Mbps | 1,566 Mbps |
| Upload | 1,910 Mbps | 1,458 Mbps |
| Ping | 3.94 ms | 10.04 ms |
The 10x gap was a testing artifact. Raff was actually 46% faster on multi-stream download to nearby U.S. infrastructure.
If your network benchmark uses a single TCP stream, you are testing TCP window scaling behavior, not your provider's pipe.
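A hedged sketch of both multi-stream approaches — the iperf3 server address below is a placeholder you would replace with a nearby public endpoint:

```shell
#!/usr/bin/env bash
# Multi-stream bandwidth sketch. speedtest-cli picks a nearby server and
# opens several connections; iperf3 -P 8 runs eight parallel TCP streams,
# so no single connection's window limits the measured throughput.
IPERF_SERVER="iperf.example.net"   # placeholder -- substitute a real endpoint
IPERF_STREAMS=8

if command -v speedtest-cli >/dev/null 2>&1; then
  speedtest-cli --simple
fi
if command -v iperf3 >/dev/null 2>&1; then
  iperf3 -c "$IPERF_SERVER" -P "$IPERF_STREAMS" -t 30
fi
```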
What the Numbers Mean in Practice
For our stack, the capacity math looks like this. Each Umbraco 13 container with SQLite consumes roughly 300–400 MB of RAM:
| Plan | Monthly Cost | Estimated Sites | Cost per Site |
|---|---|---|---|
| Raff $9.99 (4 vCPU / 8 GB) | $9.99 | ~18 | ~$0.55 |
| Raff $23.99 (8 vCPU / 16 GB) | $23.99 | ~38 | ~$0.63 |
| Hetzner CPX21 (3 vCPU / 3.7 GB) | ~$19 | ~7 | ~$2.71 |
At the $23.99 tier, we host 5x more sites for $5 more per month. For nonprofit clients where every dollar in overhead matters, that changes the math entirely.
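As a sanity check, the per-site math reduces to a few lines of arithmetic. The 1 GB system reserve and the 400 MB high-end budget below are our assumptions, so the result lands near, not exactly on, the table's estimates:

```shell
#!/usr/bin/env bash
# Hypothetical capacity math for the $9.99 plan (8 GB RAM).
# Reserve ~1 GB for OS/Docker/Traefik; budget 400 MB per site (high end).
ram_mb=8192; reserved_mb=1024; per_site_mb=400; plan_cost=9.99
sites=$(( (ram_mb - reserved_mb) / per_site_mb ))
cost_per_site=$(awk -v c="$plan_cost" -v s="$sites" 'BEGIN { printf "%.2f", c/s }')
echo "$sites sites, ~\$$cost_per_site per site"
```

With these assumptions the script prints 17 sites at roughly $0.59 each; the table's ~18 / ~$0.55 reflects a slightly tighter per-site budget, and the gap comes down entirely to the reserve you assume.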
The Intangibles That Do Not Show Up in fio
Support responsiveness. Raff made same-day infrastructure adjustments based on our data. Every technical suggestion they offered was sound and reproducible. That kind of engagement is what you get from small, technical providers — and it is worth more than a few percentage points on a benchmark.
Data residency options. Raff indicated they would stand up Canadian-domiciled servers on request. Several of our clients have data residency requirements Hetzner cannot serve from Ashburn. For compliance-sensitive workloads, server location is not a nice-to-have.
Honest engagement. They did not dispute our findings when the numbers were unfavorable. They fixed what they could, explained what they could not, and pointed us toward better methodology. That builds trust faster than any SLA document.
A Benchmarking Checklist for VPS Evaluations
If we learned one thing from this process, it is that default benchmarks can actively mislead. Here is what we would recommend for anyone evaluating providers:
- Test at realistic queue depths. If you run containers or databases, iodepth=1 undersells NVMe storage. Test at iodepth=16 with multiple parallel jobs.
- Use multi-stream bandwidth tests. Single-stream curl downloads measure TCP behavior, not bandwidth capacity. Use speedtest-cli or iperf3 with multiple streams.
- Run sustained load tests. Noisy neighbors and throttling do not show up in 10-second bursts. Run at least 60 seconds across multiple windows.
- Share your results with the provider. The response tells you more about the relationship than the numbers do.
- Test the workload you actually run. Generic benchmarks are a starting point, not an answer.
- Re-test after provider adjustments. Infrastructure is not static. A provider willing to tune based on real data is a provider worth keeping.
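The sustained-load item in the checklist can be sketched as a loop of repeated windows; the window count and length here are our choices, not a standard:

```shell
#!/usr/bin/env bash
# Sustained-load sketch: three back-to-back 60-second sysbench windows.
# Throttling or noisy neighbors show up as a drop in the later windows.
WINDOWS=3
WINDOW_SECONDS=60

if command -v sysbench >/dev/null 2>&1; then
  for i in $(seq 1 "$WINDOWS"); do
    echo "window $i of $WINDOWS"
    sysbench cpu --cpu-max-prime=20000 --threads="$(nproc)" \
             --time="$WINDOW_SECONDS" run | grep 'events per second'
  done
fi
```

Comparing the events-per-second line across windows is the quickest way to spot a provider that looks fast for ten seconds and then clamps down.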
The Decision
We are moving new Umbraco 13 deployments to Raff Technologies, starting at the $9.99 tier with a clear upgrade path to $23.99 as we onboard more clients. Hetzner continues running existing workloads while we migrate.
The benchmark script and the full technical report are both available below.
Download the benchmark script →
Download the full technical report →