AI changes everything. Moore's law has always been bullshit anyway, but LLM hunger for cycles and fast retrieval makes it operate in reverse: software becomes hungrier much faster than hardware improvements can keep up. Responsive agentic AI needs fast RAG. And look at this sheer speed:
We came to this configuration from a few directions. Back in 2024 one of our team went to a presentation by the USearch folks and was impressed by the chutzpah of going all the way down to rewriting the UTF-8 string handling libraries to squeeze every drop of speed out of a vector store. Unfortunately this radical improvement wasn't maintained, so we wound up just using pgvector instead of USearch. On a direct NVMe solid state drive this is enough for a 3x improvement over EBS, which is network-attached storage (ugh!), and nearly 10x faster than RDS, where most default RAG storage lands in a vanilla AWS setup.
You do not have to deeply understand networked storage to know that a 10x speedup with cost savings is desirable. The rest of this article describes both our benchmarking and the specific configuration choices we made (Proxmox over Nix, a shared team development VM with shared CLAUDE.md files and context), but even without any of that, the decision to go to bare metal is the important one. AI not only changes the demand landscape, it reduces the cost of complexity management. It is time for bare metal to shine.
We run a single dedicated server from Rackdog, a bare metal hosting provider based in the US. Intel Xeon Gold 6142, 128GB RAM, Samsung PM983 1.92TB NVMe. Proxmox VE as hypervisor. Four VM roles:
| VM | Purpose | RAM | Cores |
|---|---|---|---|
| Postgres (100) | Shared PG 15 + pgvector, NVMe passthrough | 32GB | 8 |
| Dev (200) | Team sandbox, Claude Code | 16GB | 8 |
| Platform (300) | Static frontends (Vercel replacement) | 8GB | 4 |
| App VMs (400+) | Per-app isolation, various stacks | ~68GB avail | varies |
The Postgres VM gets direct PCIe access to the physical NVMe drive — no hypervisor storage layer in between. This is the single decision that drives most of the performance numbers below.
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 64MB
maintenance_work_mem = 2GB
random_page_cost = 1.1 # NVMe: random reads nearly as fast as sequential
effective_io_concurrency = 200 # NVMe handles massive parallelism
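The two largest memory settings above line up with a common rule of thumb: about 25% of RAM for `shared_buffers` and about 75% for `effective_cache_size`. The sketch below reproduces them for the 32GB Postgres VM. The ratios are an assumption about how the values were picked, not something the config itself states, and `pg_memory_settings` is a hypothetical helper.

```python
def pg_memory_settings(ram_gb: int) -> dict[str, str]:
    """Rule-of-thumb sizing (an assumption, not the authors' stated method):
    25% of RAM for shared_buffers, 75% for effective_cache_size."""
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
    }


# The Postgres VM in this article has 32GB of RAM.
settings = pg_memory_settings(32)
for name, value in settings.items():
    print(f"{name} = {value}")
```

`work_mem` and `maintenance_work_mem` depend on connection count and workload, so they are left out of the helper.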
All benchmarks run against the Postgres VM — 32GB RAM, 8 cores, with direct NVMe passthrough. That's it. Not the full 128GB box, just one VM on it. The client ran from a separate VM on the same host over the internal bridge (sub-millisecond hop). PostgreSQL 15.15, pgvector 0.8.1. Every data point is a 30-second sustained run.
Scale factor 100 — 10 million rows, ~1.6GB working set. We swept from 1 to 64 concurrent clients to find the saturation point.
| Clients | TPS | Avg Latency |
|---|---|---|
| 1 | 532 | 1.88 ms |
| 2 | 1,119 | 1.79 ms |
| 4 | 3,036 | 1.32 ms |
| 8 | 4,437 | 1.80 ms |
| 16 | 6,366 | 2.51 ms |
| 32 | 6,611 | 4.84 ms |
| 48 | 6,410 | 7.49 ms |
| 64 | 6,139 | 10.43 ms |
Near-linear scaling up to core count, then a clean plateau. Peak: 6,611 TPS at 32 clients with latency under 5ms. That matches what AWS needs a 16 vCPU / 128GB instance to achieve.
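A sweep like this can be driven with a short script. The sketch below builds the pgbench command lines and pulls TPS and average latency out of the report. `-i`, `-s`, `-c`, `-j`, and `-T` are standard pgbench options. The database name `bench`, the choice of one worker thread per client, and the `SAMPLE` text are assumptions. The sample is written by hand from the 32-client row to show the report format and is not captured output.

```python
import re

SWEEP = [1, 2, 4, 8, 16, 32, 48, 64]  # client counts from the table above


def init_args(scale: int = 100, dbname: str = "bench") -> list[str]:
    """Command that creates the pgbench tables at the given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def pgbench_args(clients: int, seconds: int = 30, dbname: str = "bench") -> list[str]:
    """One timed run; using one thread per client is an assumption."""
    return ["pgbench", "-c", str(clients), "-j", str(clients),
            "-T", str(seconds), dbname]


TPS_RE = re.compile(r"^tps = ([\d.]+)", re.MULTILINE)
LAT_RE = re.compile(r"^latency average = ([\d.]+) ms", re.MULTILINE)


def parse_pgbench(output: str) -> tuple[float, float]:
    """Return (tps, average latency in ms) from a pgbench report."""
    tps = float(TPS_RE.search(output).group(1))
    latency_ms = float(LAT_RE.search(output).group(1))
    return tps, latency_ms


# Hand-written sample in pgbench's report format (not real captured output).
SAMPLE = """\
latency average = 4.840 ms
tps = 6611.000000 (without initial connection time)
"""

print(" ".join(init_args()))
for clients in SWEEP:
    print(" ".join(pgbench_args(clients)))
print(parse_pgbench(SAMPLE))
```

In practice each command would go through `subprocess.run(..., capture_output=True, text=True)` and the captured stdout would be fed to `parse_pgbench`.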
| Clients | TPS | Avg Latency |
|---|---|---|
| 1 | 3,346 | 0.30 ms |
| 4 | 19,111 | 0.21 ms |
| 16 | 41,848 | 0.38 ms |
| 32 | 40,558 | 0.79 ms |
| 64 | 39,587 | 1.62 ms |
42K TPS. Sub-millisecond latency up to 16 clients. Even at 64 concurrent connections: 1.6ms.
100,000 vectors at 1,536 dimensions (OpenAI embedding size). HNSW index with m=16, ef_construction=64.
Alibaba Cloud RDS with 16 cores and 128GB takes 16 minutes for the same workload. Our build was 5.5x faster, which works out to just under 3 minutes, on half the cores and a quarter of the RAM. Index builds are I/O-bound, and NVMe passthrough dominates here.
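For readers who have not used pgvector, the statements behind a test like this look roughly as follows. This is a minimal sketch that only builds the SQL strings. The table name `items`, the column name `embedding`, and the `LIMIT` of 10 are hypothetical, and `%s` is a psycopg-style placeholder. The operator class `vector_cosine_ops`, the `<=>` cosine distance operator, and the `hnsw.ef_search` setting are standard pgvector. The cosine distance and ef_search=40 settings come from the methodology notes at the end of this article.

```python
DIMENSIONS = 1536   # OpenAI embedding size, as in the benchmark
TABLE = "items"     # hypothetical table name


def setup_sql(table: str = TABLE, dims: int = DIMENSIONS) -> list[str]:
    """Statements that enable pgvector and create a table to hold embeddings."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} (id bigserial PRIMARY KEY, embedding vector({dims}));",
    ]


def hnsw_index_sql(table: str = TABLE, m: int = 16, ef_construction: int = 64) -> str:
    """HNSW index over cosine distance with the parameters quoted above."""
    return (
        f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def search_sql(table: str = TABLE, k: int = 10, ef_search: int = 40) -> list[str]:
    """Nearest-neighbour query; %s is filled with the query vector by the driver."""
    return [
        f"SET hnsw.ef_search = {ef_search};",
        f"SELECT id FROM {table} ORDER BY embedding <=> %s LIMIT {k};",
    ]


for statement in setup_sql() + [hnsw_index_sql()] + search_sql():
    print(statement)
```

Building the index after bulk-loading the rows is the part that the timing above measures.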
| Clients | QPS | Avg Latency |
|---|---|---|
| 1 | 318 | 3.1 ms |
| 2 | 620 | 3.2 ms |
| 4 | 1,008 | 4.0 ms |
| 8 | 1,020 | 7.8 ms |
| 16 | 1,045 | 15.3 ms |
| 32 | 1,020 | 31.4 ms |
| 64 | 928 | 68.9 ms |
Sweet spot: 4–8 clients. Over 1,000 searches per second at under 8ms latency. Throughput holds above 900 QPS even at 64 concurrent clients.
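The shape of that table follows from Little's law. Once throughput saturates at about 1,000 QPS, every extra client just waits in line, so average latency is approximately clients divided by QPS. The check below recomputes latency from the table's own numbers. It assumes a closed-loop client, meaning each connection sends its next query only after the previous one returns.

```python
# (clients, QPS, measured average latency in ms) from the table above
VECTOR_RESULTS = [
    (1, 318, 3.1),
    (2, 620, 3.2),
    (4, 1008, 4.0),
    (8, 1020, 7.8),
    (16, 1045, 15.3),
    (32, 1020, 31.4),
    (64, 928, 68.9),
]


def expected_latency_ms(clients: int, qps: float) -> float:
    """Little's law for a closed loop: latency = concurrency / throughput."""
    return clients / qps * 1000.0


for clients, qps, measured in VECTOR_RESULTS:
    predicted = expected_latency_ms(clients, qps)
    print(f"{clients:>2} clients: predicted {predicted:5.1f} ms, measured {measured:5.1f} ms")
```

Every predicted value lands within 0.1 ms of the measured one, so the numbers are internally consistent. Past 4 to 8 clients, adding connections buys latency rather than throughput.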
| Environment | vCPUs | RAM | QPS | Latency |
|---|---|---|---|---|
| Our bare metal | 8 | 32GB | 1,010 | 7.9 ms |
| Supabase 4XL | 32 | 128GB | 950 | 21 ms |
| Supabase XL | 8 | 32GB | 360 | 55 ms |
| Supabase Medium | 2 | 8GB | 240 | 83 ms |
| Alibaba Cloud RDS | 16 | 128GB | 102 | 16 ms |
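About 2.8x the throughput of Supabase XL at the same hardware class (8 vCPUs, 32GB), and slightly ahead of Supabase 4XL at four times our spec. Nearly 10x the throughput of Alibaba Cloud RDS, which has double our cores and four times our RAM.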
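The ratios can be recomputed directly from the comparison table. The sketch below reports raw throughput relative to our VM and throughput per vCPU for each environment. The per-vCPU figure is an added normalization for illustration, not a metric from the benchmark runs.

```python
# name -> (vCPUs, QPS), copied from the comparison table
ENVIRONMENTS = {
    "Our bare metal": (8, 1010),
    "Supabase 4XL": (32, 950),
    "Supabase XL": (8, 360),
    "Supabase Medium": (2, 240),
    "Alibaba Cloud RDS": (16, 102),
}

BASE_QPS = ENVIRONMENTS["Our bare metal"][1]


def compare(name: str) -> tuple[float, float]:
    """Return (our QPS / their QPS, their QPS per vCPU)."""
    vcpus, qps = ENVIRONMENTS[name]
    return BASE_QPS / qps, qps / vcpus


for name in ENVIRONMENTS:
    ratio, per_vcpu = compare(name)
    print(f"{name:<18} {ratio:5.2f}x  {per_vcpu:6.1f} QPS/vCPU")
```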
$355/month flat from Rackdog. No per-IOPS charges. No egress fees. No surprise bills. This single box replaces what would be 10+ cloud instances, a managed database, load balancers, and storage volumes across AWS, DigitalOcean, or Linode.
Not every team needs 128GB of dedicated hardware. For $50/month, you get your own isolated VM on our infrastructure with direct access to the NVMe-backed Postgres — the same setup that produced these benchmarks. That includes:
| | Shared VM ($50/mo) | AWS Equivalent |
|---|---|---|
| Compute | Dedicated VM, isolated | t3.small EC2: $15/mo |
| Database | NVMe Postgres + pgvector | RDS db.t3.micro: $13/mo (slow) |
| Storage | Included (NVMe-backed) | gp3 EBS: $12/mo |
| Vector search | 1,000+ QPS at 8ms | Not available at this price |
| Claude Code | Direct SSH access, we help you set up | You figure it out |
| Egress | None | $0.09/GB |
| Total | $50/mo | $40-80/mo minimum |
The cloud equivalent costs about the same — but runs on network-attached storage at a fraction of the performance. You're getting database performance that Supabase charges $400/month for.
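The $40-80 range in the table is the fixed AWS line items plus egress. As a rough worked example using only the prices listed above, the sketch below totals the AWS side for a few monthly egress volumes and finds where it crosses the $50 flat price. The egress volumes are hypothetical.

```python
# Monthly prices in USD, taken from the table above
AWS_FIXED = {"t3.small EC2": 15, "RDS db.t3.micro": 13, "gp3 EBS": 12}
EGRESS_PER_GB = 0.09
SHARED_VM = 50


def aws_monthly(egress_gb: float) -> float:
    """Fixed AWS line items plus egress charges for the month."""
    return round(sum(AWS_FIXED.values()) + EGRESS_PER_GB * egress_gb, 2)


def breakeven_egress_gb() -> float:
    """Egress volume at which the AWS total reaches the $50 shared VM price."""
    return (SHARED_VM - sum(AWS_FIXED.values())) / EGRESS_PER_GB


for gb in (0, 100, 250, 444):  # hypothetical monthly egress volumes
    print(f"{gb:>3} GB egress: ${aws_monthly(gb):.2f}")
print(f"break-even at about {breakeven_egress_gb():.0f} GB of egress")
```

With these prices the AWS total starts at $40 with no egress, passes $50 at a little over 110 GB, and reaches roughly $80 at about 444 GB.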
And you get Claude Code access directly on the VM. We'll show you the workflow — how to use AI-assisted development against a real database on real hardware, not a sandbox with artificial limits. Modify your app, run migrations, test against production-class Postgres, all from the command line.
All benchmarks ran on 2026-02-21. pgbench scale 100 (10M rows), 30-second runs per data point. pgvector: 100K random vectors at 1536 dimensions, HNSW m=16 ef_construction=64, ef_search=40, cosine distance. Client on Dev VM (200), server on Postgres VM (100), same physical host, internal bridge network.
All scripts, configs, benchmark runners, and operational docs are open source: github.com/Cooperation-org/barebox