Back to the Bare(Metal) Basics

By Golda Velez and Peter Ani · LinkedTrust.us · March 2026

AI changes everything. Moore's law has always been bullshit anyway, but LLM hunger for cycles and fast retrieval makes it operate in reverse: software becomes hungrier much faster than hardware improvements can keep up. Responsive agentic AI needs fast RAG. And look at this sheer speed:

pgbench read-write TPS comparison: bare metal vs cloud

We came to this configuration from a few directions. Back in 2024, one of our team went to a presentation by the USearch folks and was impressed by the chutzpah of going all the way down to rewriting the UTF-8 string handling libraries to squeeze every drop of speed out of a vector store. Unfortunately this radical improvement wasn't maintained, so we wound up just using pgvector instead of UStore. On a directly attached NVMe solid state drive, this is enough for a 3x improvement over EBS, which is network-attached block storage (ugh!), and nearly 10x over RDS, where most default RAG storage ends up in a vanilla AWS setup.

You do not have to deeply understand network-attached storage to know that a 10x speedup with cost savings is desirable. The rest of this article describes both our benchmarking and the specific configuration choices we made (Proxmox over Nix, a shared team development VM with shared CLAUDE.md files and context), but even without knowing any of that, the decision to go to bare metal is the important one. AI not only changes the demand landscape, it reduces the cost of complexity management. It is time for bare metal to shine.

The Before Picture

GOLDA QUOTE — What were you running on (DigitalOcean, scattered services), what was the monthly bill, what was the pain point.

PETER QUOTE — What did the DigitalOcean setup look like from an engineering perspective? How many droplets/managed DBs? What was annoying about managing it?

Why Bare Metal, Why Now

GOLDA — What made you pull the trigger on bare metal vs just optimizing the cloud setup. The cost story, the control story, the AI-era story (pgvector, embeddings, Claude Code — these workloads need real hardware).

The Setup

We run a single dedicated server from Rackdog, a bare metal hosting provider based in the US. Intel Xeon Gold 6142, 128GB RAM, Samsung PM983 1.92TB NVMe. Proxmox VE as hypervisor. Four VM roles:

Architecture diagram: Proxmox host with Postgres, Dev, Platform, and App VMs on internal bridge, NVMe passthrough to Postgres

VM	Purpose	RAM	Cores
Postgres (100)	Shared PG 15 + pgvector, NVMe passthrough	32GB	8
Dev (200)	Team sandbox, Claude Code	16GB	8
Platform (300)	Static frontends (Vercel replacement)	8GB	4
App VMs (400+)	Per-app isolation, various stacks	~68GB avail	varies

RACKDOG QUOTE (Brian Fair) — Optional. Something about the kind of workloads they see customers running on bare metal, or why NVMe passthrough is underused, or the value prop vs cloud. Ask Brian.

Key Decision: NVMe Passthrough

PETER QUOTE — Why PCIe passthrough instead of letting Proxmox manage the NVMe as shared storage. What's the tradeoff. How IOMMU setup works in practice.

The Postgres VM gets direct PCIe access to the physical NVMe drive — no hypervisor storage layer in between. This is the single decision that drives most of the performance numbers below.
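
Before a device can be passed through, it helps to see how the host has grouped it, because PCIe passthrough hands a whole IOMMU group to the guest. The snippet below is a minimal sketch, not our provisioning script: it assumes a Linux host that exposes groups under /sys/kernel/iommu_groups, and the function name `iommu_groups` is made up for illustration.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses it contains.

    Returns an empty dict when the directory is missing, which usually
    means the IOMMU is disabled in firmware or on the kernel command line.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(d.name for d in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items(), key=lambda kv: int(kv[0])):
        print(f"group {group}: {', '.join(devices)}")
```

If the NVMe controller shares a group with other devices, those devices would have to move to the guest along with it, so a group containing only the drive is the case you are hoping to see.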

Key Decision: Postgres Tuning for NVMe

shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 64MB
maintenance_work_mem = 2GB
random_page_cost = 1.1        # NVMe: random reads nearly as fast as sequential
effective_io_concurrency = 200  # NVMe handles massive parallelism
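
The two cache settings above happen to match a common rule of thumb: shared_buffers at about a quarter of the VM's RAM and effective_cache_size at about three quarters. The sketch below assumes that rule (it is a starting point, not a claim about how these exact values were chosen) and also shows why work_mem deserves care: it applies per sort or hash operation, so the total grows with the number of connections. The function names are illustrative.

```python
def suggest_cache_settings(ram_gb):
    """Rule-of-thumb sizing: 25% of RAM for shared_buffers, 75% for
    effective_cache_size. An assumption, not a tuning guarantee."""
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
    }


def work_mem_total_gb(work_mem_mb, connections, ops_per_query=1):
    """Memory used if every connection runs `ops_per_query` sorts or hashes
    at the full work_mem at the same moment."""
    return work_mem_mb * connections * ops_per_query / 1024


if __name__ == "__main__":
    print(suggest_cache_settings(32))
    print(work_mem_total_gb(64, 64))
```

For the 32GB Postgres VM this reproduces 8GB and 24GB, and 64MB of work_mem across 64 busy connections comes to 4GB if each runs one such operation at once.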

PETER QUOTE — What tuning mattered most? What did you learn about PG on NVMe vs cloud EBS?

The Benchmarks

All benchmarks run against the Postgres VM — 32GB RAM, 8 cores, with direct NVMe passthrough. That's it. Not the full 128GB box, just one VM on it. The client ran from a separate VM on the same host over the internal bridge (sub-millisecond hop). PostgreSQL 15.15, pgvector 0.8.1. Every data point is a 30-second sustained run.

Standard PostgreSQL: pgbench Under Load

Scale factor 100 — 10 million rows, ~1.6GB working set. We swept from 1 to 64 concurrent clients to find the saturation point.
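
A sweep like this can be scripted by generating one pgbench command per client count. The following is a sketch under assumptions rather than the runner from our repository: the database name `bench` is hypothetical, the connection target is expected to come from the usual PGHOST/PGUSER environment variables, and capping worker threads at 8 is a guess based on the client VM's core count.

```python
import shlex

# Client counts taken from the read-write table in this article.
CLIENT_COUNTS = [1, 2, 4, 8, 16, 32, 48, 64]


def pgbench_init_cmd(db, scale=100):
    """Command that creates and populates the pgbench tables."""
    return ["pgbench", "-i", "-s", str(scale), db]


def pgbench_run_cmd(db, clients, read_only=False, duration_s=30, max_threads=8):
    """Command for one timed run. -S selects the built-in select-only script;
    without it pgbench runs its default TPC-B-like read-write script."""
    threads = min(clients, max_threads)
    cmd = ["pgbench", "-c", str(clients), "-j", str(threads), "-T", str(duration_s)]
    if read_only:
        cmd.append("-S")
    cmd.append(db)
    return cmd


if __name__ == "__main__":
    print(shlex.join(pgbench_init_cmd("bench")))
    for n in CLIENT_COUNTS:
        print(shlex.join(pgbench_run_cmd("bench", n)))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; passing each list to `subprocess.run` would execute the sweep.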

pgbench throughput and latency under load, 1 to 64 concurrent clients

Read-Write (TPC-B)

Clients	TPS	Avg Latency
1	532	1.88 ms
2	1,119	1.79 ms
4	3,036	1.32 ms
8	4,437	1.80 ms
16	6,366	2.51 ms
32	6,611	4.84 ms
48	6,410	7.49 ms
64	6,139	10.43 ms

Throughput scales roughly linearly up to the core count (8 clients), keeps climbing to 16, then plateaus cleanly. Peak: 6,611 TPS at 32 clients with latency under 5 ms. On AWS, it takes a 16 vCPU / 128GB instance to match that.
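
The TPS and latency columns are two views of the same measurement. With every client issuing transactions back to back, Little's law gives average latency ≈ clients / TPS. The sketch below recomputes the latency column from the TPS column of the read-write table as a sanity check; the numbers are copied from the table and the function name is illustrative.

```python
# (clients, TPS, reported average latency in ms) from the read-write table.
READ_WRITE = [
    (1, 532, 1.88),
    (2, 1119, 1.79),
    (4, 3036, 1.32),
    (8, 4437, 1.80),
    (16, 6366, 2.51),
    (32, 6611, 4.84),
    (48, 6410, 7.49),
    (64, 6139, 10.43),
]


def avg_latency_ms(clients, tps):
    """Little's law for a closed loop: latency = concurrency / throughput."""
    return 1000.0 * clients / tps


if __name__ == "__main__":
    for clients, tps, reported in READ_WRITE:
        derived = avg_latency_ms(clients, tps)
        print(f"{clients:>2} clients: derived {derived:.2f} ms, reported {reported:.2f} ms")
    peak_clients, peak_tps, _ = max(READ_WRITE, key=lambda row: row[1])
    print(f"peak: {peak_tps} TPS at {peak_clients} clients")
```

Every derived value agrees with the reported one to two decimal places, which is what you would expect from a closed-loop benchmark where clients never sit idle.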

Read-Only

Clients	TPS	Avg Latency
1	3,346	0.30 ms
4	19,111	0.21 ms
16	41,848	0.38 ms
32	40,558	0.79 ms
64	39,587	1.62 ms

42K TPS at peak. Sub-millisecond latency up to 32 clients. Even at 64 concurrent connections: 1.6 ms.

GOLDA OR PETER — One line reaction to these numbers.

Vector Search: pgvector with HNSW

100,000 vectors at 1,536 dimensions (OpenAI embedding size). HNSW index with m=16, ef_construction=64.
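
For reference, here is roughly what those index parameters look like as pgvector SQL, built as strings so the snippet runs without a database. This is a sketch under assumptions: the table name `items`, the columns `id` and `embedding`, and the LIMIT of 10 are hypothetical, while the operator class, m, ef_construction, and ef_search follow the methodology at the end of this article.

```python
def hnsw_index_sql(table="items", column="embedding", m=16, ef_construction=64):
    """CREATE INDEX statement for a cosine-distance HNSW index."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def ef_search_sql(ef_search=40):
    """Session setting that controls the search-time candidate list size."""
    return f"SET hnsw.ef_search = {ef_search};"


def knn_query_sql(table="items", column="embedding", k=10):
    """Nearest-neighbour query; <=> is pgvector's cosine distance operator.
    The %s placeholder is meant for a driver such as psycopg."""
    return f"SELECT id FROM {table} ORDER BY {column} <=> %s::vector LIMIT {k};"


if __name__ == "__main__":
    print(hnsw_index_sql())
    print(ef_search_sql())
    print(knn_query_sql())
```

The query only uses the index when its ORDER BY operator matches the index's operator class, so a cosine index pairs with `<=>`.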

HNSW index build time: bare metal 2:56 vs Alibaba Cloud RDS 16:00

Index Build: 2 Minutes 56 Seconds

Alibaba Cloud RDS with 16 cores and 128GB takes 16 minutes for the same workload. That makes our build 5.5x faster on half the cores and a quarter of the RAM. Index builds are I/O-bound, so NVMe passthrough dominates here.

Search Throughput Under Load

Clients	QPS	Avg Latency
1	318	3.1 ms
2	620	3.2 ms
4	1,008	4.0 ms
8	1,020	7.8 ms
16	1,045	15.3 ms
32	1,020	31.4 ms
64	928	68.9 ms

Sweet spot: 4–8 clients. Over 1,000 searches per second at under 8ms latency. Throughput holds above 900 QPS even at 64 concurrent clients.
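
Numbers like these come from a closed-loop load test: N clients each send a query, wait for the answer, and immediately send the next one. The sketch below shows the shape of such a harness with a stand-in query. It is not the benchmark runner from our repository; `run_closed_loop` and the sleeping stub are hypothetical, and a real run would replace the stub with a function that executes a vector search on its own database connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_closed_loop(query, clients, duration_s):
    """Run `query` back to back from `clients` threads for `duration_s` seconds.

    Returns (queries per second, average latency in milliseconds).
    """
    deadline = time.perf_counter() + duration_s

    def worker():
        latencies = []
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            query()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(worker) for _ in range(clients)]
        latencies = [lat for f in futures for lat in f.result()]

    qps = len(latencies) / duration_s
    avg_ms = 1000.0 * sum(latencies) / len(latencies)
    return qps, avg_ms


if __name__ == "__main__":
    def fake_query():
        time.sleep(0.003)  # stand-in for a ~3 ms search

    for n in (1, 2, 4, 8):
        qps, avg_ms = run_closed_loop(fake_query, clients=n, duration_s=1.0)
        print(f"{n} clients: {qps:.0f} QPS, {avg_ms:.1f} ms")
```

With a stub that never contends for anything, QPS keeps rising with client count. A real database stops scaling once its cores are busy, which is the plateau visible in the table from 4 clients onward.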

How That Compares

Environment	vCPUs	RAM	QPS	Latency
Our bare metal	8	32GB	1,010	7.9 ms
Supabase 4XL	32	128GB	950	21 ms
Supabase XL	8	32GB	360	55 ms
Supabase Medium	2	8GB	240	83 ms
Alibaba Cloud RDS	16	128GB	102	16 ms

2.8x the throughput of Supabase XL at the same hardware class (8 vCPUs, 32GB), at one seventh of the latency. Roughly 10x the throughput of Alibaba Cloud RDS, which has twice our cores and four times our RAM.
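
The throughput ratios can be recomputed directly from the comparison table. The short sketch below does that with the QPS figures copied from the table; the dictionary and function names are illustrative.

```python
# QPS per environment, copied from the comparison table.
QPS = {
    "Our bare metal": 1010,
    "Supabase 4XL": 950,
    "Supabase XL": 360,
    "Supabase Medium": 240,
    "Alibaba Cloud RDS": 102,
}


def throughput_ratio(other, baseline="Our bare metal"):
    """How many times more queries per second the baseline serves."""
    return QPS[baseline] / QPS[other]


if __name__ == "__main__":
    for name in QPS:
        if name != "Our bare metal":
            print(f"vs {name}: {throughput_ratio(name):.1f}x")
```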

PETER QUOTE — Why pgvector on NVMe matters specifically for AI workloads. What the team actually uses vector search for.

The Cost

Monthly cost comparison: bare metal $355 vs AWS equivalent capacity ~$1,177

$355/month flat from Rackdog. No per-IOPS charges. No egress fees. No surprise bills. This single box replaces what would be 10+ cloud instances, a managed database, load balancers, and storage volumes across AWS, DigitalOcean, or Linode.
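
To put the two monthly figures side by side, the arithmetic is simple enough to script. This sketch uses only the $355 and ~$1,177 numbers quoted above and ignores variable charges such as egress, so treat the output as a rough floor rather than a quote.

```python
BARE_METAL_USD = 355       # flat monthly price quoted above
AWS_EQUIVALENT_USD = 1177  # approximate AWS figure quoted above


def savings(bare_metal=BARE_METAL_USD, cloud=AWS_EQUIVALENT_USD):
    """Monthly and yearly savings, plus the cost ratio and percentage saved."""
    monthly = cloud - bare_metal
    return {
        "monthly_usd": monthly,
        "yearly_usd": monthly * 12,
        "cost_ratio": round(cloud / bare_metal, 1),
        "percent_saved": round(100 * monthly / cloud),
    }


if __name__ == "__main__":
    print(savings())
```

On those figures the box costs about 30% of the cloud estimate, a difference of $822 a month.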

GOLDA QUOTE — The business case. What you do with the savings. How this changes what's possible for a small team.

Don't Need a Whole Box? $50/month.

Not every team needs 128GB of dedicated hardware. For $50/month, you get your own isolated VM on our infrastructure with direct access to the NVMe-backed Postgres, the same setup that produced these benchmarks. Here is what that includes, next to the closest AWS equivalent:

	Shared VM ($50/mo)	AWS Equivalent
Compute	Dedicated VM, isolated	t3.small EC2: $15/mo
Database	NVMe Postgres + pgvector	RDS db.t3.micro: $13/mo (slow)
Storage	Included (NVMe-backed)	gp3 EBS: $12/mo
Vector search	1,000+ QPS at 8ms	Not available at this price
Claude Code	Direct SSH access, we help you set up	You figure it out
Egress	None	$0.09/GB
Total	$50/mo	$40-80/mo minimum

The cloud equivalent costs about the same — but runs on network-attached storage at a fraction of the performance. You're getting database performance that Supabase charges $400/month for.

And you get Claude Code access directly on the VM. We'll show you the workflow — how to use AI-assisted development against a real database on real hardware, not a sandbox with artificial limits. Modify your app, run migrations, test against production-class Postgres, all from the command line.

GOLDA — More on the shared VM pitch. What kind of customer is this for? Indie devs? Small startups? AI projects that need vector search?

What We Learned

PETER — Hardest part of the migration from DigitalOcean. What wasn't in any docs. What would you do differently.

PETER — What would you tell a team that's nervous about giving up managed services like RDS?

GOLDA — What surprised you most once it was running?

Who This Is For

GOLDA — The profile of who should seriously consider this. Not everyone — who specifically.

We Can Help

GOLDA — CTA paragraph. LinkedTrust does these migrations. Link to linkedtrust.us/services/baremetal-migration/

Methodology

All benchmarks ran on 2026-02-21. pgbench scale 100 (10M rows), 30-second runs per data point. pgvector: 100K random vectors at 1536 dimensions, HNSW m=16 ef_construction=64, ef_search=40, cosine distance. Client on Dev VM (200), server on Postgres VM (100), same physical host, internal bridge network.

All scripts, configs, benchmark runners, and operational docs are open source: github.com/Cooperation-org/barebox