Back to the Bare(Metal) Basics

By Golda Velez and Peter Ani · LinkedTrust.us · March 2026

AI changes everything. Moore's law has always been bullshit anyway, but LLM hunger for cycles and fast retrieval makes it operate in reverse: software gets hungrier much faster than hardware improves. Responsive agentic AI needs fast RAG. And look at this sheer speed:

pgbench read-write TPS comparison: bare metal vs cloud

We came to this configuration from a few directions. Back in 2024 one of our team went to a presentation by the USearch folks and was impressed by the chutzpah of going all the way down to rewriting the UTF-8 string handling libraries to squeeze every drop of speed out of a vector store. Unfortunately this radical improvement wasn't maintained, so we wound up just doing pgvector instead of Ustore. On a directly attached NVMe solid state drive this is enough for a 3x improvement over EBS, which is network-attached block storage (ugh!), and nearly 10x over RDS, where most default RAG storage ends up in a vanilla AWS setup.

You do not have to deeply understand networked storage to know that a 10x speedup with cost savings is desirable. The rest of this article describes both our benchmarking and the specific configuration choices we made (Proxmox over Nix, a shared team development VM with shared CLAUDE.md files and context), but even without knowing any of that, the decision to go to bare metal is the important one. AI not only changes the demand landscape, it reduces the cost of complexity management. It is time for bare metal to shine.

The Before Picture

GOLDA QUOTE — What were you running on (DigitalOcean, scattered services), what was the monthly bill, what was the pain point.
PETER QUOTE — What did the DigitalOcean setup look like from an engineering perspective? How many droplets/managed DBs? What was annoying about managing it?

Why Bare Metal, Why Now

GOLDA — What made you pull the trigger on bare metal vs just optimizing the cloud setup. The cost story, the control story, the AI-era story (pgvector, embeddings, Claude Code — these workloads need real hardware).

The Setup

We run a single dedicated server from Rackdog, a bare metal hosting provider based in the US. Intel Xeon Gold 6142, 128GB RAM, Samsung PM983 1.92TB NVMe. Proxmox VE as hypervisor. Four VM roles:

Architecture diagram: Proxmox host with Postgres, Dev, Platform, and App VMs on internal bridge, NVMe passthrough to Postgres

| VM | Purpose | RAM | Cores |
|---|---|---|---|
| Postgres (100) | Shared PG 15 + pgvector, NVMe passthrough | 32GB | 8 |
| Dev (200) | Team sandbox, Claude Code | 16GB | 8 |
| Platform (300) | Static frontends (Vercel replacement) | 8GB | 4 |
| App VMs (400+) | Per-app isolation, various stacks | ~68GB avail | varies |

RACKDOG QUOTE (Brian Fair) — Optional. Something about the kind of workloads they see customers running on bare metal, or why NVMe passthrough is underused, or the value prop vs cloud. Ask Brian.

Key Decision: NVMe Passthrough

PETER QUOTE — Why PCIe passthrough instead of letting Proxmox manage the NVMe as shared storage. What's the tradeoff. How IOMMU setup works in practice.

The Postgres VM gets direct PCIe access to the physical NVMe drive — no hypervisor storage layer in between. This is the single decision that drives most of the performance numbers below.
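
Before handing a PCIe device to a VM, it helps to know which IOMMU group it sits in, since devices in the same group generally have to be passed through together. The following is a minimal sketch, not our actual tooling: it reads the standard Linux sysfs tree at /sys/kernel/iommu_groups and reports which devices share a group. The helper names are our own for illustration.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or not exposed by this kernel
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                d.name for d in devices_dir.iterdir()
            )
    return groups


def is_isolated(groups, pci_address):
    """True if the device is the only member of its IOMMU group."""
    return any(devices == [pci_address] for devices in groups.values())


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

An NVMe controller that shows up alone in its group is the easy case for passthrough. If it shares a group with other devices, the whole group usually has to move with it.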

Key Decision: Postgres Tuning for NVMe

shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 64MB
maintenance_work_mem = 2GB
random_page_cost = 1.1        # NVMe: random reads nearly as fast as sequential
effective_io_concurrency = 200  # NVMe handles massive parallelism
PETER QUOTE — What tuning mattered most? What did you learn about PG on NVMe vs cloud EBS?
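
The memory settings above line up with a common rule of thumb for a dedicated Postgres host: roughly a quarter of RAM for shared_buffers and three quarters for effective_cache_size. The sketch below applies that rule to the 32GB Postgres VM. The fractions are an assumption on our part about how such numbers are typically chosen, not a description of a formal tuning procedure, and the function names are illustrative.

```python
def pg_memory_settings(ram_gb, shared_fraction=0.25, cache_fraction=0.75):
    """Rule-of-thumb memory settings for a dedicated Postgres host."""
    return {
        "shared_buffers": f"{round(ram_gb * shared_fraction)}GB",
        "effective_cache_size": f"{round(ram_gb * cache_fraction)}GB",
    }


def worst_case_work_mem_gb(work_mem_mb, connections, allocations_per_query=1):
    """Upper bound on work_mem use if every connection sorts or hashes at once.

    A single query can use several work_mem allocations (one per sort or
    hash node), so raise allocations_per_query for complex workloads.
    """
    return work_mem_mb * connections * allocations_per_query / 1024


def render_conf(settings):
    """Format settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render_conf(pg_memory_settings(32)))
    print(f"work_mem worst case at 64 connections: "
          f"{worst_case_work_mem_gb(64, 64):.1f} GB")
```

The second helper is a reminder that work_mem is per operation, not per server, so a generous value has to be weighed against the peak connection count.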

The Benchmarks

All benchmarks run against the Postgres VM — 32GB RAM, 8 cores, with direct NVMe passthrough. That's it. Not the full 128GB box, just one VM on it. The client ran from a separate VM on the same host over the internal bridge (sub-millisecond hop). PostgreSQL 15.15, pgvector 0.8.1. Every data point is a 30-second sustained run.

Standard PostgreSQL: pgbench Under Load

Scale factor 100 — 10 million rows, ~1.6GB working set. We swept from 1 to 64 concurrent clients to find the saturation point.
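
A sweep like this can be driven with a few lines of scripting. The sketch below only builds the pgbench command lines; it does not reproduce the actual runner in the repo. The database name `bench` and the choice of capping worker threads at 8 are assumptions for illustration. The flags themselves (-i, -s, -c, -j, -T, -S, -h) are standard pgbench options.

```python
import shlex

# Client counts from the read-write sweep.
CLIENT_COUNTS = [1, 2, 4, 8, 16, 32, 48, 64]


def init_command(db="bench", scale=100):
    """Command that creates and populates the pgbench tables."""
    return ["pgbench", "-i", "-s", str(scale), db]


def run_command(clients, db="bench", seconds=30, read_only=False,
                host=None, max_threads=8):
    """Command for one sustained run at a given client count."""
    cmd = ["pgbench",
           "-c", str(clients),
           "-j", str(min(clients, max_threads)),  # worker threads in pgbench
           "-T", str(seconds)]
    if read_only:
        cmd.append("-S")  # built-in select-only script
    if host:
        cmd += ["-h", host]
    cmd.append(db)
    return cmd


if __name__ == "__main__":
    print(shlex.join(init_command()))
    for n in CLIENT_COUNTS:
        print(shlex.join(run_command(n)))
```

Each printed line can be passed to `subprocess.run` from the client VM; keeping command construction separate from execution makes the sweep easy to review before it touches the database.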

pgbench throughput and latency under load, 1 to 64 concurrent clients

Read-Write (TPC-B)

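| Clients | TPS | Avg Latency |
|---|---|---|
| 1 | 532 | 1.88 ms |
| 2 | 1,119 | 1.79 ms |
| 4 | 3,036 | 1.32 ms |
| 8 | 4,437 | 1.80 ms |
| 16 | 6,366 | 2.51 ms |
| 32 | 6,611 | 4.84 ms |
| 48 | 6,410 | 7.49 ms |
| 64 | 6,139 | 10.43 ms |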

Near-linear scaling up to core count, then a clean plateau. Peak: 6,611 TPS at 32 clients with latency under 5ms. That matches what AWS needs a 16 vCPU / 128GB instance to achieve.
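
One quick sanity check on numbers like these is Little's law: with a closed loop of N clients and no think time, average latency should be close to N divided by throughput. The sketch below applies that to the read-write results. The data is copied from the table; the helper name is ours.

```python
# (clients, TPS, reported average latency in ms) from the read-write sweep
READ_WRITE = [
    (1, 532, 1.88),
    (2, 1119, 1.79),
    (4, 3036, 1.32),
    (8, 4437, 1.80),
    (16, 6366, 2.51),
    (32, 6611, 4.84),
    (48, 6410, 7.49),
    (64, 6139, 10.43),
]


def predicted_latency_ms(clients, tps):
    """Little's law for a closed system: latency = concurrency / throughput."""
    return clients / tps * 1000.0


if __name__ == "__main__":
    for clients, tps, reported in READ_WRITE:
        predicted = predicted_latency_ms(clients, tps)
        print(f"{clients:>2} clients: reported {reported:.2f} ms, "
              f"predicted {predicted:.2f} ms")
```

If reported and predicted latency diverge noticeably, the usual suspects are connection setup time being counted in the run or clients that are not kept busy for the whole interval.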

Read-Only

| Clients | TPS | Avg Latency |
|---|---|---|
| 1 | 3,346 | 0.30 ms |
| 4 | 19,111 | 0.21 ms |
| 16 | 41,848 | 0.38 ms |
| 32 | 40,558 | 0.79 ms |
| 64 | 39,587 | 1.62 ms |

42K TPS at peak. Sub-millisecond latency through 32 clients. Even at 64 concurrent connections: 1.6ms.

GOLDA OR PETER — One line reaction to these numbers.

Vector Search: pgvector with HNSW

100,000 vectors at 1,536 dimensions (OpenAI embedding size). HNSW index with m=16, ef_construction=64.
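
For readers who have not used pgvector, the setup above corresponds to a handful of SQL statements. The sketch below builds them as strings, along with a rough estimate of the raw vector data size. The table name `items`, column name `embedding`, the `LIMIT` of 10, and the `%s` placeholder style are assumptions for illustration; the `hnsw` index type, `vector_cosine_ops`, the `<=>` cosine distance operator, and the `hnsw.ef_search` setting are standard pgvector.

```python
def hnsw_index_sql(table="items", column="embedding", m=16, ef_construction=64):
    """CREATE INDEX statement for an HNSW index using cosine distance."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def ef_search_sql(ef_search=40):
    """Session setting that controls the search-time candidate list size."""
    return f"SET hnsw.ef_search = {ef_search};"


def search_sql(table="items", column="embedding", k=10):
    """Nearest-neighbour query; %s is a driver placeholder for the query vector."""
    return f"SELECT id FROM {table} ORDER BY {column} <=> %s::vector LIMIT {k};"


def raw_vector_bytes(rows, dims, bytes_per_float=4):
    """Size of the vector payload alone (pgvector stores 4-byte floats)."""
    return rows * dims * bytes_per_float


if __name__ == "__main__":
    print(hnsw_index_sql())
    print(ef_search_sql())
    print(search_sql())
    print(f"raw vectors: {raw_vector_bytes(100_000, 1536) / 1e6:.1f} MB")
```

The size estimate covers only the float payload; per-row overhead and the HNSW graph itself come on top of that.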

HNSW index build time: bare metal 2:56 vs Alibaba Cloud RDS 16:00

Index Build: 2 Minutes 56 Seconds

Alibaba Cloud RDS with 16 cores and 128GB takes 16 minutes for the same workload. That makes our build about 5.5x faster on half the cores and a quarter of the RAM. Index builds are I/O-bound, so NVMe passthrough dominates here.

Search Throughput Under Load

pgvector HNSW search throughput and latency under load, 1 to 64 concurrent clients

| Clients | QPS | Avg Latency |
|---|---|---|
| 1 | 318 | 3.1 ms |
| 2 | 620 | 3.2 ms |
| 4 | 1,008 | 4.0 ms |
| 8 | 1,020 | 7.8 ms |
| 16 | 1,045 | 15.3 ms |
| 32 | 1,020 | 31.4 ms |
| 64 | 928 | 68.9 ms |

Sweet spot: 4–8 clients. Over 1,000 searches per second at under 8ms latency. Throughput holds above 900 QPS even at 64 concurrent clients.
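
One way to make "sweet spot" concrete is to look for the smallest client count that already delivers most of the peak throughput. The sketch below does that for the search results; the 95% threshold is an arbitrary assumption, and the function name is ours.

```python
# (clients, QPS, average latency in ms) from the vector search sweep
VECTOR_SEARCH = [
    (1, 318, 3.1),
    (2, 620, 3.2),
    (4, 1008, 4.0),
    (8, 1020, 7.8),
    (16, 1045, 15.3),
    (32, 1020, 31.4),
    (64, 928, 68.9),
]


def saturation_point(results, fraction=0.95):
    """Smallest client count whose throughput reaches `fraction` of the peak."""
    peak = max(qps for _, qps, _ in results)
    for clients, qps, latency_ms in sorted(results):
        if qps >= fraction * peak:
            return clients, qps, latency_ms
    return None


if __name__ == "__main__":
    clients, qps, latency_ms = saturation_point(VECTOR_SEARCH)
    print(f"{clients} clients reach {qps} QPS at {latency_ms} ms")
```

Past that point, extra clients mostly add queueing delay: throughput stays roughly flat while latency grows with concurrency.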

pgvector search throughput comparison: bare metal vs Supabase and Alibaba Cloud

How That Compares

| Environment | vCPUs | RAM | QPS | Latency |
|---|---|---|---|---|
| Our bare metal | 8 | 32GB | 1,010 | 7.9 ms |
| Supabase 4XL | 32 | 128GB | 950 | 21 ms |
| Supabase XL | 8 | 32GB | 360 | 55 ms |
| Supabase Medium | 2 | 8GB | 240 | 83 ms |
| Alibaba Cloud RDS | 16 | 128GB | 102 | 16 ms |

Vector search latency comparison: bare metal 7.9ms vs cloud 16-83ms

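2.8x the throughput of Supabase XL at the same hardware class (8 vCPUs, 32GB), and more than 4x Supabase Medium. Roughly 10x the throughput of Alibaba Cloud RDS, which has twice our cores and four times our RAM.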
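
Headline multiples are easiest to trust when they can be recomputed from the rows they refer to. The sketch below derives throughput and latency ratios directly from the comparison table; the data is copied from the table and the helper names are illustrative.

```python
# environment -> (vCPUs, RAM in GB, QPS, latency in ms), from the table above
COMPARISON = {
    "Our bare metal": (8, 32, 1010, 7.9),
    "Supabase 4XL": (32, 128, 950, 21.0),
    "Supabase XL": (8, 32, 360, 55.0),
    "Supabase Medium": (2, 8, 240, 83.0),
    "Alibaba Cloud RDS": (16, 128, 102, 16.0),
}


def throughput_ratio(baseline, other, table=COMPARISON):
    """How many times more QPS `baseline` delivers than `other`."""
    return table[baseline][2] / table[other][2]


def latency_ratio(baseline, other, table=COMPARISON):
    """How many times lower the latency of `baseline` is than `other`."""
    return table[other][3] / table[baseline][3]


if __name__ == "__main__":
    for name in COMPARISON:
        if name == "Our bare metal":
            continue
        print(f"vs {name}: "
              f"{throughput_ratio('Our bare metal', name):.1f}x QPS, "
              f"{latency_ratio('Our bare metal', name):.1f}x lower latency")
```

Comparing like with like matters here: the Supabase XL row matches our VM on vCPUs and RAM, while the other rows are either smaller or considerably larger.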

PETER QUOTE — Why pgvector on NVMe matters specifically for AI workloads. What the team actually uses vector search for.

The Cost

Monthly cost comparison: bare metal $355 vs AWS equivalent capacity ~$1,177

$355/month flat from Rackdog. No per-IOPS charges. No egress fees. No surprise bills. This single box replaces what would be 10+ cloud instances, a managed database, load balancers, and storage volumes across AWS, DigitalOcean, or Linode.

GOLDA QUOTE — The business case. What you do with the savings. How this changes what's possible for a small team.

Don't Need a Whole Box? $50/month.

Not every team needs 128GB of dedicated hardware. For $50/month, you get your own isolated VM on our infrastructure with direct access to the NVMe-backed Postgres — the same setup that produced these benchmarks. That includes:

| | Shared VM ($50/mo) | AWS Equivalent |
|---|---|---|
| Compute | Dedicated VM, isolated | t3.small EC2: $15/mo |
| Database | NVMe Postgres + pgvector | RDS db.t3.micro: $13/mo (slow) |
| Storage | Included (NVMe-backed) | gp3 EBS: $12/mo |
| Vector search | 1,000+ QPS at 8ms | Not available at this price |
| Claude Code | Direct SSH access, we help you set up | You figure it out |
| Egress | None | $0.09/GB |
| Total | $50/mo | $40-80/mo minimum |

The cloud equivalent costs about the same — but runs on network-attached storage at a fraction of the performance. You're getting database performance that Supabase charges $400/month for.

And you get Claude Code access directly on the VM. We'll show you the workflow — how to use AI-assisted development against a real database on real hardware, not a sandbox with artificial limits. Modify your app, run migrations, test against production-class Postgres, all from the command line.

GOLDA — More on the shared VM pitch. What kind of customer is this for? Indie devs? Small startups? AI projects that need vector search?

What We Learned

PETER — Hardest part of the migration from DigitalOcean. What wasn't in any docs. What would you do differently.
PETER — What would you tell a team that's nervous about giving up managed services like RDS?
GOLDA — What surprised you most once it was running?

Who This Is For

GOLDA — The profile of who should seriously consider this. Not everyone — who specifically.

We Can Help

GOLDA — CTA paragraph. LinkedTrust does these migrations. Link to linkedtrust.us/services/baremetal-migration/

Methodology

All benchmarks ran on 2026-02-21. pgbench scale 100 (10M rows), 30-second runs per data point. pgvector: 100K random vectors at 1536 dimensions, HNSW m=16 ef_construction=64, ef_search=40, cosine distance. Client on Dev VM (200), server on Postgres VM (100), same physical host, internal bridge network.
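
To give a sense of what "random vectors" and "cosine distance" mean in practice, here is a minimal sketch of generating test vectors and formatting them for pgvector. The uniform distribution and the fixed seed are assumptions for illustration, not necessarily what the benchmark scripts in the repo do. The bracketed text form `[x,y,z]` is pgvector's input format, and cosine distance is one minus cosine similarity.

```python
import math
import random


def random_vector(dims=1536, rng=random):
    """One random vector with components drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dims)]


def to_pgvector_literal(vec):
    """Format a vector in pgvector's text input form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(f"{x:.6f}" for x in vec) + "]"


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    a, b = random_vector(rng=rng), random_vector(rng=rng)
    print(to_pgvector_literal(a)[:60] + "...")
    print(f"cosine distance between two random vectors: {cosine_distance(a, b):.3f}")
```

Random high-dimensional vectors are nearly orthogonal to each other, so real embeddings, which cluster, can behave differently under the same index settings.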

All scripts, configs, benchmark runners, and operational docs are open source: github.com/Cooperation-org/barebox

Sources