The Setup

I've been obsessed with the idea of a personal AI agent for a while. Not a chatbot. An agent. Something that reads my email, checks my calendar, sends a WhatsApp when I ask it to, and actually gets things done without me holding its hand.

The OpenClaw project pushed me off the fence. I stopped reading about other people's setups and started building one.

Two Intel NUCs that had been sitting on my desk doing general homelab work got repurposed. Different generations, different specs, very different jobs.


The Hardware

NUC13ANHi7 — Arena Canyon, 2023

  • Core i7-1360P — 12 cores (4P + 8E), 16 threads, up to 5.0 GHz
  • Iris Xe iGPU (96 EUs)
  • 64GB DDR4-3200 dual-channel (2× 32GB matched pair)
  • 480GB WD Green SATA SSD

NUC8i5BEH — Bean Canyon, 2018

  • Core i5-8259U — 4 cores, 8 threads, up to 3.8 GHz
  • Iris Plus 655 iGPU
  • 24GB DDR4-2400 (mixed sticks, downclocked)
  • 250GB NVMe (Gen 3) + 240GB SATA SSD

The NUC13 is roughly 3× the CPU and 2× the iGPU compute of the NUC8. That's a generational gap, not a marginal one. Once I accepted that, the split wrote itself.


Polaris Does the Thinking. Hermes Runs the Agents.

The names came together once the roles were clear. Polaris — the north star — is the fixed reference point, the thing everything else orients around. Hermes — the messenger — is the one that actually goes out and does things.

Having clean SSH targets (ssh polaris, ssh hermes) makes daily ops feel less like wrestling with IP addresses.
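Concretely, that's just one stanza per machine in ~/.ssh/config (the addresses and user below are placeholders, not my real ones):

  # ~/.ssh/config
  Host polaris
      HostName 100.64.0.1   # placeholder tailnet IP
      User admin            # placeholder user
  Host hermes
      HostName 100.64.0.2   # placeholder tailnet IP
      User admin

Once Tailscale's MagicDNS is on (step 2 of the roadmap below), the hostnames resolve on their own and the HostName lines become unnecessary.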

Polaris (NUC13) — Inference

This is where the local LLMs live, served by Ollama.

LLM inference is memory-bandwidth-bound, not clock-speed-bound. That's why the DDR4-3200 dual-channel matters more than raw GHz here. The 64GB ceiling also means I can experiment with 14B models or load multiple smaller ones simultaneously without thrashing.
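The back-of-the-envelope behind that claim, as a sketch (round numbers, and assuming token generation streams the full weights once per token):

  // Dual-channel DDR4-3200: 2 channels x 8 bytes x 3200 MT/s = 51.2 GB/s.
  const bandwidthGBps = (2 * 8 * 3200e6) / 1e9;
  // A 7B model quantised to ~4.7 GB is read once per generated token.
  const modelGB = 4.7;
  console.log((bandwidthGBps / modelGB).toFixed(1)); // ~10.9 tokens/s ceiling

Real throughput lands below that ceiling, and the point stands either way: a faster clock wouldn't move this number, but more memory bandwidth would.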

Hermes (NUC8) — Orchestration

This is where the OpenClaw agents run. They're Node.js processes — lightweight, mostly I/O-bound, spending most of their time waiting on LLM responses or external APIs. Four cores and 24GB are plenty for 5–8 concurrent agents.
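To make "mostly waiting" concrete, here is a minimal sketch of one agent tick. This is not OpenClaw's actual API; the tool URL and helper names are stand-ins, and only Ollama's /api/generate endpoint is real:

  // Hypothetical agent tick: nearly all wall-clock time is spent awaiting I/O.
  async function agentTick(task: string): Promise<string> {
    // Ask the LLM on Polaris for a plan (network wait: seconds).
    const plan = await fetch("http://polaris:11434/api/generate", {
      method: "POST",
      body: JSON.stringify({ model: "qwen2.5:14b", prompt: task, stream: false }),
    }).then((r) => r.json());

    // Fire a tool call against some external API (network wait again).
    const result = await fetch("https://api.example.com/tool", {
      method: "POST",
      body: JSON.stringify({ action: plan.response }),
    }).then((r) => r.json());

    // The CPU does microseconds of work between waits, which is why an
    // old four-core chip can juggle several of these concurrently.
    return JSON.stringify(result);
  }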

The dual drives on Hermes are a bonus: fast NVMe for OS and active state, SATA for bulk storage. It's the same separation I want on Polaris, once I actually fix its storage situation.


Why Not the Other Way Around

I went back and forth on this. The instinct is to put your most demanding workload on your best hardware. But "most demanding" depends on what's actually demanding, and that's where most people get this wrong.

LLM inference is greedy: CPU, RAM bandwidth, iGPU. The NUC13 wins on all three.

Agent orchestration is mostly waiting. An agent fires a tool call, waits for an API response, fires another, waits again. CPU usage spikes for a moment, then sits at idle. Hermes's older four-core chip is completely adequate for that profile.

Flipping the assignment would have meant unusably slow inference on the NUC8 and wasted hardware running I/O-bound Node.js processes on the NUC13. The split isn't clever — it's just matching the machine to what it's good at.


The Bottleneck I'm Not Glossing Over

Polaris is currently running on a 480GB WD Green SATA SSD. The NUC13ANHi7 has two M.2 NVMe slots — one PCIe 4.0 x4, one PCIe 3.0 x4 — and I'm using neither of them.

The WD Green is WD's budget tier. DRAM-less, QLC NAND, peaks around 545 MB/s. It's fine as a boot drive. It is completely the wrong drive for LLM workloads.

For steady-state inference this doesn't matter — once a model is loaded into RAM, disk speed is irrelevant. But the moment you want to switch models, cold-start after a reboot, or load something you haven't used in a while, SATA becomes painful:

Operation                       SATA SSD    NVMe Gen 4
Load a 7B model (~4.7 GB)       ~9s         ~1s
Load a 14B model (~9 GB)        ~17s        ~2s
Load Mixtral 8x7B (~26 GB)      ~50s        ~5–6s
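Those numbers are nothing more exotic than file size divided by sequential read speed, assuming ~545 MB/s for the SATA drive and ~5 GB/s for a mid-range Gen 4 NVMe:

  // Approximate model load time = file size / sequential read throughput.
  const loadSeconds = (sizeGB: number, readMBps: number) =>
    (sizeGB * 1024) / readMBps;
  console.log(loadSeconds(26, 545).toFixed(0));  // ~49s on SATA
  console.log(loadSeconds(26, 5000).toFixed(0)); // ~5s on Gen 4 NVMe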

That's the difference between an agent that feels responsive when it switches modes and one that makes you sit through a loading screen. Worse, if I ever exceed RAM and the OS starts swapping to SATA, the machine becomes molasses.

The fix is obvious: a 1TB Gen 4 NVMe for model storage, the existing SATA SSD demoted to logs and backups. Same dual-drive logic that's already working on Hermes. I'll do this before scaling beyond 2–3 agents — it's a known upgrade, not a surprise.
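Repointing Ollama at the new drive should be a one-line change, since it reads its model directory from the OLLAMA_MODELS environment variable. As a systemd drop-in, with a hypothetical mount path:

  # /etc/systemd/system/ollama.service.d/override.conf
  [Service]
  Environment="OLLAMA_MODELS=/mnt/nvme/models"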

The honest version: the split between machines is the right call. The storage choice on Polaris isn't, and I knew it when I started. Sometimes you ship with what you have, then fix the bottleneck once you can measure it.


On Local-Only Inference

Pure local inference for agentic work is romantic but slow. Qwen 14B running at 8 tokens/sec isn't competitive with Claude Sonnet via API for reasoning-heavy tool use. So the real architecture is hybrid:

  • Claude (subscription) — primary brain, anything that needs actual reasoning
  • Perplexity Pro via MCP — search-grounded research
  • Local Polaris models — routing, classification, embeddings, privacy-sensitive tasks, fallback when API limits hit

The local models earn their keep by handling volume — heartbeats, email importance classification, embedding generation — so my Claude quota stays available for the work that actually needs it. This is the part most "run everything local!" advice skips over.
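The router that makes this work doesn't need to be clever. A sketch of the first pass, using Ollama's /api/generate with a small local model (the model tag and prompt are illustrative, and the Claude handoff is left out):

  // Hypothetical first-pass router: a small local model labels each task,
  // so only the hard ones spend Claude quota.
  async function route(task: string): Promise<"local" | "claude"> {
    const res = await fetch("http://polaris:11434/api/generate", {
      method: "POST",
      body: JSON.stringify({
        model: "qwen2.5:3b", // any small local model will do
        prompt: `Reply LOCAL or CLAUDE. Does this need multi-step reasoning?\n${task}`,
        stream: false,
      }),
    }).then((r) => r.json());
    return res.response.includes("CLAUDE") ? "claude" : "local";
  }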


What I'm Building Toward

  1. Ubuntu Server 24.04 LTS on both nodes — headless, SSH only
  2. Tailscale mesh so I can reach either machine from anywhere
  3. Ollama on Polaris, OpenClaw on Hermes in Docker (reachability check sketched below the list)
  4. One agent first — morning briefing that pulls calendar, email, and LeetCode streak
  5. Scale to 5–6 specialised agents once routing is solid
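The first sanity check once steps 2 and 3 land: can Hermes actually see Polaris's Ollama over the tailnet? Ollama's /api/tags endpoint lists installed models, so a one-off script from Hermes settles it (assuming MagicDNS resolves the hostname):

  // Run from Hermes: verifies the mesh route and the Ollama API in one go.
  const res = await fetch("http://polaris:11434/api/tags");
  const { models } = await res.json();
  console.log(models.map((m: { name: string }) => m.name));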

I'll write the next post once the first agent is actually running and I have real data on what works and what doesn't. There's enough breathless "I built JARVIS in a weekend" content on the internet. I'd rather write the version that includes the boring parts — the model that hallucinates tool arguments, the agent that fires too often, the rate limits that catch you in week two.

If you're thinking about a similar setup: don't buy faster hardware before you've matched workload to the hardware you already own. The split between Polaris and Hermes is doing more work than any upgrade I could have made.

More soon, once they start earning their names.

Found this useful? You can buy me a coffee.