The last post I wrote about Polaris ended with a line about the role shifting — embeddings, batch work, fallback inference. That was the plan. What actually happened was more interesting.
Polaris is now a server. A proper one. And it happened in a single afternoon.
The Trigger
I wrote in the empty-quadrant post that "fast and free" isn't a quadrant that exists. What I underestimated is how quickly that conclusion leads somewhere else: if Polaris isn't earning its keep as an interactive LLM box, what is it good at?
The answer turned out to be: running things. Reliably. 24/7. Without me thinking about it.
The shift from "LLM machine" to "personal server" isn't a dramatic pivot — it's the same hardware, the same Ubuntu install, the same SSH session. What changed was what I decided to run on it.
What's Running Now
The Public Layer
Polaris sits behind Cloudflare. All external traffic hits Cloudflare first, then gets forwarded to the NUC at home. The NUC runs nginx as a reverse proxy — a single entry point that routes incoming requests to whatever's actually serving them internally.
The routing is project-based. If I'm running a todo API, it lives at api.singhangad.in/todo/. A reminder service gets /reminder/. Each new project is a new location block in one config file. Nginx reloads in under a second. No downtime, no drama.
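The real config isn't in this post, so here's a sketch with invented ports and only the hostname from above; the shape is one location block per project inside a single server block:

```nginx
# Illustrative fragment: internal ports and paths are hypothetical.
server {
    server_name api.singhangad.in;
    listen 443 ssl;
    # ssl_certificate lines are shown in the next snippet

    location /todo/ {
        proxy_pass http://127.0.0.1:8001/;   # hypothetical port for the todo API container
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /reminder/ {
        proxy_pass http://127.0.0.1:8002/;   # hypothetical port for the reminder service
        proxy_set_header Host $host;
    }
}
```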
SSL is handled by a Cloudflare Origin Certificate, a cert Cloudflare issues specifically for the leg between its edge and my server. The browser talks to Cloudflare over a valid public cert. Cloudflare talks to my NUC over the Origin cert. Both legs are encrypted (Cloudflare terminates TLS in the middle), and I never had to run Certbot or think about renewal.
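On the nginx side the Origin Certificate is just another cert/key pair; the paths below are wherever you put the files Cloudflare generates, not the real ones:

```nginx
# Inside the same server block. Paths are placeholders.
ssl_certificate     /etc/ssl/cloudflare/origin.pem;
ssl_certificate_key /etc/ssl/cloudflare/origin.key;
```

Pairing an Origin cert with Cloudflare's Full (strict) SSL mode is the usual pattern: Cloudflare then verifies the cert on the origin side too, so a misconfigured backend fails loudly instead of silently downgrading.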
The Private Layer
Not everything should be public. Monitoring dashboards, container management, internal tooling — these don't need to be on the internet. They need to be reachable by me, from anywhere.
That's what WireGuard is for. My Mac and phone both connect to Polaris over an encrypted VPN tunnel. Once connected, I can reach anything running on the server as if I'm sitting next to it. Internal services never touch the public internet. They don't need to.
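For a sense of scale, a WireGuard client config is about ten lines. Keys, addresses, and the endpoint below are placeholders, since none of that detail is in this post:

```ini
# Client side (Mac or phone). Everything here is a placeholder.
[Interface]
PrivateKey = <client-private-key>
Address = 10.8.0.2/24

[Peer]
PublicKey = <server-public-key>
Endpoint = <public-hostname-or-ip>:51820
AllowedIPs = 10.8.0.0/24          # only the VPN subnet rides the tunnel
PersistentKeepalive = 25          # keeps the tunnel alive behind NAT
```

Scoping AllowedIPs to the server's subnet means only traffic for internal services goes over the tunnel; everything else leaves the laptop or phone normally.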
The split is clean: public APIs go through nginx and Cloudflare. Private tooling goes through the VPN tunnel. Nothing is accidentally exposed.
Observability
A monitoring dashboard runs on the NUC and gives me real-time metrics — CPU, RAM, disk I/O, network, Docker containers, nginx request rates. It's accessible via VPN only. I can pull it up from my phone in ten seconds if I want to know why something feels slow.
An uptime monitor watches all my public endpoints and alerts me if something goes down. It's running in Docker, also VPN-only. The monitoring stack watches itself watch everything else.
Containers
Everything non-system runs in Docker. Each service gets its own compose file in a dedicated directory. Starting or stopping a service is docker compose up -d or docker compose down. No global state, no conflicting dependencies.
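A compose file for one of these services is short. This one is invented (image, port, and paths are assumptions), but it shows the pattern of binding to localhost so only nginx or the VPN can reach it:

```yaml
# ~/services/todo/docker-compose.yml - everything here is illustrative.
services:
  todo-api:
    image: ghcr.io/example/todo-api:latest   # hypothetical image
    restart: unless-stopped                  # survives reboots and crashes
    ports:
      - "127.0.0.1:8001:8000"                # localhost only; nginx proxies to it
    env_file: .env
```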
A container management UI runs on the NUC — VPN-only. I can restart a container from my phone without opening a terminal.
CI/CD
Polaris hosts its own CI runners. When I push to a repo, a runner picks it up, builds it, and deploys it automatically. The NUC is both the build machine and the deployment target. The pipeline is: push code → runner builds → service restarts with new version. For personal projects, that's fast enough to feel instant.
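The post doesn't say which CI system this is, so treat the following as one way to get that behaviour rather than a description of the actual pipeline: a GitHub Actions workflow pointed at a self-hosted runner on the NUC.

```yaml
# .github/workflows/deploy.yml - assumes GitHub Actions plus a self-hosted
# runner, which is an assumption, not the confirmed setup.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: self-hosted                      # the runner lives on Polaris
    steps:
      - uses: actions/checkout@v4
      - run: docker compose up -d --build     # rebuild and restart in place
```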
Automation
The most interesting thing running on Polaris is a Python script that I think of as the weekly audit agent. On a regular schedule, it:
- Reads the week's system logs
- Sends them to the Claude API for analysis
- Emails me a summary — overall security status, key findings, top three recommended actions
The Claude API call costs roughly a cent. The output is the kind of thing a sysadmin would write after reviewing the logs manually. I read it over coffee on Sunday mornings.
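A minimal sketch of what such a script can look like, assuming the official anthropic Python SDK, journalctl as the log source, and a local SMTP relay for the email; the model name, addresses, and filters are placeholders, not the real script:

```python
#!/usr/bin/env python3
"""Weekly log audit: collect logs, ask Claude for an analysis, email the summary.

Sketch only. Model choice, log sources, and mail setup are assumptions.
"""
import smtplib
import subprocess
from email.message import EmailMessage

import anthropic  # pip install anthropic

MODEL = "claude-3-5-haiku-latest"  # any current Claude model works; pick for cost
TO_ADDR = "me@example.com"         # placeholder address

def collect_logs() -> str:
    # Last 7 days of journal entries at warning priority or above,
    # truncated so the prompt stays cheap.
    out = subprocess.run(
        ["journalctl", "--since", "7 days ago", "--no-pager", "-p", "warning"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout[-150_000:]

def analyze(logs: str) -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model=MODEL,
        max_tokens=1500,
        system="You are a security-minded sysadmin reviewing a week of server logs.",
        messages=[{
            "role": "user",
            "content": "Summarize overall security status, key findings, and the "
                       "top three recommended actions.\n\n" + logs,
        }],
    )
    return msg.content[0].text

def email_report(body: str) -> None:
    mail = EmailMessage()
    mail["Subject"] = "Polaris weekly audit"
    mail["From"] = "polaris@localhost"
    mail["To"] = TO_ADDR
    mail.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local relay is configured
        smtp.send_message(mail)

if __name__ == "__main__":
    email_report(analyze(collect_logs()))
```

Hooked up to cron or a systemd timer, that's the whole Sunday email.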
The roles are clean: Claude analyzes, Polaris executes.
What I Got Wrong the First Time
When I set Polaris up as an LLM machine, I optimized for one thing: inference speed. The server role is different — the thing that matters is reliability, not throughput. A machine that serves APIs has to be running when the request arrives. A machine that runs inference jobs can start when you ask it to.
That sounds obvious. It took me longer than it should have to actually internalize it.
The other thing I underestimated was how much surface area a public-facing server has. An LLM machine that you SSH into occasionally has a small attack surface. A server with ports open to the internet is a different story. The day I opened ports 80 and 443, bots were probing endpoints within minutes. That's not paranoia — I could see it in the nginx error logs.
The defensive stack I ended up with is layered:
- Cloudflare absorbs a lot of the internet before it reaches my network
- nginx rate-limits at the application layer (a config sketch follows this list)
- The firewall restricts which ports accept traffic and from where
- Automated security audits run weekly and report anything worth knowing
- Private services are VPN-only and never touch the public internet
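The nginx rate limiting mentioned above is two directives. The numbers here are examples, not the limits actually in use:

```nginx
# In the http block: a shared zone keyed on client IP.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

# In the relevant location block: apply it, allowing short bursts.
location /todo/ {
    limit_req zone=api_limit burst=20 nodelay;
    proxy_pass http://127.0.0.1:8001/;
}
```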
None of this is complicated. All of it is necessary.
What's Next
Hermes — the NUC8 — is still on the list. The plan from the last post stands: OpenClaw agents running on Hermes, Polaris as the API and compute backend. That's the next post once it's actually built.
For now, Polaris is doing its job. APIs are serving. Containers are running. The Sunday audit lands in my inbox. The monitoring dashboard is one VPN connection away.
Not bad for a machine I almost turned into a dedicated embeddings box.