Polaris Under Pressure: Benchmarking a 13th Gen NUC as a Production API Server

The previous post in this series set up the Nexus backend — Node.js, TypeScript, Express, Prisma, PostgreSQL, all Dockerized and running on Polaris. The API is live. Tests pass. The build is green.

The question I hadn't answered: what can this thing actually handle?

Polaris is an Intel NUC. A 13th Gen Core i7-1360P in a box the size of a paperback book, drawing about 15W at idle. It lives under my desk. I deploy real projects to it. But I'd never systematically tested how far it could be pushed.

So I built a proper load testing suite using k6 and put the Nexus API under progressively increasing pressure — while monitoring every CPU cycle, RAM byte, and network packet on the server in real time.

The Hardware


CPU	Intel Core i7-1360P (13th Gen, Raptor Lake)
Cores / Threads	12 cores (4P + 8E) · 16 threads
Boost	up to 5.0 GHz
RAM	64 GB DDR4 3200 MT/s (2× 32 GB)
Storage	480 GB WD Green SSD
OS	Ubuntu 26.04 LTS

The i7-1360P is a laptop-class chip with a 28W TDP, which is what makes NUCs compelling as always-on servers. At the time of testing the machine showed 2.3 GB of RAM used, 31 GB in buffer/page cache, and 14+ hours of uptime. The API container and the Postgres container both ran on the same machine, with no external database and no load balancer.

The API Under Test

Nexus Backend is the REST API for the Android app I described in the previous post — a real-time collaborative project management tool. The stack: Node.js 20, Express 4, Prisma 5, PostgreSQL 16, JWT auth, WebSocket server.

The endpoints cover everything a real app needs: auth with refresh token rotation, workspace and project management, task CRUD with priority and assignee filtering, subtasks, comments, file attachments, label management, full-text search, analytics aggregations, paginated list queries, and push notifications via FCM.

Both the API and the database ran on Polaris during all tests. One machine, two containers.

The Testing Methodology

k6 is a load testing tool where virtual users (VUs) run JavaScript test scripts in a loop. Each VU makes a sequence of real HTTP requests, pauses 1–3 seconds between iterations to simulate human pacing, and runs concurrently with all other VUs.

Each VU was assigned a behaviour bucket based on __VU % 10:

60% Readers — health check → workspace list → project detail → task list → task detail → notifications
30% Writers — create task → update priority → add comment → move to Done → delete task
10% Searchers — full-text search → analytics → paginated task list
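
The bucket assignment can be sketched as a small pure function. This is a minimal illustration, not the actual test script: the post only gives the 60/30/10 split, so the mapping of specific remainders to buckets and the name bucketFor are assumptions.

```typescript
// Behaviour buckets from the post: 60% readers, 30% writers, 10% searchers.
type Bucket = "reader" | "writer" | "searcher";

// Assumption: remainders 0-5 are readers, 6-8 writers, 9 searchers.
// The post states the split, not which remainders map to which bucket.
function bucketFor(vuId: number): Bucket {
  const slot = vuId % 10;
  if (slot < 6) return "reader";
  if (slot < 9) return "writer";
  return "searcher";
}

// k6 numbers VUs from 1, so check the split over VUs 1..300.
const counts: Record<Bucket, number> = { reader: 0, writer: 0, searcher: 0 };
for (let vu = 1; vu <= 300; vu++) {
  counts[bucketFor(vu)] += 1;
}
console.log(counts);
```

Because the bucket depends only on the VU number, the mix stays at 60/30/10 whenever the VU count is a multiple of ten, and drifts only slightly in between.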

Authentication happens once in setup(), not per VU. The resulting access token is shared across all virtual users, which approximates a real app where users stay logged in rather than logging in on every request. It also avoids exhausting the rate limit on auth endpoints (15 requests per 15 minutes).

I set hard thresholds before running any tests. A threshold is a pass/fail condition — if it's breached, the test exits with a non-zero code:

Metric	Threshold
`http_req_duration p(95)`	< 600 ms
`http_req_duration p(99)`	< 2000 ms
Hard HTTP failures	< 5%
Task list p(95)	< 400 ms
Task create p(95)	< 700 ms
Search p(95)	< 500 ms
Analytics p(95)	< 600 ms

Thresholds make the test honest. Without them, you're just looking at numbers and deciding afterward whether they're good.
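
As a rough sketch, the table above could be written in k6's threshold string syntax like this. http_req_duration and http_req_failed are built-in k6 metrics; mapping "hard HTTP failures" to http_req_failed is an assumption, and the four per-endpoint metric names are hypothetical custom Trend metrics that the real suite may name differently.

```typescript
// k6-style thresholds as a plain object, so it type-checks without k6 installed.
// In a k6 script this object would be exported as `options`.
const options = {
  thresholds: {
    // Built-in k6 request duration metric, in milliseconds.
    http_req_duration: ["p(95)<600", "p(99)<2000"],
    // Assumption: "hard HTTP failures" maps to the built-in failure rate.
    http_req_failed: ["rate<0.05"],
    // Hypothetical custom Trend metrics, one per endpoint group.
    task_list_duration: ["p(95)<400"],
    task_create_duration: ["p(95)<700"],
    search_duration: ["p(95)<500"],
    analytics_duration: ["p(95)<600"],
  },
};

// Count the individual pass/fail conditions the run is held to.
const conditionCount = Object.values(options.thresholds).reduce(
  (total, expressions) => total + expressions.length,
  0,
);
console.log(conditionCount);
```

Keeping the limits in one object means they are fixed before the run starts, which is the point of setting thresholds up front.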

Stress Test: 0 → 300 VUs

The stress scenario ramps linearly from zero to 300 concurrent users over ten minutes, holds at peak, then ramps back down. The goal is to find the degradation point — the VU count where latency climbs past thresholds or errors start appearing.
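
To make the ramp concrete, here is a small sketch that interpolates the target VU count at any point in a staged profile. The 10-minute ramp to 300 comes from the post; the hold and ramp-down durations below are placeholders, since the post does not state them, and the Stage type and vusAt function are illustrative names.

```typescript
interface Stage {
  durationSec: number; // how long the stage lasts
  target: number; // VU count to reach by the end of the stage
}

// Linear interpolation of the target VU count at a given elapsed time.
function vusAt(stages: Stage[], startVus: number, elapsedSec: number): number {
  let from = startVus;
  let remaining = elapsedSec;
  for (const stage of stages) {
    if (stage.durationSec > 0 && remaining <= stage.durationSec) {
      const progress = remaining / stage.durationSec;
      return Math.round(from + (stage.target - from) * progress);
    }
    remaining -= stage.durationSec;
    from = stage.target;
  }
  return from; // past the last stage
}

const stressStages: Stage[] = [
  { durationSec: 600, target: 300 }, // from the post: ten-minute linear ramp
  { durationSec: 120, target: 300 }, // assumed hold duration
  { durationSec: 60, target: 0 }, // assumed ramp-down duration
];

console.log(vusAt(stressStages, 0, 300)); // halfway up the ramp
console.log(vusAt(stressStages, 0, 660)); // during the hold
```

With these numbers the load reaches 150 VUs at the five-minute mark, so any latency knee on the way up can be mapped back to an approximate VU count from its timestamp.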

k6 output:

Total requests:    147,458
Hard failures:     0  (0.00%)
Check pass rate:   100%

http_req_duration  avg=106ms  p95=265ms  p99=356ms  max=1,271ms
task list p(95):   286ms  ✓
task create p(95): 314ms  ✓
search p(95):      176ms  ✓
analytics p(95):   266ms  ✓

Server at peak (300 VUs):

Host CPU:        ~40%
Container CPU:   ~551%
RAM delta:       +190 MB
Load average:    5.47
Disk I/O:        0
Network:         ↓2.0 Mbps  ↑8.0 Mbps

Every threshold passed. The p95 of 265ms is less than half the 600ms limit. 147,458 requests, zero failures.

The container CPU reading of 551% is not a typo. Docker reports CPU usage relative to a single core, so 100% means one fully busy logical core and 551% means the container was using about 5.5 cores' worth of CPU at peak. The machine has 16 logical cores, and roughly 60% of total host CPU sat idle.
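
The conversion is simple arithmetic, sketched below with illustrative function names. The inputs are the 551% reading and the 16 logical cores from the hardware table; how the remaining gap to the ~40% host figure splits across other processes is not something the post breaks down.

```typescript
// Docker-style CPU percent: 100 means one fully busy logical core.
function coresUsed(dockerCpuPercent: number): number {
  return dockerCpuPercent / 100;
}

// Share of the whole machine, as a percentage of all logical cores.
function hostSharePercent(dockerCpuPercent: number, logicalCores: number): number {
  return dockerCpuPercent / logicalCores;
}

const peakCores = coresUsed(551);
const peakHostShare = hostSharePercent(551, 16);
console.log(peakCores.toFixed(2), peakHostShare.toFixed(1));
```

So the container reading alone accounts for roughly a third of the machine, which lines up with the host sitting near 40% once everything else is included.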

Recovery to baseline took about 90 seconds after the ramp-down. No stuck connections, no lingering queries.

The degradation point was never found. 300 VUs wasn't enough.

Spike Test: 5 → 400 VUs in 30 Seconds

The spike scenario tests something different from sustained load: sudden bursts. Five users jumping to 400 in 30 seconds, holding for one minute, then ramping back down. This is the scenario you care about when a post goes viral, or a push notification lands on a million devices simultaneously.

k6 output:

Total requests:    45,355
Hard failures:     0  (0.00%)
Check pass rate:   100%

http_req_duration  avg=310ms  p95=607ms  p99=715ms  max=2,250ms
task list p(95):   606ms  ✗  (threshold: 400ms)
task create p(95): 700ms  ✓  (at the limit)
search p(95):      353ms  ✓
analytics p(95):   568ms  ✓

Server at peak (400 VUs):

Host CPU:        ~46%
Container CPU:   ~628%
RAM delta:       +21 MB
Recovery:        ~90 seconds

Two latency thresholds breached. The global p95 went from 265ms at 300 VUs to 607ms at 400 VUs — a 2.3× jump for a 33% increase in VUs. That's a non-linear scaling curve, which is what you expect when a resource (in this case, the Postgres connection pool) starts to saturate.
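
The comparison works out as follows. This is only the arithmetic behind the figures quoted above, plus the p95 that a purely proportional scaling would have produced at 400 VUs for contrast; the variable names are illustrative.

```typescript
// Global p95 figures from the stress and spike runs.
const stressRun = { vus: 300, p95Ms: 265 };
const spikeRun = { vus: 400, p95Ms: 607 };

// How much latency grew versus how much load grew.
const latencyRatio = spikeRun.p95Ms / stressRun.p95Ms;
const vuIncreasePercent = ((spikeRun.vus - stressRun.vus) / stressRun.vus) * 100;

// What p95 would have been if latency scaled in proportion to VU count.
const proportionalP95Ms = (stressRun.p95Ms * spikeRun.vus) / stressRun.vus;

console.log(latencyRatio.toFixed(1), vuIncreasePercent.toFixed(0), proportionalP95Ms.toFixed(0));
```

A proportional curve would have landed around 353 ms, comfortably under the 600 ms limit. The two runs also differ in how fast load arrived, so part of the gap may come from the burst itself rather than the extra 100 VUs.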

What didn't happen is more interesting than what did: zero hard failures. No 5xx errors. No dropped connections. No timeouts. Every single request got a response; it was just slower than the threshold allowed.

That's the distinction between graceful degradation and failure. The API slowed down. It didn't fall over.

Recovery matched the stress test: about 90 seconds from ramp-down to idle. The server recovered the same way whether the load had been building gradually for ten minutes or had appeared in 30 seconds.

Soak Test: 20 VUs for 30 Minutes

The soak test isn't about finding the ceiling — it's about confirming there's no slow rot. Memory leaks, connection pool exhaustion, query degradation as data accumulates. Problems that don't show up in a ten-minute stress test but surface after thirty minutes of sustained load.

The scenario: 20 VUs, steady state, 30 minutes. JWT_ACCESS_EXPIRES_IN=2h so the shared token stays valid for the full run.

k6 output:

Total requests:    65,909
Hard failures:     0  (0.00%)
Check pass rate:   100%

http_req_duration  avg=29ms  p95=100ms  p99=117ms  max=420ms
task list p(95):   101ms  ✓
task create p(95): 105ms  ✓
search p(95):      96ms   ✓
analytics p(95):   40ms   ✓

Server during test:

Host CPU:        7–10%
Container CPU:   27–35%
RAM delta:       +83–113 MB
Disk I/O:        0
Network:         ↓200 kbps  ↑660 kbps

At 20 VUs, 30 minutes of load barely registers on Polaris. Container CPU held at 27–35% from the first sample to the last with zero upward drift. Host CPU stayed at 7–10%. The machine was doing something, but you'd never call it pressured.

The latency drift picture: p95 stayed between 96ms and 114ms across every five-minute bucket of the 30-minute window. The first five minutes look identical to the last five. No warmup ramp, no gradual climb, no sign that accumulated database rows or heap growth are affecting response times.
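
A per-bucket p95 like the one described above can be computed with a sketch along these lines. It uses a nearest-rank percentile and synthetic, deterministic sample data (not the real soak results); the Sample shape and function names are assumptions for illustration.

```typescript
interface Sample {
  atSec: number; // seconds since the start of the run
  durationMs: number; // request duration
}

// Nearest-rank percentile on an unsorted array; NaN when there is no data.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? NaN;
}

// Group samples into fixed-width time buckets and return p95 per bucket, in order.
function p95PerBucket(samples: Sample[], bucketSec: number): number[] {
  const buckets = new Map<number, number[]>();
  for (const sample of samples) {
    const key = Math.floor(sample.atSec / bucketSec);
    const durations = buckets.get(key) ?? [];
    durations.push(sample.durationMs);
    buckets.set(key, durations);
  }
  return Array.from(buckets.keys())
    .sort((a, b) => a - b)
    .map((key) => percentile(buckets.get(key) ?? [], 95));
}

// Synthetic data: one request per second for 30 minutes, durations cycling 20..119 ms.
const samples: Sample[] = [];
for (let i = 0; i < 30 * 60; i++) {
  samples.push({ atSec: i, durationMs: 20 + (i % 100) });
}

const perBucket = p95PerBucket(samples, 5 * 60);
const drift = Math.max(...perBucket) - Math.min(...perBucket);
console.log(perBucket, drift);
```

The number to watch is the spread between the first and last buckets. A leak or a degrading query would show up as a steady climb across buckets rather than a flat line.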

What the soak test confirms:

No memory leak. RAM delta over 30 minutes at 20 VUs: +83–113 MB net. That's the data written to Postgres during the test — 2,590 tasks and 2,590 comments — reflected in the database's shared buffers. The Node process heap didn't grow. Buffer/page cache absorbed the new rows; nothing leaked.

No query degradation. The task list query — the one that showed the first signs of pressure in the spike test — stayed flat the entire run. Adding 2,590 rows to the task table had no measurable effect on query time with the composite indexes in place.

Immediate recovery. Container CPU dropped from active to under 2% within 5 seconds of the test ending. Faster than either the stress or spike tests.

What the Numbers Say About the Hardware

Test	VUs	Requests	Failures	p95	Host CPU
Stress	0 → 300	147,458	0%	265 ms	40%
Spike	5 → 400 burst	45,355	0%	607 ms	46%
Soak	20 × 30 min	65,909	0%	100 ms	10%

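At 300 VUs the server was using ~40% of available CPU. Extrapolating linearly from the stress test (265 ms p95 at 300 VUs), the degradation point, where p95 crosses 600 ms under a gradual ramp, would sit somewhere around 600–800 concurrent users. Treat that as an optimistic estimate: the spike test already crossed 600 ms at 400 VUs, and scaling there was not linear. Hard failures probably don't appear until well past 1,000, though none of these tests pushed that far. And that's with the API and Postgres sharing the same machine.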
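
For reference, a linear projection of this kind can be reproduced in a few lines. The sketch assumes p95 grows in direct proportion to VU count (a line through the origin), which is the simplest reading of "linear" and may not be exactly how the estimate was made; projectedVus is an illustrative name.

```typescript
// Assumption: p95 latency is proportional to VU count.
// Solves knownP95Ms / knownVus = limitMs / x for x.
function projectedVus(knownVus: number, knownP95Ms: number, limitMs: number): number {
  return (knownVus * limitMs) / knownP95Ms;
}

// Stress test data point: 265 ms p95 at 300 VUs, against the 600 ms threshold.
const crossover = projectedVus(300, 265, 600);
console.log(Math.round(crossover));
```

That lands at roughly 680 VUs, inside the 600–800 range. A single data point cannot capture a saturating resource, so a real ramp past 300 VUs is the only way to confirm it.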

A 13th Gen i7-1360P in a 28W NUC served ~188 HTTP requests per second during the stress test, at up to 300 concurrent users, with sub-300ms p95 latency, zero failures, and 60% of its CPU still idle.
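
Throughput figures like this are just requests divided by elapsed time. The sketch below applies that to the soak run, whose 30-minute duration is stated, and then works backwards from the ~188 figure to the run length it would imply if it is a whole-run average. That last part is an assumption, since the post does not say how the rate was measured.

```typescript
function requestsPerSecond(totalRequests: number, durationSec: number): number {
  return totalRequests / durationSec;
}

// Inverse: how long a run must have been to average a given rate.
function impliedDurationSec(totalRequests: number, rps: number): number {
  return totalRequests / rps;
}

// Soak test: 65,909 requests over 30 minutes.
const soakRps = requestsPerSecond(65909, 30 * 60);

// Stress test: 147,458 requests at a quoted ~188 requests per second.
const stressMinutes = impliedDurationSec(147458, 188) / 60;

console.log(soakRps.toFixed(1), stressMinutes.toFixed(1));
```

About 13 minutes is consistent with a ten-minute ramp plus a short hold and ramp-down, which would make ~188 the average over the whole run rather than the rate at the 300 VU peak.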

What the Numbers Say About Node.js on a Single Machine

The spike test revealed something specific: the bottleneck when it came wasn't CPU or RAM — it was the task list query. GET /boards/:id/tasks does the most join work (boards, assignees, labels, subtask counts) and was the first endpoint to breach its threshold at 400 VUs.

The fix is a pair of composite indexes:

CREATE INDEX "Task_boardId_createdAt_idx" ON "Task"("boardId", "createdAt" DESC);
CREATE INDEX "Task_boardId_assigneeId_idx" ON "Task"("boardId", "assigneeId");

Two index entries in schema.prisma, one schema push, no code changes. This is the pattern: the hardware has headroom, the bottleneck is almost always a specific query that needs an index, and it's a software fix.

The hardware is not the constraint. The NUC can take it.

What's Next

The backend is proven. Three test suites, 258,722 requests in total, zero hard failures across all of them. The NUC has headroom.

The next post is the first real Android module — starting with the one that's most likely to go wrong if the architecture is off: offline-first sync.