Hardening the Local AI Stack

// the problem

Convenient defaults are open doors

Every guide in this series so far optimised for getting something working. docker run, expose a port, move on. That is exactly the right instinct when you're learning — and exactly the wrong state to leave a machine in once it does real work.

The stack as built has four soft spots. Open WebUI and SearXNG listen on plain HTTP, so anything on the network can read the traffic. The containers run as root by default, so a single container escape is a root shell on the host. Nothing watches the logs, so a brute-force attempt against the WebUI login looks identical to normal use. And the AI containers can reach the entire LAN and the open internet, so a compromised image can quietly exfiltrate whatever it indexed.

None of this requires a rewrite. It's five focused changes, each one a self-contained step, each one leaving the stack measurably harder to attack than the step before.

BEFORE → open ports, root containers, no TLS, no monitoring, flat network
   ↓
┌──────────────────────────────────────────────────────┐
│ 01 Caddy reverse proxy    → TLS + security headers   │
│ 02 CrowdSec               → log-driven IP bans       │
│ 03 container hardening    → non-root, caps dropped   │
│ 04 Trivy image scans      → known CVEs surfaced      │
│ 05 nftables + Docker nets → egress + LAN locked down │
└──────────────────────────────────────────────────────┘
   ↓
AFTER → one TLS endpoint, least-privilege containers, monitored, segmented

// step 01 — reverse proxy

One front door, with TLS and headers

Right now each service has its own exposed port. The fix is to stop exposing them directly and put a single reverse proxy in front. Caddy is the easy choice here: it does automatic HTTPS, the config is three lines per service, and security headers are trivial to set. Internally the WebUI and SearXNG ports stay on the Docker network and are no longer published to the host at all.

01 / a — caddyfile

~/ai-stack/Caddyfile

# local TLS via an internal CA — trust it once with `caddy trust`
ai.local {
    tls internal

    # security headers applied to every response
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
        X-Content-Type-Options    "nosniff"
        X-Frame-Options           "DENY"
        Referrer-Policy           "no-referrer"
        Content-Security-Policy   "default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline'"
        -Server
    }

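    # write access logs to a file so CrowdSec (step 02) has something to read
    log {
        output file /var/log/caddy/access.log
    }

    # rate-limit the login surface. Needs the third-party rate_limit plugin
    # (a custom Caddy build); as an unordered directive it also needs an
    # `order rate_limit ...` global option or a `route` wrapper.
    @login path /api/v1/auths/signin
    rate_limit @login {
        zone login {
            key    {remote_host}
            events 5
            window 1m
        }
    }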

    reverse_proxy open-webui:8080
}

search.ai.local {
    tls internal
    reverse_proxy searxng:8080
}
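A quick way to confirm the header block took effect is to fetch the response headers and list any that are missing. The sketch below assumes the ai.local hostname from the Caddyfile above. The function name missing_headers is illustrative, and curl's -k flag skips certificate verification, which is only reasonable until the internal CA has been trusted.

```shell
#!/bin/sh
# Reads raw HTTP response headers on stdin and prints every expected
# security header that is absent. Prints nothing when all are present.
missing_headers() {
    response=$(cat)
    for name in strict-transport-security x-content-type-options \
                x-frame-options referrer-policy content-security-policy; do
        # Header names are case-insensitive, so match with -i.
        printf '%s\n' "$response" | grep -qi "^${name}:" || printf '%s\n' "$name"
    done
}

# Live usage (assumes Caddy is up and ai.local resolves to this host):
#   curl -skI https://ai.local/ | missing_headers
```

Empty output means every header from the Caddyfile is being served.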

01 / b — compose

Add Caddy to the compose file as the only service that publishes ports. Note what's missing from the WebUI and SearXNG definitions now: no ports: block. They're reachable only over the internal Docker network, through Caddy.

~/ai-stack/docker-compose.yml

caddy:
  image: caddy:2-alpine
  container_name: caddy
  restart: unless-stopped
  ports:
    - "443:443"
    - "80:80"
  volumes:
    - ./Caddyfile:/etc/caddy/Caddyfile:ro
    - ./caddy/logs:/var/log/caddy      # access logs, read by CrowdSec in step 02
    - caddy_data:/data
    - caddy_config:/config
  networks: [frontend, backend]

open-webui:
  # ... existing config, but DELETE the `ports:` block ...
  networks: [backend]

searxng:
  # ... existing config, DELETE `ports:` here too ...
  networks: [backend]

// why this is the first move

Collapsing every service down to one TLS endpoint shrinks the attack surface before anything else is done. There is now exactly one process listening to the outside world, it speaks HTTPS, it advertises nothing about itself, and it's the natural place to bolt on rate limiting and, next, intrusion detection.
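To check that Caddy really is the only container publishing a port, the running containers can be filtered on their port mappings. This is a minimal sketch: the function name unexpected_publishers is made up for illustration, and it assumes the proxy container is named caddy as in the compose file above.

```shell
#!/bin/sh
# Reads "<name>|<ports>" lines and prints every container other than
# caddy that publishes a host port. Published ports contain "->"
# (e.g. "0.0.0.0:8080->8080/tcp"); exposed-only ports do not.
unexpected_publishers() {
    awk -F '|' '$1 != "caddy" && $2 ~ /->/ { print $1 }'
}

# Live usage (requires Docker):
#   docker ps --format '{{.Names}}|{{.Ports}}' | unexpected_publishers
```

Any name it prints still has a ports: block that should be deleted.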

// step 02 — detection

CrowdSec — logs that fight back

A reverse proxy logs every request, which means it also logs every probe: the credential-stuffing run against the login, the scanner walking common paths, the bot hammering the search endpoint. CrowdSec reads those logs, matches them against community-maintained scenarios, and bans the offending IPs at the firewall — think fail2ban, but with shared threat intelligence so you benefit from attacks seen on other machines.

02 / a — deploy

~/ai-stack/docker-compose.yml

crowdsec:
  image: crowdsecurity/crowdsec:latest
  container_name: crowdsec
  restart: unless-stopped
  environment:
    COLLECTIONS: "crowdsecurity/caddy crowdsecurity/http-cve"
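  volumes:
    # Caddy's access logs; CrowdSec also needs an acquisition entry
    # (acquis.yaml) that points at this path
    - ./caddy/logs:/var/log/caddy:ro
    - crowdsec_db:/var/lib/crowdsec/data
    - crowdsec_config:/etc/crowdsec
  # once backend is marked internal (step 05), CrowdSec needs an extra
  # network with egress to pull the community blocklists
  networks: [backend]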

02 / b — verify it's parsing

bash

$ docker exec crowdsec cscli metrics
$ docker exec crowdsec cscli decisions list   # active bans
$ docker exec crowdsec cscli alerts list       # what's been seen

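Detection on its own blocks nothing; a bouncer has to enforce the decisions. With a Caddy bouncer in place (a separate Caddy module, registered against CrowdSec with cscli bouncers add), a banned IP gets a 403 before the request ever reaches the WebUI. The first time you see a real scanner picked up and dropped, the value lands immediately.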

// reality check

On a home network behind NAT this matters most if you ever port-forward the stack or reach it over a VPN/Tailscale tunnel. If it never leaves the LAN, CrowdSec is more of a learning exercise than a necessity — worth doing precisely because the day you do expose it, the muscle memory is already there.

// step 03 — container hardening

Least privilege, by default

This is the highest-leverage step and the one most guides skip. By default a Docker container runs as root, keeps a fat set of Linux capabilities, and has a writable root filesystem. A container escape under those conditions is a root compromise of the host. Closing that gap is just a few lines per service.

user: non-root
    Run the process as an unprivileged UID. An escape lands you as a nobody user, not root.
cap_drop: ALL
    Strip every Linux capability, then add back only what the service genuinely needs, which is usually nothing.
read_only: true
    Mount the root filesystem read-only. Give writable tmpfs only where the app actually writes.
no-new-privileges
    Block any setuid binary from escalating privilege inside the container.
seccomp + pids-limit
    Keep Docker's default seccomp profile on, and cap process count to blunt fork-bomb style abuse.

03 / applied to a service

~/ai-stack/docker-compose.yml

searxng:
  image: searxng/searxng:latest
  container_name: searxng
  restart: unless-stopped
  user: "977:977"
  read_only: true
  security_opt:
    - no-new-privileges:true
    # Docker's default seccomp profile applies automatically;
    # the only thing to avoid is seccomp=unconfined
  cap_drop: [ALL]
  cap_add: [CHOWN, SETGID, SETUID]   # searxng's documented minimum
  pids_limit: 256
  tmpfs:
    - /tmp
  volumes:
    - ./searxng:/etc/searxng:ro
  networks: [backend]

Apply the same treatment to Open WebUI and the indexer. Ollama is the one to test carefully — GPU passthrough needs specific device access, so harden it last and confirm nvidia-smi still works inside the container after each change.

// the test that proves it worked

Exec into a hardened container and try to write outside the allowed paths: docker exec searxng touch /etc/test should fail with a read-only error, and docker exec searxng id should report a non-zero UID. If both behave, the container is no longer a soft path to root.
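The same check can be made from the outside by reading each container's effective settings with docker inspect. The sketch below assumes the container names used in this guide; the function name check_hardening and the problem labels it prints are illustrative.

```shell
#!/bin/sh
# Reads "<name>|<ReadonlyRootfs>|<User>|<CapDrop>" lines and prints one
# verdict per container: "ok", or the list of hardening gaps it found.
check_hardening() {
    while IFS='|' read -r name ro user capdrop; do
        problems=""
        [ "$ro" = "true" ] || problems="$problems writable-rootfs"
        case "$user" in
            ""|0|0:*|root|root:*) problems="$problems runs-as-root" ;;
        esac
        case "$capdrop" in
            *ALL*) ;;
            *) problems="$problems caps-not-dropped" ;;
        esac
        printf '%s:%s\n' "$name" "${problems:- ok}"
    done
}

# Live usage (requires Docker):
#   docker inspect --format \
#     '{{.Name}}|{{.HostConfig.ReadonlyRootfs}}|{{.Config.User}}|{{.HostConfig.CapDrop}}' \
#     searxng open-webui | check_hardening
```

An empty User field means the image default applies, which the sketch treats as root. An image that sets its own non-root USER would be flagged by mistake, so confirm with docker exec <name> id before acting on that result.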

// step 04 — image scanning

Know what's in your images

Every image you pull is a stack of someone else's software, and some of it has known vulnerabilities. Trivy scans an image against CVE databases and tells you exactly what's inside — OS packages, language libraries, the lot — ranked by severity. Run it before you trust an image, and again on a schedule, because new CVEs are disclosed against images you pulled months ago.

04 / a — scan an image

bash

$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    aquasec/trivy:latest image \
    --severity HIGH,CRITICAL \
    ghcr.io/open-webui/open-webui:main   # use the exact image reference from your compose file

04 / b — what good looks like

The win isn't zero findings — it's fewer, and known. Pinning to a slim or distroless base image and rebuilding regularly is what moves the numbers. A representative before/after on the indexer image:

Image	Critical	High	Base
indexer (python:3.12)	3	29	full Debian
indexer (python:3.12-slim)	0	6	slim

Those are illustrative numbers; the point is the shape of the result. Swapping one base image line removed every critical finding and cut highs from 29 to 6, roughly 80%, which makes for an easy before/after in a write-up.

// make it routine

Wire a Trivy scan into a weekly cron or systemd timer that writes a dated report into the same documents folder the RAG indexer already watches. The security posture of the stack then becomes searchable by the stack itself — a neat closing of the loop with the existing Watchdog pipeline.
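One possible shape for that scheduled job is sketched below. The report directory, the script name, and the cron line are assumptions, not paths from the earlier guides; point REPORT_DIR at whatever folder your indexer actually watches.

```shell
#!/bin/sh
# Hypothetical location; override with REPORT_DIR=... to match your setup.
REPORT_DIR="${REPORT_DIR:-$HOME/ai-stack/documents/security}"

# Build a filesystem-safe, dated report path for an image reference.
report_path() {
    image=$1
    day=$2
    safe=$(printf '%s' "$image" | tr '/:' '__')
    printf '%s/trivy-%s-%s.txt\n' "$REPORT_DIR" "$safe" "$day"
}

# Scan one image with the same Trivy invocation as above and save the report.
scan_image() {
    image=$1
    out=$(report_path "$image" "$(date +%F)")
    mkdir -p "$REPORT_DIR"
    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
        aquasec/trivy:latest image --severity HIGH,CRITICAL "$image" > "$out"
}

# Scan every image named on the command line; does nothing without arguments.
for image in "$@"; do
    scan_image "$image"
done
```

Saved as, say, trivy-weekly.sh, a crontab entry such as `0 3 * * 1 /path/to/trivy-weekly.sh searxng/searxng:latest` would run it every Monday at 03:00.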

// step 05 — segmentation

Stop the containers reaching what they shouldn't

By default Docker containers can talk to each other freely and reach the open internet and the rest of your LAN. For an AI stack that ingests documents, that egress path is the real risk: a compromised or malicious image can quietly ship whatever it indexed somewhere else. Two layers fix it — split Docker networks so services only reach their legitimate neighbours, and a host firewall that controls what leaves the box.

05 / a — split the networks

~/ai-stack/docker-compose.yml

networks:
  frontend:            # only Caddy sits here, faces the host
    driver: bridge
  backend:             # app services talk to each other here only
    driver: bridge
    internal: true   # no route to the outside world

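Marking the backend network internal: true removes its route to the outside world, so the indexer cannot reach the internet at all; anything that needs egress has to be attached to a second, non-internal network explicitly. SearXNG, which legitimately needs outbound web access, gets that controlled path; the document indexer, which never should, gets none. Note that the searxng service as defined in step 03 sits only on backend, so it needs that extra network attached before web search will work.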
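Whether the split actually took effect can be read back from Docker. In this sketch the function name networks_with_egress is illustrative, and the network names in the usage line assume the Compose project is called ai-stack; check docker network ls for the real ones.

```shell
#!/bin/sh
# Reads "<network>|<internal>" lines and prints every network that is
# NOT internal, i.e. one that still has a route to the outside.
networks_with_egress() {
    awk -F '|' '$2 != "true" { print $1 }'
}

# Live usage (requires Docker; network names are assumptions):
#   docker network inspect --format '{{.Name}}|{{.Internal}}' \
#     ai-stack_frontend ai-stack_backend | networks_with_egress
```

The backend network should never appear in the output.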

05 / b — host firewall

bash — ufw

$ sudo ufw default deny incoming
$ sudo ufw default allow outgoing
$ sudo ufw allow 443/tcp          # the Caddy front door, nothing else
$ sudo ufw allow from 192.168.1.0/24 to any port 22  # SSH, LAN only
$ sudo ufw enable
$ sudo ufw status verbose

For finer control over Docker's own egress — Docker writes its own iptables rules and can bypass ufw — an nftables ruleset on the DOCKER-USER chain lets you allow the host LAN for management while denying the containers a route to it. That's the rule that turns "the AI box can see every device in the house" into "the AI box can see exactly the internet endpoints it needs and nothing on the LAN."
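A minimal sketch of that LAN rule follows. It uses the iptables command rather than native nft syntax, because Docker creates DOCKER-USER as an iptables chain; on hosts where iptables is backed by nftables the rules end up in the nft ruleset either way. The 172.16.0.0/12 range is an assumption about Docker's address pool, and the function name lan_block_rules is illustrative. Rules added this way do not survive a reboot unless you persist them.

```shell
#!/bin/sh
# Prints the two DOCKER-USER rules; nothing is applied until the output
# is piped into a root shell.
# $1 = container address range, $2 = LAN range to protect.
lan_block_rules() {
    docker_cidr=$1
    lan_cidr=$2
    # Position 1: let replies to connections the LAN opened (e.g. to Caddy) through.
    printf 'iptables -I DOCKER-USER 1 -m conntrack --ctstate ESTABLISHED,RELATED -j RETURN\n'
    # Position 2: drop anything else a container sends towards the LAN.
    printf 'iptables -I DOCKER-USER 2 -s %s -d %s -j DROP\n' "$docker_cidr" "$lan_cidr"
}

# Review first, then apply:
#   lan_block_rules 172.16.0.0/12 192.168.1.0/24
#   lan_block_rules 172.16.0.0/12 192.168.1.0/24 | sudo sh
```

Check the real container subnets with docker network inspect before applying, and test from a LAN client that the WebUI still loads afterwards.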

// what this demonstrates

The same machine, now defensible

None of this changed what the stack does. It still runs local models, still searches the web and your own documents, still costs nothing per query. What changed is the blast radius. Traffic is encrypted. The login is rate-limited and watched. A container escape lands an attacker as a powerless user in a read-only box with no path to the LAN. The images are inventoried against known CVEs on a schedule. And the one service that ingests untrusted documents has no way to phone home.

That last point is the bridge to the next piece. Hardening the infrastructure closes the obvious doors — but an AI stack has a stranger attack surface than a normal web app, because the documents it reads can carry instructions. Part 2 red-teams the RAG pipeline itself: what happens when the threat isn't a port or a CVE, but a sentence buried in a file the indexer just swallowed.
