Architecture of a Self-Hosted AI Stack

Series 7 — Self-Hosted AI Stack on Raspberry Pi • Part 1 of 6

A Raspberry Pi 5 running Kokoro TTS, Whisper STT, an Ollama LLM, ChromaDB, and a custom audio converter handles the entire WhatsApp AI agent stack without a single cloud API call. This article covers the service architecture, port planning, resource management, and startup order.

Services and Ports

┌─────────────────────────────────────────────────────────────────────────────┐
│                    RASPBERRY PI 5 (16GB RAM)                                │
│                                                                             │
│  Service               Port    Language    Process manager                  │
│  ─────────────────────────────────────────────────────────                 │
│  Kokoro TTS            9010    Python      nohup + disown                   │
│  Whisper STT           9011    Python      nohup + disown                   │
│  Audio Converter       9012    Python      nohup + disown                   │
│  Ollama LLM           11434    Go          systemd (built-in)               │
│  ChromaDB              8000    Python      nohup + disown                   │
│  Apache (PHP app)       443    PHP/Apache  systemd                          │
│  Nginx (reverse proxy)  80     Nginx       systemd                          │
└─────────────────────────────────────────────────────────────────────────────┘
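The port plan above can be kept in one place and sanity-checked before anything is started. The following is a minimal sketch of that idea. The port numbers come from the table; the dictionary keys and both helper functions are names made up here for illustration.

```python
from collections import Counter

# Port assignments copied from the table above.
# The keys are shorthand labels chosen for this sketch (assumption),
# not names used by the services themselves.
SERVICE_PORTS = {
    "kokoro-tts": 9010,
    "whisper-stt": 9011,
    "audio-converter": 9012,
    "ollama": 11434,
    "chromadb": 8000,
    "apache": 443,
    "nginx": 80,
}


def duplicate_ports(ports):
    """Return a sorted list of port numbers assigned to more than one service."""
    counts = Counter(ports.values())
    return sorted(port for port, n in counts.items() if n > 1)


def privileged_services(ports):
    """Return the services on ports below 1024.

    On a default Linux configuration these ports need root or the
    CAP_NET_BIND_SERVICE capability to bind.
    """
    return sorted(name for name, port in ports.items() if port < 1024)


if __name__ == "__main__":
    print("duplicate ports:", duplicate_ports(SERVICE_PORTS))
    print("privileged services:", privileged_services(SERVICE_PORTS))
```

With the table as given, no port is assigned twice, and only Apache and Nginx sit on privileged ports, which fits their being managed by systemd rather than started from a user shell.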

Resource Constraints on Pi 5 (16GB)

Service	RAM (steady)	RAM (peak)	CPU (idle)
Kokoro TTS (af_heart model)	~800MB	~1.2GB	~2%
Whisper STT (base model)	~300MB	~600MB	~1%
Ollama (llama3.1:8b)	~5.5GB	~6.5GB	~3%
ChromaDB	~200MB	~500MB	~1%
Audio Converter	~50MB	~200MB	~0%
Apache + PHP	~400MB	~800MB	~5%

Total steady-state: ~7.25GB. Peak (all services under load simultaneously): ~9.8GB. That leaves roughly 6GB of headroom at peak on a 16GB Pi, before counting the operating system itself.
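The totals can be reproduced directly from the table. The sketch below simply sums the figures. It assumes 1GB = 1000MB and a nominal 16GB of RAM, and the function and variable names are illustrative only.

```python
# (steady MB, peak MB) per service, taken from the table above.
# Peak values given in GB are converted assuming 1GB = 1000MB.
MEMORY_MB = {
    "Kokoro TTS": (800, 1200),
    "Whisper STT": (300, 600),
    "Ollama (llama3.1:8b)": (5500, 6500),
    "ChromaDB": (200, 500),
    "Audio Converter": (50, 200),
    "Apache + PHP": (400, 800),
}

# Nominal capacity (assumption); the OS reports somewhat less as usable.
TOTAL_RAM_MB = 16_000


def budget(table, total_mb):
    """Sum steady and peak usage and report headroom at peak, all in GB."""
    steady = sum(s for s, _ in table.values())
    peak = sum(p for _, p in table.values())
    return {
        "steady_gb": steady / 1000,
        "peak_gb": peak / 1000,
        "headroom_at_peak_gb": (total_mb - peak) / 1000,
    }


if __name__ == "__main__":
    for key, value in budget(MEMORY_MB, TOTAL_RAM_MB).items():
        print(f"{key}: {value:.2f}")
```

Re-running the sum after swapping a model (for example a larger Whisper model) shows quickly whether the peak still fits.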

Service Startup Order

Services with inter-dependencies must start in order:

ChromaDB — no dependencies
Ollama — no dependencies (managed by systemd)
Kokoro TTS — no dependencies
Whisper STT — no dependencies
Audio Converter — depends on FFmpeg being installed
Apache/PHP — depends on all above (waits for health checks)

Create a startup script that checks each service's health endpoint before starting the next. A service that starts before its dependencies produces cryptic errors.
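One possible shape for such a startup script is sketched below in Python. It is an illustration under assumptions, not the stack's actual launcher: the `start` callback stands in for whatever command launches each service, and the health URLs passed in are placeholders, since each service's real health path has to be looked up in its own documentation.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until_healthy(url, probe=http_ok, timeout=60.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll url with probe until it succeeds or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start_in_order(services, start, probe=http_ok, **wait_kwargs):
    """Start each service, then wait for its health URL before the next one.

    services: list of (name, health_url) pairs in startup order.
    start:    callable that launches a service given its name (hypothetical;
              in practice it would run the nohup or systemctl command).
    """
    for name, url in services:
        start(name)
        if not wait_until_healthy(url, probe=probe, **wait_kwargs):
            raise RuntimeError(f"{name} did not become healthy: {url}")


# Example order. Ports come from the table above; the "/health" paths
# are placeholders (assumption), not documented endpoints.
EXAMPLE_SERVICES = [
    ("chromadb", "http://127.0.0.1:8000/health"),
    ("ollama", "http://127.0.0.1:11434/health"),
    ("kokoro-tts", "http://127.0.0.1:9010/health"),
    ("whisper-stt", "http://127.0.0.1:9011/health"),
    ("audio-converter", "http://127.0.0.1:9012/health"),
]
```

Raising on the first service that fails its check stops the sequence early, so Apache/PHP never starts against a half-running backend. The `probe`, `sleep`, and `clock` parameters exist only so the waiting logic can be exercised without real services.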

What to Watch For

SD card failure: A Raspberry Pi running AI services writes to disk frequently (model caches, temp files), which wears out SD cards. Use an SSD over USB 3.0 for the OS and services instead of the SD card.
Temperature throttling: The Pi 5 starts throttling at 80°C. Under sustained LLM load, CPU temperature can reach 70°C without active cooling, which leaves little margin before throttling begins. Add a heatsink and fan.
Port conflict detection: Before starting any service, check whether its port is already bound: ss -tlnp | grep :9010. A stale process from an earlier run that still holds the port will make the new instance fail at startup, and with nohup the error never reaches the terminal.
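The port check can also be done from a script rather than by reading ss output. The sketch below is one simple approach, assuming the services listen on the loopback interface or on all interfaces: it tries a TCP connection and treats success as "port in use". The function name is illustrative, and a service bound only to some other address would not be detected this way.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports of the nohup-managed services from the table above.
    for port in (9010, 9011, 9012, 8000):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

Running this before launching a service gives an explicit "in use" message instead of a failure buried in a log file. Finding which process holds the port still needs ss -tlnp, as described above.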
