Nginx sits in front of all AI microservices as a reverse proxy. It handles TLS termination, routes requests to the correct backend service, authenticates at the proxy layer, and rate-limits by service. This article covers the complete Nginx configuration for a self-hosted AI stack.
Routing to AI Services
upstream kokoro_tts { server 127.0.0.1:9010; }
upstream whisper_stt { server 127.0.0.1:9011; }
upstream audio_conv { server 127.0.0.1:9012; }
upstream ollama_llm { server 127.0.0.1:11434; }
upstream chromadb { server 127.0.0.1:8000; }
server {
    listen 443 ssl;
    server_name ai.govindpreetsingh.com;
    ssl_certificate /etc/letsencrypt/live/ai.govindpreetsingh.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.govindpreetsingh.com/privkey.pem;

    # Internal API key auth at proxy layer.
    # The trailing slash on proxy_pass strips the location prefix,
    # so /tts/foo reaches the backend as /foo.
    location /tts/ {
        auth_request /auth;
        proxy_pass http://kokoro_tts/;
        limit_req zone=tts_zone burst=5 nodelay;
    }

    location /stt/ {
        auth_request /auth;
        proxy_pass http://whisper_stt/;
        limit_req zone=stt_zone burst=3 nodelay;
    }

    location /llm/ {
        auth_request /auth;
        proxy_pass http://ollama_llm/;
        limit_req zone=llm_zone burst=2 nodelay;
        # Set explicitly. 60s is also the Nginx default, so raise it
        # if long LLM generations time out.
        proxy_read_timeout 60s;
    }

    # Internal auth sub-request handler
    location = /auth {
        internal;
        proxy_pass http://127.0.0.1:8080/internal/auth;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-API-Key $http_x_api_key;
    }
}
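How auth_request decides the outcome is worth spelling out. The sketch below is a variant of the /tts/ route that returns a JSON body when the key is rejected. The named location @auth_denied and the error message are hypothetical and not part of the original configuration.

```nginx
# Sketch only: these blocks go inside the existing server { } block.
# auth_request maps the status of the /auth subrequest as follows:
#   2xx        -> the request is allowed and proxied to the backend
#   401 / 403  -> the request is denied with that status
#   any other  -> treated as an error (the client sees a 500)

location /tts/ {
    auth_request /auth;
    error_page 401 = @auth_denied;      # hypothetical named location
    proxy_pass http://kokoro_tts/;
    limit_req zone=tts_zone burst=5 nodelay;
}

location @auth_denied {
    default_type application/json;
    # Hypothetical error body; adjust to whatever your clients expect.
    return 401 '{"error": "missing or invalid API key"}';
}
```

Clients send the key in the X-API-Key request header, which the /auth location forwards to the auth backend as $http_x_api_key.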
Rate Limiting Per Service
http {
    # Rate limit zones, keyed per client IP.
    # limit_req_zone is only valid in the http context.
    limit_req_zone $binary_remote_addr zone=tts_zone:10m rate=10r/m;
    limit_req_zone $binary_remote_addr zone=stt_zone:10m rate=6r/m;
    # LLM gets the tightest limit: it is the most resource-intensive service.
    limit_req_zone $binary_remote_addr zone=llm_zone:10m rate=4r/m;
}
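To make the numbers concrete: rate=4r/m means Nginx admits one request every 15 seconds per IP. With burst=2 nodelay on the /llm/ route, a client can send three requests back to back (one at the base rate plus two burst slots), and a fourth is rejected until a slot frees up. A minimal sketch of two optional directives that make rejections easier to handle, assuming clients would rather see a 429 than the default 503:

```nginx
http {
    limit_req_zone $binary_remote_addr zone=llm_zone:10m rate=4r/m;

    # Rejected requests return 503 by default; 429 Too Many Requests
    # tells API clients more clearly that they should back off.
    limit_req_status 429;

    # Rejections are logged at "error" by default; "warn" keeps the
    # error log quieter. Both values here are assumptions, not part
    # of the original configuration.
    limit_req_log_level warn;
}
```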
SSL Termination for Local Services
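The AI microservices (Kokoro, Whisper, etc.) serve plain HTTP with no TLS. Nginx terminates TLS at the proxy and forwards plain HTTP to them over the loopback interface. This is a standard setup: traffic between Nginx and the backends never leaves the machine, so it is not exposed to the network, and for practical purposes it is as secure as end-to-end TLS. That holds only while the backends listen on 127.0.0.1 rather than on a public interface.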
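A common companion to TLS termination is sketched below: redirect plain HTTP to HTTPS, and tell the backends what the original request looked like. Whether Kokoro, Whisper, or Ollama actually read these forwarded headers is an assumption; they are harmless if ignored.

```nginx
# Sketch: redirect any plain-HTTP request to the TLS server block.
server {
    listen 80;
    server_name ai.govindpreetsingh.com;
    return 301 https://$host$request_uri;
}

# Sketch: inside the existing "listen 443 ssl" server block, pass the
# original host, client address, and scheme to the plain-HTTP backends.
# Directives set at server level are inherited by locations that do not
# define their own proxy_set_header lines.
# (The /auth location defines its own, so it is unaffected.)
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
```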
What to Watch For
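- proxy_read_timeout for LLM: Ollama can take 30-60 seconds to produce a long response, and Nginx's default proxy_read_timeout is 60 seconds. The timeout applies between two successive reads from the upstream, not to the whole response, so a non-streaming request that stays silent for longer than that fails with a 504. Set the value explicitly on the LLM route, and raise it if long generations time out.
- Client body size for audio upload: WhatsApp voice notes can be 4MB+, and Nginx's default client_max_body_size is 1 MB, so larger uploads are rejected with a 413. Set client_max_body_size 10M for the STT and converter routes.
- Rate limit storage: The 10m in limit_req_zone is the size of the shared memory zone (10 megabytes), not the rate. 10 MB holds roughly 160,000 unique IPs at 64 bytes of state per IP; on 64-bit platforms each state takes 128 bytes, so expect closer to 80,000.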
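The body-size and timeout notes above can be sketched as follows. The /convert/ path and the 120s value are assumptions for illustration; the original configuration defines the audio_conv upstream but does not show a route for it.

```nginx
# Sketch only: these blocks go inside the existing server { } block.

location /stt/ {
    auth_request /auth;
    client_max_body_size 10M;        # default is 1m; larger uploads get a 413
    proxy_pass http://whisper_stt/;
    limit_req zone=stt_zone burst=3 nodelay;
}

location /convert/ {                 # hypothetical path for the converter
    auth_request /auth;
    client_max_body_size 10M;
    proxy_pass http://audio_conv/;
}

location /llm/ {
    auth_request /auth;
    proxy_pass http://ollama_llm/;
    limit_req zone=llm_zone burst=2 nodelay;
    # Timeout between two successive reads from the upstream.
    # 120s is an assumed value; tune it to your longest generation.
    proxy_read_timeout 120s;
}
```

Setting client_max_body_size per location keeps the 1 MB default on routes that only accept small JSON payloads, such as /tts/ and /llm/.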