Part 1 article
Designing a Multi-Tenant AI Platform from Scratch

client_id as the fundamental unit, config-driven behavior, the DB schema for clients and agent_modes, and why the system prompt must be a runtime artifact not a…

Part 2 article
Dynamic System Prompt Construction

Loading agent modes per client, composing tone + personality + RAG + capability fragments, defending against prompt injection in admin-supplied fragments, and m…

Part 3 article
RAG Per Client with ChromaDB

One ChromaDB collection per tenant for strict isolation, the document ingestion pipeline (PDF/DOCX to chunks to embeddings), query-time retrieval, and increment…

Part 4 article
Ollama Local LLM Integration

Running Llama 3.1 locally with Ollama, OpenAI-compatible SDK integration, prompt engineering for sales contexts, and latency management with streaming responses…

Part 5 article
Celery + Redis Task Queue for AI

WhatsApp's 20-second webhook timeout forces async architecture: acknowledge immediately, process in Celery, retry on failure, and route dead tasks for inspectio…

Part 6 article
Multi-Channel Adapters — WhatsApp, Widget, Mobile

The channel adapter pattern isolates WhatsApp, widget, and mobile channel handling from the shared intelligence core. Same LLM, same RAG, different inbound pars…

Part 7 article
Agent Mode System — Activating Capabilities Per Client

Agent modes are database-configured feature flags for AI capabilities. Activating lead capture or appointment setting from an admin dashboard updates the system…

Part 8 article
CRM Integration — Conversations to Pipeline

Linking WhatsApp conversations to CRM contacts, LLM-powered lead field extraction, pushing behavioral scores as CRM custom fields, and scheduling follow-ups fro…
