Building a production-grade multi-tenant AI chatbot with FastAPI, Ollama, ChromaDB, Celery, and Redis.
client_id as the fundamental unit, config-driven behavior, the DB schema for clients and agent_modes, and why the system prompt must be a runtime artifact not a…
Loading agent modes per client, composing tone + personality + RAG + capability fragments, defending against prompt injection in admin-supplied fragments, and m…
One ChromaDB collection per tenant for strict isolation, the document ingestion pipeline (PDF/DOCX to chunks to embeddings), query-time retrieval, and increment…
Running Llama 3.1 locally with Ollama, OpenAI-compatible SDK integration, prompt engineering for sales contexts, and latency management with streaming responses…
WhatsApp's 20-second webhook timeout forces async architecture: acknowledge immediately, process in Celery, retry on failure, and route dead tasks for inspectio…
The channel adapter pattern isolates WhatsApp, widget, and mobile channel handling from the shared intelligence core. Same LLM, same RAG, different inbound pars…
Agent modes are database-configured feature flags for AI capabilities. Activating lead capture or appointment setting from an admin dashboard updates the system…
Linking WhatsApp conversations to CRM contacts, LLM-powered lead field extraction, pushing behavioral scores as CRM custom fields, and scheduling follow-ups fro…