Building a multi-tenant AI chatbot platform is a different problem from building a single chatbot. The hard parts are isolation, configuration, and predictable per-client behavior. This article walks through the design decisions that make the chatbot platform work.
The Fundamental Unit: client_id
Everything in a multi-tenant AI platform scopes to a client. A client_id appears on every database row, every vector collection, every system prompt, and every conversation record. Without it, data leaks between tenants — silently and catastrophically.
CREATE TABLE clients (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    slug VARCHAR(80) NOT NULL UNIQUE,
    name VARCHAR(255) NOT NULL,
    timezone VARCHAR(50) NOT NULL DEFAULT 'UTC',
    locale VARCHAR(10) NOT NULL DEFAULT 'en',
    plan ENUM('starter','growth','enterprise') NOT NULL DEFAULT 'starter',
    status ENUM('active','suspended','cancelled') NOT NULL DEFAULT 'active',
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Every downstream table carries client_id as a FK + index
CREATE TABLE conversations (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    client_id INT UNSIGNED NOT NULL,
    channel ENUM('whatsapp','widget','mobile','api') NOT NULL,
    external_id VARCHAR(255) NOT NULL, -- wa_contact_id, widget_session_id, etc.
    status ENUM('open','closed') NOT NULL DEFAULT 'open',
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_client_channel (client_id, channel),
    FOREIGN KEY (client_id) REFERENCES clients(id)
);
Config-Driven vs Code-Driven Behavior
The trap in multi-tenant AI is writing per-client logic in code: if ($client === 'acme') { … }. That approach does not scale past three clients. Instead, all behavioral variation lives in the database.
CREATE TABLE client_configs (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    client_id INT UNSIGNED NOT NULL UNIQUE,
    persona_name VARCHAR(100) NOT NULL DEFAULT 'Assistant',
    persona_tone ENUM('professional','friendly','formal','casual') NOT NULL DEFAULT 'professional',
    persona_language VARCHAR(10) NOT NULL DEFAULT 'en',
    llm_model VARCHAR(80) NOT NULL DEFAULT 'llama3.1:8b',
    llm_temperature DECIMAL(3,2) NOT NULL DEFAULT 0.70,
    max_context_turns TINYINT UNSIGNED NOT NULL DEFAULT 10,
    rag_top_k TINYINT UNSIGNED NOT NULL DEFAULT 5,
    human_handoff_phrase VARCHAR(200) DEFAULT NULL,
    FOREIGN KEY (client_id) REFERENCES clients(id)
);
When a new conversation arrives, the platform reads one row from client_configs and everything — persona, model, temperature, context window size — is determined without touching code.
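The one-row read can be sketched as follows. This is an illustration under assumptions, not the platform's code: ClientConfig and load_client_config are hypothetical names, the field list mirrors the client_configs columns above, and sqlite3 stands in for the production database driver.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClientConfig:
    """Typed view of one client_configs row (hypothetical helper)."""
    persona_name: str
    persona_tone: str
    persona_language: str
    llm_model: str
    llm_temperature: float
    max_context_turns: int
    rag_top_k: int


# Column names are derived from the dataclass, never from user input.
_COLUMNS = tuple(f.name for f in fields(ClientConfig))


def load_client_config(conn: sqlite3.Connection, client_id: int) -> ClientConfig:
    """Read the single config row for a client, or fail loudly."""
    row = conn.execute(
        f"SELECT {', '.join(_COLUMNS)} FROM client_configs WHERE client_id = ?",
        (client_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no config for client_id={client_id}")
    return ClientConfig(*row)
```

Failing on a missing row, rather than falling back to defaults, is a deliberate choice in this sketch: a client without a config row is more likely a provisioning bug than a client that wants default behavior.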
The System Prompt as a Runtime Artifact
The system prompt is not a file in the repository. It is assembled at runtime from database-stored fragments. This is what makes per-client personality, product knowledge, and capability activation possible without code changes.
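def build_system_prompt(client_id: int, current_query: str, db) -> str:
    # Assumes db.query(sql, *params) returns a list of dict-like rows.
    # client_name is not a client_configs column, so join it in from clients.name.
    rows = db.query(
        "SELECT cfg.*, c.name AS client_name FROM client_configs cfg "
        "JOIN clients c ON c.id = cfg.client_id WHERE cfg.client_id = ?",
        client_id,
    )
    if not rows:
        raise LookupError(f"no config for client_id={client_id}")
    cfg = rows[0]
    modes = db.query(
        "SELECT module_key, prompt_fragment FROM agent_modes "
        "WHERE client_id = ? AND is_active = 1 ORDER BY priority", client_id
    )
    # Retrieval depends on the user's current message, so it is passed in.
    knowledge = fetch_rag_context(client_id, current_query)  # top-k from ChromaDB
    parts = [
        f"You are {cfg['persona_name']}, an AI assistant for {cfg['client_name']}.",
        f"Tone: {cfg['persona_tone']}. Language: {cfg['persona_language']}.",
        "=== PRODUCT KNOWLEDGE ===",
        knowledge,
        "=== ACTIVE CAPABILITIES ===",
    ]
    # Append each active mode's fragment in priority order.
    for mode in modes:
        parts.append(mode['prompt_fragment'])
    return "\n\n".join(parts)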
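Since the prompt is assembled from fragments at runtime, its length depends on how many modes a client has active, and it can be measured before it is sent. The sketch below is one possible approach under stated assumptions: estimate_tokens uses a rough four-characters-per-token heuristic (a real tokenizer for the configured model would be more accurate), assemble_within_budget is a hypothetical helper, and dropping the last fragments first assumes that fragments arrive ordered from most to least important.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def assemble_within_budget(core_parts: list, fragments: list, budget_tokens: int):
    """Join core parts and fragments, dropping trailing fragments until the
    estimated size fits. Returns (prompt, number_of_fragments_dropped)."""
    kept = list(fragments)
    while True:
        prompt = "\n\n".join([*core_parts, *kept])
        if estimate_tokens(prompt) <= budget_tokens:
            return prompt, len(fragments) - len(kept)
        if not kept:
            # Nothing left to drop: the persona and knowledge alone are too big.
            raise ValueError("core prompt alone exceeds the token budget")
        kept.pop()  # drop the last fragment and try again
```

Whether to drop fragments silently, log the drop, or refuse the request is a product decision; the returned count makes it possible to do any of the three.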
What to Watch For
- Cross-tenant data leakage: Never fetch conversations or knowledge without a client_id filter. Add a DB layer check that rejects queries without it.
- Prompt token overflow: System prompts grow as modes are added. Set a hard token budget and measure the assembled prompt before sending it to the LLM.
- Config cache staleness: Cache client configs aggressively, but clear the cache immediately on any config update. Stale configs mean a client's bot behaves differently than what the admin just configured.
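The cache-staleness point can be illustrated with a small in-process cache that is cleared as part of the update path. This is a sketch under assumptions: ConfigCache and update_config are hypothetical names, loader and save are caller-supplied functions standing in for the real database access, and the 300-second TTL is an arbitrary fallback. A deployment with several worker processes would need a shared cache or a broadcast invalidation instead.

```python
import time


class ConfigCache:
    """Per-client config cache with a TTL and explicit invalidation."""

    def __init__(self, loader, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._loader = loader    # function: client_id -> config
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # client_id -> (expires_at, config)

    def get(self, client_id: int):
        now = self._clock()
        entry = self._entries.get(client_id)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh cached copy
        config = self._loader(client_id)
        self._entries[client_id] = (now + self._ttl, config)
        return config

    def invalidate(self, client_id: int) -> None:
        self._entries.pop(client_id, None)


def update_config(cache: ConfigCache, save, client_id: int, changes: dict) -> None:
    """Write the change first, then drop the cached copy so the next read reloads."""
    save(client_id, changes)
    cache.invalidate(client_id)
```

Routing every admin write through a function like update_config keeps the invalidation from being forgotten; the TTL only bounds the damage if some write path bypasses it.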