Designing a Multi-Tenant AI Platform from Scratch

Series 4 — Multi-Tenant AI Chatbot Architecture • Part 1 of 8

Building a multi-tenant AI chatbot means solving a different problem from building a single chatbot. Isolation, configuration, and predictable per-client behavior are the hard problems. This article walks through the design decisions that make the chatbot platform work.

The Fundamental Unit: client_id

Everything in a multi-tenant AI platform scopes to a client. A client_id appears on every database row, every vector collection, every system prompt, and every conversation record. Without it, data leaks between tenants — silently and catastrophically.

CREATE TABLE clients (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  slug        VARCHAR(80) NOT NULL UNIQUE,
  name        VARCHAR(255) NOT NULL,
  timezone    VARCHAR(50)  NOT NULL DEFAULT 'UTC',
  locale      VARCHAR(10)  NOT NULL DEFAULT 'en',
  plan        ENUM('starter','growth','enterprise') NOT NULL DEFAULT 'starter',
  status      ENUM('active','suspended','cancelled') NOT NULL DEFAULT 'active',
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Every downstream table carries client_id as a FK + index
CREATE TABLE conversations (
  id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  client_id   INT UNSIGNED NOT NULL,
  channel     ENUM('whatsapp','widget','mobile','api') NOT NULL,
  external_id VARCHAR(255) NOT NULL,     -- wa_contact_id, widget_session_id, etc.
  status      ENUM('open','closed') NOT NULL DEFAULT 'open',
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_client_channel (client_id, channel),
  FOREIGN KEY (client_id) REFERENCES clients(id)
);
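One way to make the client_id filter hard to forget is to route reads through a wrapper that is bound to a single tenant at construction time. The sketch below is illustrative only: TenantScopedDB and demo_connection are hypothetical names, and an in-memory SQLite database with a simplified schema stands in for the MySQL tables above.

```python
import sqlite3
from typing import Optional


class TenantScopedDB:
    """Hypothetical wrapper: every read is bound to exactly one client_id."""

    def __init__(self, conn: sqlite3.Connection, client_id: int):
        if not isinstance(client_id, int) or client_id <= 0:
            raise ValueError("a positive integer client_id is required")
        self._conn = conn
        self._client_id = client_id

    def conversations(self, channel: Optional[str] = None) -> list:
        # client_id is always the first predicate; callers cannot omit it.
        sql = "SELECT id, channel, external_id FROM conversations WHERE client_id = ?"
        params = [self._client_id]
        if channel is not None:
            sql += " AND channel = ?"
            params.append(channel)
        return self._conn.execute(sql + " ORDER BY id", params).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the schema above (simplified, assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clients (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE);
        CREATE TABLE conversations (
            id INTEGER PRIMARY KEY,
            client_id INTEGER NOT NULL REFERENCES clients(id),
            channel TEXT NOT NULL,
            external_id TEXT NOT NULL
        );
        INSERT INTO clients (id, slug) VALUES (1, 'acme'), (2, 'globex');
        INSERT INTO conversations (client_id, channel, external_id) VALUES
            (1, 'whatsapp', 'wa-100'),
            (1, 'widget',   'ws-200'),
            (2, 'whatsapp', 'wa-300');
        """
    )
    return conn
```

With this shape, application code never writes a conversations query by hand; it asks TenantScopedDB(conn, client_id).conversations() and only ever sees rows for that tenant.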

Config-Driven vs Code-Driven Behavior

The trap in multi-tenant AI is writing per-client logic in code: if ($client === 'acme') { … }. That approach does not scale past three clients, because every new tenant then needs a code change and a deploy. Instead, all behavioral variation lives in the database.

CREATE TABLE client_configs (
  id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  client_id   INT UNSIGNED NOT NULL UNIQUE,
  persona_name        VARCHAR(100) NOT NULL DEFAULT 'Assistant',
  persona_tone        ENUM('professional','friendly','formal','casual') NOT NULL DEFAULT 'professional',
  persona_language    VARCHAR(10) NOT NULL DEFAULT 'en',
  llm_model           VARCHAR(80) NOT NULL DEFAULT 'llama3.1:8b',
  llm_temperature     DECIMAL(3,2) NOT NULL DEFAULT 0.70,
  max_context_turns   TINYINT UNSIGNED NOT NULL DEFAULT 10,
  rag_top_k           TINYINT UNSIGNED NOT NULL DEFAULT 5,
  human_handoff_phrase VARCHAR(200) DEFAULT NULL,
  FOREIGN KEY (client_id) REFERENCES clients(id)
);

When a new conversation arrives, the platform reads one row from client_configs and everything — persona, model, temperature, context window size — is determined without touching code.
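The single-row read described above can be sketched as follows. This is a minimal illustration under stated assumptions: ClientConfig, load_client_config, trim_history, and demo_config_db are hypothetical names, only a subset of the client_configs columns is loaded, and in-memory SQLite stands in for the production database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    client_id: int
    persona_name: str
    persona_tone: str
    llm_model: str
    llm_temperature: float
    max_context_turns: int
    rag_top_k: int


def load_client_config(conn: sqlite3.Connection, client_id: int) -> ClientConfig:
    """Read the single client_configs row for this tenant."""
    row = conn.execute(
        "SELECT client_id, persona_name, persona_tone, llm_model, "
        "llm_temperature, max_context_turns, rag_top_k "
        "FROM client_configs WHERE client_id = ?",
        (client_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no config row for client_id={client_id}")
    return ClientConfig(*row)


def trim_history(history: list, cfg: ClientConfig) -> list:
    """Keep only the most recent max_context_turns turns."""
    if cfg.max_context_turns <= 0:
        return []
    return history[-cfg.max_context_turns:]


def demo_config_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in with the same defaults as the table above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE client_configs (
            client_id INTEGER NOT NULL UNIQUE,
            persona_name TEXT NOT NULL DEFAULT 'Assistant',
            persona_tone TEXT NOT NULL DEFAULT 'professional',
            llm_model TEXT NOT NULL DEFAULT 'llama3.1:8b',
            llm_temperature REAL NOT NULL DEFAULT 0.70,
            max_context_turns INTEGER NOT NULL DEFAULT 10,
            rag_top_k INTEGER NOT NULL DEFAULT 5
        );
        INSERT INTO client_configs (client_id) VALUES (1);
        INSERT INTO client_configs
            (client_id, persona_name, persona_tone, llm_temperature, max_context_turns)
            VALUES (2, 'Mia', 'friendly', 0.3, 2);
        """
    )
    return conn
```

Client 1 runs entirely on column defaults while client 2 gets a different persona, temperature, and history length, and the two differ only in data, not in code paths.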

The System Prompt as a Runtime Artifact

The system prompt is not a file in the repository. It is assembled at runtime from database-stored fragments. This is what makes per-client personality, product knowledge, and capability activation possible without code changes.

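def build_system_prompt(client_id: int, current_query: str, db) -> str:
    # Assumes db.query(sql, *params) returns a list of dict-like rows.
    # client_configs has no client_name column, so join clients for the name.
    rows = db.query(
        "SELECT cc.*, c.name AS client_name FROM client_configs cc "
        "JOIN clients c ON c.id = cc.client_id WHERE cc.client_id = ?", client_id
    )
    if not rows:
        raise LookupError(f"no config for client_id={client_id}")
    cfg = rows[0]

    # Active capability fragments for this tenant, in priority order.
    modes = db.query(
        "SELECT module_key, prompt_fragment FROM agent_modes "
        "WHERE client_id = ? AND is_active = 1 ORDER BY priority", client_id
    )
    # Tenant-scoped retrieval: top-k chunks from ChromaDB for the current query.
    knowledge = fetch_rag_context(client_id, current_query)

    parts = [
        f"You are {cfg['persona_name']}, an AI assistant for {cfg['client_name']}.",
        f"Tone: {cfg['persona_tone']}. Language: {cfg['persona_language']}.",
        "=== PRODUCT KNOWLEDGE ===",
        knowledge,
        "=== ACTIVE CAPABILITIES ===",
    ]
    parts.extend(mode['prompt_fragment'] for mode in modes)

    return "\n\n".join(parts)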
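Because the prompt is assembled from a variable number of fragments, its size is only known at runtime. A minimal sketch of measuring the assembled string against a budget follows. The names estimate_tokens, enforce_prompt_budget, and PromptBudgetExceeded are hypothetical, and the four-characters-per-token heuristic is a rough assumption; an accurate count should come from the tokenizer of the model actually in use, which can be passed in through the count parameter.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


class PromptBudgetExceeded(ValueError):
    """Raised when an assembled system prompt is larger than its budget."""


def enforce_prompt_budget(prompt: str, max_tokens: int, count=estimate_tokens) -> int:
    """Return the measured token count, or raise if it exceeds max_tokens.

    `count` is pluggable so a real tokenizer can replace the heuristic.
    """
    used = count(prompt)
    if used > max_tokens:
        raise PromptBudgetExceeded(
            f"system prompt uses about {used} tokens, budget is {max_tokens}"
        )
    return used
```

Calling enforce_prompt_budget on the string returned by the prompt builder, before the LLM request is made, turns a silent truncation into an explicit error that can be logged per client.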

What to Watch For

Cross-tenant data leakage — Never fetch conversations or knowledge without a client_id filter. Add a DB layer check that rejects queries without it.
Prompt token overflow — System prompts grow as modes are added. Set a hard token budget and measure the assembled prompt before sending it to the LLM.
Config cache staleness — Cache client configs aggressively, but clear the cache immediately on any config update. Stale configs mean a client's bot behaves differently than what the admin just configured.
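The cache-then-invalidate advice in the last item can be sketched as below. This is an illustration under assumptions rather than the platform's actual cache: ConfigCache and update_config are hypothetical names, the cache lives in a single process, and a plain dict stands in for the client_configs table.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ConfigCache:
    """Hypothetical in-process cache keyed by client_id, with a TTL safety net."""

    def __init__(
        self,
        loader: Callable[[int], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, Any]] = {}

    def get(self, client_id: int) -> Any:
        now = self._clock()
        entry = self._entries.get(client_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit
        value = self._loader(client_id)  # miss or expired: reload
        self._entries[client_id] = (now, value)
        return value

    def invalidate(self, client_id: int) -> None:
        # Drop only this tenant's entry; other tenants stay cached.
        self._entries.pop(client_id, None)


def update_config(store: dict, cache: ConfigCache, client_id: int, **changes) -> None:
    """Write first, then invalidate, so the next read reloads fresh data."""
    store[client_id] = {**store[client_id], **changes}
    cache.invalidate(client_id)
```

The TTL only bounds how long a missed invalidation can linger. With several worker processes, the invalidation would also need to reach every process, for example through a shared cache or a pub/sub message, which this sketch does not cover.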
