Celery + Redis Task Queue for AI

Series 4 — Part 5 of 8

WhatsApp gives you 20 seconds to respond to a webhook. Ollama takes 15-30 seconds to generate a response. The math doesn't work synchronously. Celery + Redis is the solution: acknowledge the webhook immediately, generate the AI response asynchronously, then send it.

The Webhook Timeout Problem

Meta's WhatsApp webhook documentation is clear: if your endpoint does not return a 200 within 20 seconds, the webhook will be retried. Retries without deduplication mean double-processing. The solution is a two-phase architecture:

Phase 1 (synchronous, <1s) — Verify HMAC signature, deduplicate by wa_msg_id, enqueue a Celery task, return 200.
Phase 2 (async, <60s) — The Celery worker processes the message: builds system prompt, calls Ollama, sends the response to WhatsApp.
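
A minimal sketch of Phase 1, using only the standard library. It assumes Meta signs the raw request body with your app secret and sends the result in the X-Hub-Signature-256 header as "sha256=<hex digest>". The flattened payload shape, the in-memory set, and the `enqueue` callable (a stand-in for `process_whatsapp_message.delay(...)`) are illustrative assumptions, not the real webhook schema.

```python
import hashlib
import hmac
import json

APP_SECRET = b"example-app-secret"  # assumption: placeholder for your Meta app secret
_seen_message_ids = set()           # assumption: in-memory; use a shared store across processes


def signature_is_valid(raw_body: bytes, signature_header: str) -> bool:
    """Compare the X-Hub-Signature-256 header with an HMAC-SHA256 of the raw body."""
    digest = hmac.new(APP_SECRET, raw_body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # compare_digest avoids leaking timing information about the expected value
    return hmac.compare_digest(expected, (signature_header or "").encode())


def handle_webhook(raw_body: bytes, signature_header: str, enqueue) -> int:
    """Phase 1: verify, deduplicate, enqueue. Returns the HTTP status to send back."""
    if not signature_is_valid(raw_body, signature_header):
        return 403
    message = json.loads(raw_body)   # hypothetical flattened payload
    wa_msg_id = message["wa_msg_id"]
    if wa_msg_id in _seen_message_ids:
        return 200                   # already queued: acknowledge the retry, do no work
    _seen_message_ids.add(wa_msg_id)
    enqueue(message)                 # stand-in for process_whatsapp_message.delay(...)
    return 200
```

Nothing in this path calls Ollama or the database, which is what keeps it well under a second.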

Celery Setup with Redis

# celery_app.py
from celery import Celery

celery = Celery(
    "the chatbot platform",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)
celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    task_acks_late=True,          # Ack after the task has run, not when the worker receives it
    task_reject_on_worker_lost=True,  # Requeue the task if the worker process dies mid-run
    worker_prefetch_multiplier=1, # Each worker process reserves only one task at a time
)

# tasks.py
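from celery_app import celery

# build_system_prompt, get_conversation_history, generate_response,
# send_whatsapp_text, store_message, db and chroma are assumed to be
# defined or imported elsewhere in the project.

@celery.task(bind=True, max_retries=3, default_retry_delay=10)
def process_whatsapp_message(self, client_id: int, wa_contact_id: str, message_text: str, wa_msg_id: str):
    try:
        system_prompt = build_system_prompt(client_id, message_text, db, chroma)
        history       = get_conversation_history(client_id, wa_contact_id, db)
        response      = generate_response(system_prompt, history + [{"role": "user", "content": message_text}])
        send_whatsapp_text(wa_contact_id, response)
        # A failure past this point triggers a retry that sends the reply again.
        store_message(client_id, wa_contact_id, message_text, response, db)
    except Exception as exc:
        # Retry up to max_retries times, default_retry_delay seconds apart.
        raise self.retry(exc=exc)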

Retry Logic and Dead Letter Queues

celery.conf.update(
    task_routes={
        "tasks.process_whatsapp_message": {"queue": "whatsapp"},
        "tasks.process_widget_message":   {"queue": "widget"},
    },
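    # Dead letter queue: holds tasks that exhausted their retries, for inspection.
    # Declaring the queue does not route anything to it; failed tasks must be
    # published here explicitly.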
    task_queues={
        "whatsapp": {"exchange": "whatsapp", "routing_key": "whatsapp"},
        "dead":     {"exchange": "dead",     "routing_key": "dead"},
    },
)

Monitor the dead letter queue. A task that lands there means Ollama crashed, Redis was unavailable, or WhatsApp returned a permanent error. Each case needs a different fix — don't just re-queue blindly.
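
One way to avoid blind re-queuing is to decide, per failure, whether a retry can help at all. The sketch below is plain Python with no Celery dependency. The two exception classes, `next_action`, and `dead_letter_record` are hypothetical names for illustration. Unknown exceptions are treated as transient, matching the catch-all retry in the task above.

```python
class TransientError(Exception):
    """Hypothetical: Ollama timed out, Redis was briefly unreachable, WhatsApp returned a 5xx."""


class PermanentError(Exception):
    """Hypothetical: WhatsApp rejected the message in a way a retry cannot fix."""


def next_action(exc: Exception, retries_done: int, max_retries: int = 3) -> str:
    """Return "retry" or "dead" for a failed task attempt."""
    if isinstance(exc, PermanentError):
        return "dead"    # retrying cannot change the outcome
    if retries_done >= max_retries:
        return "dead"    # retry budget exhausted
    return "retry"       # transient or unknown: try again


def dead_letter_record(task_name: str, kwargs: dict, exc: Exception) -> dict:
    """Build a JSON-serializable record explaining why the task was given up on."""
    return {
        "task": task_name,
        "kwargs": kwargs,
        "error_type": type(exc).__name__,
        "error": str(exc),
    }
```

Inside the task's except block, "retry" would map to `raise self.retry(exc=exc)` with `self.request.retries` as `retries_done`, and "dead" to publishing the record to the dead queue. Storing the error type with the payload makes it possible to tell an Ollama crash from a permanent WhatsApp error when you inspect the queue.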

What to Watch For

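Silent enqueue failure: If the Celery enqueue fails but the webhook still returns 200, the message is lost. Wrap task.delay() in a try/except and return a 500 when the enqueue fails, which forces a retry from Meta. If the wa_msg_id was already recorded for deduplication, clear it before returning the 500, or the retry will be discarded as a duplicate.
Redis OOM: Celery keeps each task result in Redis until result_expires elapses, and the default is one day. Under steady traffic that accumulates. Set it to 1 hour for debugging, 5 minutes in production.
Worker count vs model concurrency: Ollama serves only a limited number of requests at once (one at a time in many setups). If you run more Celery workers than Ollama can serve concurrently, the excess requests queue inside Ollama, where you cannot observe them, instead of in your Celery queue, where you can.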
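
The enqueue-failure case above interacts with deduplication in a way that is easy to miss. Here is a minimal sketch, assuming an in-memory `seen` set and an `enqueue` callable that stands in for `process_whatsapp_message.delay(...)`. Both names are illustrative.

```python
def accept_message(wa_msg_id: str, seen: set, enqueue) -> int:
    """Return the HTTP status for the webhook: 200 if queued or duplicate, 500 if enqueue failed."""
    if wa_msg_id in seen:
        return 200                 # duplicate delivery: acknowledge and skip
    seen.add(wa_msg_id)
    try:
        enqueue(wa_msg_id)         # may raise if the broker is unreachable
    except Exception:
        seen.discard(wa_msg_id)    # release the marker so Meta's retry is not dropped
        return 500                 # a non-200 response asks Meta to deliver again
    return 200
```

Without the `discard`, the retried webhook would hit the duplicate check and return 200 without ever queuing the message.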
