Series 4 — Part 5 of 8

WhatsApp gives you 20 seconds to respond to a webhook. Ollama takes 15-30 seconds to generate a response. The math doesn't work synchronously. Celery + Redis is the solution: acknowledge the webhook immediately, generate the AI response asynchronously, then send it.

The Webhook Timeout Problem

Meta's WhatsApp webhook documentation is clear: if your endpoint does not return a 200 within 20 seconds, the webhook will be retried. Retries without deduplication mean double-processing. The solution is a two-phase architecture:

  1. Phase 1 (synchronous, <1s) — Verify HMAC signature, deduplicate by wa_msg_id, enqueue a Celery task, return 200.
  2. Phase 2 (async, <60s) — The Celery worker processes the message: builds system prompt, calls Ollama, sends the response to WhatsApp.
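
Phase 1 can be sketched as a framework-independent function, which makes it easy to test without a web server. This is a minimal sketch under stated assumptions, not the platform's actual handler: `handle_webhook`, `signature_is_valid`, the in-memory `_seen_message_ids` set, and the flattened payload with a top-level `wa_msg_id` are all hypothetical. A real WhatsApp payload nests the message ID more deeply, and the seen-ID store would need to be shared across processes (for example in Redis).

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for a shared store. An in-process set only works
# for a single web process; it is used here to keep the sketch runnable.
_seen_message_ids = set()


def signature_is_valid(app_secret: str, raw_body: bytes, header: str) -> bool:
    """Check the 'sha256=<hex>' value sent in the X-Hub-Signature-256 header."""
    digest = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, so the comparison does not
    # reveal how many leading characters matched.
    return hmac.compare_digest(expected.encode(), header.encode())


def handle_webhook(app_secret, raw_body, signature_header, enqueue) -> int:
    """Return the HTTP status code for one webhook delivery."""
    if not signature_is_valid(app_secret, raw_body, signature_header or ""):
        return 403
    payload = json.loads(raw_body)
    wa_msg_id = payload["wa_msg_id"]  # assumption: flattened payload
    if wa_msg_id in _seen_message_ids:
        return 200  # a retry of a message that was already accepted
    try:
        enqueue(payload)  # e.g. a wrapper around process_whatsapp_message.delay
    except Exception:
        return 500  # nothing was queued, so ask Meta to deliver again
    # Record the ID only after a successful enqueue, so a failed enqueue
    # is not mistaken for a duplicate on the next delivery.
    _seen_message_ids.add(wa_msg_id)
    return 200
```

The `enqueue` callable is injected so the same function can be wired to Celery in production and to a plain list in tests.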

Celery Setup with Redis

# celery_app.py
from celery import Celery

celery = Celery(
    "the chatbot platform",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)
celery.conf.update(
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    task_acks_late=True,          # Ack after the task has run, not when the worker receives it
    task_reject_on_worker_lost=True,
    worker_prefetch_multiplier=1, # Reserve only one task at a time per worker process
)

# tasks.py
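from celery_app import celery

# build_system_prompt, get_conversation_history, generate_response,
# send_whatsapp_text, store_message, db and chroma are assumed to be
# defined elsewhere in the project and imported here.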

@celery.task(bind=True, max_retries=3, default_retry_delay=10)
def process_whatsapp_message(self, client_id: int, wa_contact_id: str, message_text: str, wa_msg_id: str):
    try:
        system_prompt = build_system_prompt(client_id, message_text, db, chroma)
        history       = get_conversation_history(client_id, wa_contact_id, db)
        response      = generate_response(system_prompt, history + [{"role": "user", "content": message_text}])
        send_whatsapp_text(wa_contact_id, response)
        store_message(client_id, wa_contact_id, message_text, response, db)
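    except Exception as exc:
        # A retry re-runs the whole task, so a failure in store_message
        # after a successful send will send the reply again.
        raise self.retry(exc=exc)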
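
The task above retries at a fixed 10-second interval. If Ollama is restarting, a growing delay may give it more room to recover. The helper below is a hypothetical sketch of capped exponential backoff; the name `retry_countdown` and its default values are assumptions, not part of the original task.

```python
import random


def retry_countdown(retries: int, base: float = 10.0, cap: float = 300.0,
                    jitter: float = 0.0) -> float:
    """Seconds to wait before the next attempt: base * 2**retries, capped."""
    delay = min(cap, base * (2 ** retries))
    # Optional jitter spreads retries out so that many failed tasks do not
    # all hit the model server again at the same instant.
    return delay + random.uniform(0, jitter)
```

Inside the task this could be used as `raise self.retry(exc=exc, countdown=retry_countdown(self.request.retries))`. Celery's task decorator also accepts `autoretry_for` together with `retry_backoff`, which may cover the same need without a helper.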

Retry Logic and Dead Letter Queues

celery.conf.update(
    task_routes={
        "tasks.process_whatsapp_message": {"queue": "whatsapp"},
        "tasks.process_widget_message":   {"queue": "widget"},
    },
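    # Dead letter queue for inspecting tasks that exhaust their retries.
    # Declaring it only creates the queue: Celery on a Redis broker does
    # not move failed tasks into it automatically, so a failure handler
    # has to publish them to "dead" explicitly.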
    task_queues={
        "whatsapp": {"exchange": "whatsapp", "routing_key": "whatsapp"},
        "dead":     {"exchange": "dead",     "routing_key": "dead"},
    },
)

Monitor the dead letter queue. A task that lands there means Ollama crashed, Redis was unavailable, or WhatsApp returned a permanent error. Each case needs a different fix — don't just re-queue blindly.
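
One way to avoid blind re-queuing is to tag each failure before it is written to the dead letter queue. The sketch below is illustrative only: `FailureKind`, `WhatsAppPermanentError`, `classify_failure`, and `dead_letter_record` are hypothetical names, and the mapping from exception types to categories is an assumption that would need to match the errors your own helpers actually raise.

```python
import json
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"   # worth re-queuing once the dependency is back
    PERMANENT = "permanent"   # re-queuing will fail again; needs a human
    UNKNOWN = "unknown"       # inspect before deciding


class WhatsAppPermanentError(Exception):
    """Hypothetical error for API responses that will never succeed on retry."""


def classify_failure(exc: BaseException) -> FailureKind:
    if isinstance(exc, WhatsAppPermanentError):
        return FailureKind.PERMANENT
    # ConnectionError covers refused and reset connections, which is what a
    # crashed Ollama or an unreachable Redis would typically look like.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return FailureKind.TRANSIENT
    return FailureKind.UNKNOWN


def dead_letter_record(task_name: str, kwargs: dict, exc: BaseException) -> str:
    """Build a JSON record that can be pushed to the dead letter queue."""
    return json.dumps({
        "task": task_name,
        "kwargs": kwargs,
        "error": repr(exc),
        "kind": classify_failure(exc).value,
    })
```

In Celery, a record like this could be published from a custom task base class's `on_failure` hook, which runs when a task finally fails.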

What to Watch For

  • Silent enqueue failure — If task.delay() fails (for example because Redis is unreachable) and the webhook still returns 200, the message is lost. Wrap task.delay() in a try/except and return a 500 when the enqueue fails, which forces Meta to retry the delivery.
  • Redis OOM — Celery keeps task results in Redis until result_expires elapses, and the default is one day. Under steady traffic that backlog can exhaust memory. Set it to 1 hour for debugging, 5 minutes in production.
  • Worker count vs model concurrency — Depending on its version and configuration, Ollama may process only one request at a time. Running more Celery workers than Ollama can serve concurrently causes requests to queue inside Ollama, where you cannot observe them, rather than in your own system.
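
The last two points can be captured as configuration. The helper below is a small sketch: `celery_overrides` and its parameters are hypothetical, while `result_expires` and `worker_concurrency` are standard Celery settings. The expiry values are the ones suggested above, and `ollama_parallel` is an assumption you would set to however many requests your Ollama instance actually serves at once.

```python
def celery_overrides(debug: bool, ollama_parallel: int = 1) -> dict:
    """Build Celery settings that bound result storage and worker count."""
    return {
        # Seconds to keep task results in Redis: 1 hour while debugging,
        # 5 minutes in production.
        "result_expires": 3600 if debug else 300,
        # Never run more worker processes than the model server can serve
        # concurrently, so waiting happens in the Celery queue where it
        # can be observed.
        "worker_concurrency": max(1, ollama_parallel),
    }
```

It would be applied with `celery.conf.update(celery_overrides(debug=False))` in celery_app.py.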