Series 6 — Part 8 of 10

The WhatsApp AI agent serves lawyers and clients who communicate in Hindi, Punjabi, and English, sometimes switching languages within a single message. This article covers detecting the language of incoming text, storing preferred_lang per contact, language-specific prompt instructions and error messages, and handling mid-conversation language switches.

Language Detection

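function detect_language(string $text): string
{
    // Script-based detection is cheap and dependable for short messages,
    // where statistical models have too little text to go on.
    // Count characters per script instead of returning on the first match.
    // Gurmukhi Unicode block: U+0A00–U+0A7F (Punjabi)
    $gurmukhi = preg_match_all('/[\x{0A00}-\x{0A7F}]/u', $text) ?: 0;
    // Devanagari Unicode block: U+0900–U+097F (Hindi), excluding the danda marks
    // U+0964/U+0965. Punjabi text uses the Devanagari danda as its full stop,
    // so counting it would misclassify Gurmukhi messages as Hindi.
    $devanagari = preg_match_all('/[\x{0900}-\x{0963}\x{0966}-\x{097F}]/u', $text) ?: 0;
    // No Indic characters (or invalid UTF-8, where preg_match_all returns false): default to English
    if ($gurmukhi === 0 && $devanagari === 0) return 'en';
    return $gurmukhi > $devanagari ? 'pa' : 'hi';
}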

function update_preferred_language(int $contactId, string $lang, PDO $pdo): void
{
    // Writes unconditionally. To avoid flipping on single-message noise, call this
    // only after the new language has been detected in 3+ consecutive messages.
    $pdo->prepare(
        'UPDATE wa_contacts SET preferred_lang = ?, lang_updated_at = NOW() WHERE id = ?'
    )->execute([$lang, $contactId]);
}
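The three-consecutive-messages rule can be sketched as a pure function. This is a minimal illustration, not the article's implementation: `next_language_state()` and the `preferred`/`candidate`/`count`/`changed` state keys are hypothetical, and the sketch assumes the caller loads and saves this state per contact (for example in extra wa_contacts columns, which are also an assumption).

```php
<?php
// Hypothetical debounce for preferred_lang: adopt a new language only after
// $threshold consecutive messages in that language.
// ASSUMPTION: the caller persists $state between messages and calls
// update_preferred_language() when the returned 'changed' flag is true.
function next_language_state(array $state, string $detected, int $threshold = 3): array
{
    // Message matches the stored preference: cancel any pending switch.
    if ($detected === $state['preferred']) {
        return ['preferred' => $state['preferred'], 'candidate' => null, 'count' => 0, 'changed' => false];
    }
    // Extend the streak if this matches the pending candidate, otherwise start a new streak.
    $count = ($detected === $state['candidate']) ? $state['count'] + 1 : 1;
    if ($count >= $threshold) {
        // Streak is long enough: switch the preferred language.
        return ['preferred' => $detected, 'candidate' => null, 'count' => 0, 'changed' => true];
    }
    // Not there yet: remember the candidate and how many messages it has won.
    return ['preferred' => $state['preferred'], 'candidate' => $detected, 'count' => $count, 'changed' => false];
}
```

A caller would run detect_language() on each message, feed the result in, persist the returned state, and call update_preferred_language() only when 'changed' is true. Keeping the rule free of database access makes the threshold easy to unit-test.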

Language-Specific System Prompt Injection

function get_language_instruction(string $lang): string
{
    return match($lang) {
        'hi' => "Respond in Hindi (Devanagari script). Use formal Hindi for legal contexts, but keep established legal terms (e.g. FIR, Vakalatnama, Section 138) in their original form.",
        'pa' => "Respond in Punjabi (Gurmukhi script). Keep legal terms in English where no standard Punjabi equivalent exists.",
        'en' => "Respond in English. Use plain English — avoid legal jargon unless the user is a lawyer.",
        default => "Respond in the same language the user used.",
    };
}
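Because preferred_lang can change mid-conversation, the language instruction is best computed per message rather than fixed into a static prompt. A minimal sketch, assuming an OpenAI-style chat payload (an array of role/content messages); `build_chat_messages()` and `$basePrompt` are hypothetical names, and `$languageInstruction` would be the output of get_language_instruction().

```php
<?php
// Hypothetical helper: assembles the messages array for a single turn.
// $languageInstruction is expected to come from get_language_instruction().
function build_chat_messages(string $basePrompt, string $languageInstruction, string $userText): array
{
    return [
        // Append the language rule as its own paragraph at the end of the system prompt.
        ['role' => 'system', 'content' => rtrim($basePrompt) . "\n\n" . $languageInstruction],
        // The user's message is passed through unchanged.
        ['role' => 'user', 'content' => $userText],
    ];
}
```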

Language-Specific Error Messages

const ERROR_MESSAGES = [
    'transcription_failed' => [
        'en' => "I received your voice note but couldn't transcribe it. Could you type your message?",
        'hi' => "मुझे आपका वॉइस नोट मिला, लेकिन मैं इसे समझ नहीं पाया। क्या आप अपना संदेश टाइप कर सकते हैं?",
        'pa' => "ਮੈਨੂੰ ਤੁਹਾਡਾ ਵੌਇਸ ਨੋਟ ਮਿਲਿਆ ਪਰ ਮੈਂ ਇਸਨੂੰ ਸਮਝ ਨਹੀਂ ਸਕਿਆ। ਕੀ ਤੁਸੀਂ ਆਪਣਾ ਸੁਨੇਹਾ ਟਾਈਪ ਕਰ ਸਕਦੇ ਹੋ?",
    ],
    'unknown_contact' => [
        'en' => "This is a private system. Please contact the office on the number provided.",
        'hi' => "यह एक निजी प्रणाली है। कृपया दिए गए नंबर पर कार्यालय से संपर्क करें।",
        'pa' => "ਇਹ ਇੱਕ ਨਿੱਜੀ ਸਿਸਟਮ ਹੈ। ਕਿਰਪਾ ਕਰਕੇ ਦਿੱਤੇ ਨੰਬਰ 'ਤੇ ਦਫ਼ਤਰ ਨਾਲ ਸੰਪਰਕ ਕਰੋ।",
    ],
];

function get_error_message(string $key, string $lang): string
{
    return ERROR_MESSAGES[$key][$lang] ?? ERROR_MESSAGES[$key]['en'];
}
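Because get_error_message() falls back to English silently, a missing Hindi or Punjabi string never raises an error. A small completeness check can catch such gaps in a unit test. This is a sketch; `missing_translations()` is a hypothetical helper, not part of the article's code.

```php
<?php
// Hypothetical check: lists "key.lang" for every message missing a translation.
// An absent entry or an empty string both count as missing.
function missing_translations(array $messages, array $langs): array
{
    $missing = [];
    foreach ($messages as $key => $translations) {
        foreach ($langs as $lang) {
            if (!isset($translations[$lang]) || $translations[$lang] === '') {
                $missing[] = "$key.$lang";
            }
        }
    }
    return $missing;
}
```

Run against the real table, `missing_translations(ERROR_MESSAGES, ['en', 'hi', 'pa'])` should return an empty array; any entry it lists is a message that Hindi or Punjabi users would receive in English.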

What to Watch For

  • Code-mixed text: Many users write Hindi and English words together in Roman script, e.g. "mera case ka status kya hai?". detect_language() sees only Latin characters and returns 'en'. For Roman-script mixed messages, let the LLM detect the language and respond in kind.
  • Transliteration vs script: Store preferred_lang as a language code ('hi', 'pa', 'en'). Don't store a separate script preference unless you need to tell Devanagari Hindi apart from Roman-script Hindi.
  • Legal terms in translation: Legal terminology often has no adequate translation. Keep terms such as "Vakalatnama", "FIR", and "Section 138" in their original form, even in Hindi and Punjabi responses.
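The code-mixed case can be handled by routing rather than by better detection. One possible rule, sketched under assumptions: `reply_language()` is hypothetical, and 'auto' is an arbitrary code chosen because it falls through to the default arm of get_language_instruction(), which asks the model to reply in the user's own language.

```php
<?php
// Hypothetical routing rule (an assumption, not the article's code).
// $scriptLang: result of detect_language(); $preferredLang: wa_contacts.preferred_lang.
// Returns a code for get_language_instruction(); 'auto' hits its default arm.
function reply_language(string $scriptLang, string $preferredLang): string
{
    // Devanagari or Gurmukhi script is an unambiguous signal.
    if ($scriptLang !== 'en') {
        return $scriptLang;
    }
    // Latin-only text from a Hindi/Punjabi contact may be Roman-script Hindi
    // ("mera case ka status kya hai?"), so let the LLM decide.
    if ($preferredLang !== 'en') {
        return 'auto';
    }
    // Latin-only text from an English-preferring contact.
    return 'en';
}
```

Contacts whose preferred_lang is 'en' keep the plain-English instruction, while Hindi- and Punjabi-preferring contacts who type in Roman script get a reply in whatever language the model detects.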