Series 6 — Part 8 of 10
The WhatsApp AI agent serves lawyers and clients who communicate in Hindi, Punjabi, and English, sometimes switching within a single message. This article covers language detection from incoming text, storing preferred_lang per contact, language-specific acknowledgements, and mid-conversation language switching.
Language Detection
function detect_language(string $text): string
{
    // Script-based detection is cheap and works on very short messages,
    // where statistical language models tend to be unreliable.
    // Devanagari Unicode block: U+0900–U+097F. Marathi and Nepali share this
    // script; this function treats any Devanagari text as Hindi.
    if (preg_match('/[\x{0900}-\x{097F}]/u', $text)) return 'hi';
    // Gurmukhi Unicode block: U+0A00–U+0A7F (Punjabi)
    if (preg_match('/[\x{0A00}-\x{0A7F}]/u', $text)) return 'pa';
    // Default to English (also reached for Roman-script Hindi or Punjabi)
    return 'en';
}
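Because detect_language() returns on the first script it finds, a message that mixes Devanagari and Gurmukhi is always classified as Hindi. One possible refinement, shown here as a minimal sketch rather than the article's method, is to count characters per script and pick the dominant one. The name detect_dominant_language is hypothetical.

```php
<?php
/**
 * Hypothetical variant of detect_language(): counts characters per script
 * and returns the language whose script dominates the message.
 */
function detect_dominant_language(string $text): string
{
    // preg_match_all() returns the match count, or false on error
    // (for example invalid UTF-8); the cast turns false into 0.
    $devanagari = (int) preg_match_all('/[\x{0900}-\x{097F}]/u', $text);
    $gurmukhi   = (int) preg_match_all('/[\x{0A00}-\x{0A7F}]/u', $text);

    if ($devanagari === 0 && $gurmukhi === 0) {
        return 'en'; // No Indic script found: same default as detect_language()
    }

    // Ties go to Hindi, matching the check order in detect_language()
    return $gurmukhi > $devanagari ? 'pa' : 'hi';
}
```

Combining marks count as characters here, so the totals are a rough measure of how much of the message is in each script, not a word count.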
function update_preferred_language(int $contactId, string $lang, PDO $pdo): void
{
    // This writes unconditionally. To avoid single-message noise, call it only
    // after the same language has been detected on 3+ consecutive messages.
    $pdo->prepare(
        'UPDATE wa_contacts SET preferred_lang = ?, lang_updated_at = NOW() WHERE id = ?'
    )->execute([$lang, $contactId]);
}
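The "consistent for 3+ messages" rule needs some state between messages. The following is a sketch of one way to track it, assuming a small per-contact state array; the function name advance_language_streak, the state keys, and where the state is stored (extra columns on wa_contacts, a cache, or elsewhere) are all assumptions, not part of the original schema.

```php
<?php
/**
 * Hypothetical helper: tracks how many consecutive messages were detected
 * in the same language and reports when the streak reaches the threshold.
 *
 * $state has the shape ['candidate' => ?string, 'streak' => int].
 */
function advance_language_streak(array $state, string $detected, int $threshold = 3): array
{
    $streak = ($state['candidate'] ?? null) === $detected
        ? ($state['streak'] ?? 0) + 1
        : 1; // Different language: restart the count

    return [
        'candidate'     => $detected,
        'streak'        => $streak,
        'should_update' => $streak >= $threshold,
    ];
}

// Usage: three consecutive Hindi messages trigger the update on the third.
$state = ['candidate' => null, 'streak' => 0];
foreach (['hi', 'hi', 'hi'] as $lang) {
    $state = advance_language_streak($state, $lang);
}
// $state['should_update'] is now true; this is the point at which
// update_preferred_language() would be called.
```

This also gives a simple form of mid-conversation switching: a single English message in a Hindi conversation resets the streak without changing the stored preference.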
Language-Specific System Prompt Injection
function get_language_instruction(string $lang): string
{
    return match ($lang) {
        'hi' => "Respond in Hindi (Devanagari script). Use formal Hindi for legal contexts.",
        'pa' => "Respond in Punjabi (Gurmukhi script). Keep legal terms in English where no standard Punjabi equivalent exists.",
        'en' => "Respond in English. Use plain English — avoid legal jargon unless the user is a lawyer.",
        default => "Respond in the same language the user used.",
    };
}
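The instruction string still has to be attached to the agent's system prompt. A minimal sketch of that step follows; build_system_prompt and the sample base prompt are hypothetical, and appending the language rule at the end is only one reasonable placement.

```php
<?php
/**
 * Hypothetical helper: appends the language instruction to a base
 * system prompt, separated by a blank line.
 */
function build_system_prompt(string $basePrompt, string $languageInstruction): string
{
    // rtrim() avoids stacking blank lines if the base prompt ends in whitespace
    return rtrim($basePrompt) . "\n\n" . $languageInstruction;
}

// Usage: in the agent, the second argument would come from
// get_language_instruction($lang).
$prompt = build_system_prompt(
    "You are an assistant for a law office.\n",
    'Respond in Hindi (Devanagari script). Use formal Hindi for legal contexts.'
);
```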
Language-Specific Error Messages
const ERROR_MESSAGES = [
    'transcription_failed' => [
        'en' => "I received your voice note but couldn't transcribe it. Could you type your message?",
        'hi' => "मुझे आपका वॉइस नोट मिला, लेकिन मैं इसे समझ नहीं पाया। क्या आप अपना संदेश टाइप कर सकते हैं?",
        'pa' => "ਮੈਨੂੰ ਤੁਹਾਡਾ ਵੌਇਸ ਨੋਟ ਮਿਲਿਆ ਪਰ ਮੈਂ ਇਸਨੂੰ ਸਮਝ ਨਹੀਂ ਸਕਿਆ। ਕੀ ਤੁਸੀਂ ਆਪਣਾ ਸੁਨੇਹਾ ਟਾਈਪ ਕਰ ਸਕਦੇ ਹੋ?",
    ],
    'unknown_contact' => [
        'en' => "This is a private system. Please contact the office on the number provided.",
        'hi' => "यह एक निजी प्रणाली है। कृपया दिए गए नंबर पर कार्यालय से संपर्क करें।",
        'pa' => "ਇਹ ਇੱਕ ਨਿੱਜੀ ਸਿਸਟਮ ਹੈ। ਕਿਰਪਾ ਕਰਕੇ ਦਿੱਤੇ ਨੰਬਰ 'ਤੇ ਦਫ਼ਤਰ ਨਾਲ ਸੰਪਰਕ ਕਰੋ।",
    ],
];
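function get_error_message(string $key, string $lang): string
{
    // Fall back to English, then to a generic message if the key itself is
    // unknown (otherwise the function would return null and raise a TypeError).
    return ERROR_MESSAGES[$key][$lang]
        ?? ERROR_MESSAGES[$key]['en']
        ?? 'Something went wrong. Please try again.';
}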
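The English fallback keeps the agent responding, but it also hides missing translations. A small consistency check can surface them during testing. This is a sketch under the assumption that the table has the same shape as ERROR_MESSAGES; find_missing_translations and the sample data are hypothetical.

```php
<?php
/**
 * Hypothetical consistency check: returns "key.lang" entries that have no
 * translation, so gaps show up in a test run instead of as a silent
 * English fallback.
 */
function find_missing_translations(array $messages, array $langs): array
{
    $missing = [];
    foreach ($messages as $key => $translations) {
        foreach ($langs as $lang) {
            if (!isset($translations[$lang]) || $translations[$lang] === '') {
                $missing[] = "$key.$lang";
            }
        }
    }
    return $missing;
}

// Usage with a deliberately incomplete sample table
$sample = [
    'transcription_failed' => ['en' => 'Could you type your message?'],
];
$gaps = find_missing_translations($sample, ['en', 'hi', 'pa']);
// $gaps is ['transcription_failed.hi', 'transcription_failed.pa']
```

Passing ERROR_MESSAGES itself as the first argument would check the real table.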
What to Watch For
- Code-mixed text — Many users write "mera case ka status kya hai?" (Hindi words in Roman script). Script-based detection fails here. For Roman-script mixed messages, use the LLM to detect and respond in kind.
- Transliteration vs script — Storing preferred_lang as a language code ('hi') is correct. Do not store script preference separately unless you need to differentiate Devanagari from Roman-script Hindi.
- Legal terms in translation — Legal terminology often has no adequate translation. "Vakalatnama", "FIR", "Section 138" — keep these in their original form even in Hindi/Punjabi responses.
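For the code-mixed case in the first point, one option is to stop forcing English whenever script detection falls back to 'en' for a contact whose stored preference is Hindi or Punjabi, and let the LLM mirror the user instead. The sketch below assumes this policy; choose_instruction_lang and the 'auto' value are hypothetical. Any value other than 'hi', 'pa', or 'en' reaches the default branch of get_language_instruction(), which tells the model to respond in the language the user used.

```php
<?php
/**
 * Hypothetical policy: decide which language code to pass to
 * get_language_instruction() for the current message.
 */
function choose_instruction_lang(string $detected, string $preferred): string
{
    if ($detected !== 'en') {
        return $detected; // Devanagari or Gurmukhi script is unambiguous
    }

    // Latin script from a Hindi/Punjabi contact may be Romanized text,
    // so defer to the model instead of forcing English.
    return in_array($preferred, ['hi', 'pa'], true) ? 'auto' : 'en';
}

// Usage: "mera case ka status kya hai?" is detected as 'en'
$lang = choose_instruction_lang('en', 'hi'); // 'auto'
```

A first-time contact who writes Romanized Hindi and has no Hindi preference stored would still get English under this policy, so it reduces the problem but does not remove it.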