Articles — Govind Preet Singh

Speech-to-Text Pipeline with Whisper

Downloading voice notes from Meta's media endpoint, local Whisper transcription via HTTP microservice, language hint injection, and graceful…

The WAV-not-MP3 trap, the UTF-8 /u flag corruption bug in prepareText(), audio type classification, and keeping the model warm with a health…

TTS → WAV → OGG/OPUS via FFmpeg → Meta upload → send media_id → monitor delivery status. The silent failure trap: API returns 200 but delive…

Storing wa_message_id + transcript on created workspace items, WebhookContext globals for cross-cutting request state, and the media URL exp…

Kokoro always outputs WAV regardless of requested format. FFmpeg converts WAV → OGG/OPUS at 48kHz mono 48kbps. The exact command, bitrate ch…
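As a rough illustration of the conversion step summarised above, here is a minimal Python sketch that assembles an FFmpeg invocation for the stated target (Opus in an Ogg container, 48 kHz, mono, 48 kbps). It is not the article's exact command, which is elided in this summary. The function names `build_opus_command` and `convert_wav_to_opus` are hypothetical, and the sketch assumes an `ffmpeg` build with `libopus` is available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_opus_command(src: Path, dst: Path) -> list[str]:
    """Return an ffmpeg argument list for WAV -> OGG/OPUS (hypothetical helper)."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it already exists
        "-i", str(src),        # input file (WAV from the TTS step)
        "-c:a", "libopus",     # encode audio with the Opus codec
        "-b:a", "48k",         # target bitrate of 48 kbps
        "-vbr", "on",          # variable bitrate (libopus encoder option)
        "-ar", "48000",        # resample to 48 kHz
        "-ac", "1",            # downmix to mono
        str(dst),              # a .ogg extension selects the Ogg container
    ]


def convert_wav_to_opus(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_opus_command(src, dst), check=True, capture_output=True)
```

Keeping the argument list in its own function lets the command be inspected or logged without running FFmpeg, which helps when each pipeline step is tested on its own.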

The two-step upload → send flow, MIME type requirements (audio/ogg; codecs=opus), delivery status callbacks, and the silent success trap: AP…

WAV → OGG/OPUS: the full annotated command. Bitrate choices for voice (48k VBR Opus). Detecting actual format with the file command vs trust…
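The idea of checking the real format instead of trusting a filename or a requested format can be sketched in Python by reading the leading bytes of the audio data. This is a simplified stand-in for the `file` command, not the article's method; the function name `sniff_audio_format` is hypothetical, and only the three containers relevant here are recognised.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container from leading magic bytes (hypothetical helper)."""
    # WAV: a RIFF header whose form type at bytes 8..11 is "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Ogg (the container used for Opus voice notes): every page starts with "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    # MP3: an ID3v2 tag, or an MPEG frame sync (11 set bits).
    # The frame-sync test is a heuristic and can match other MPEG audio streams.
    if data[:3] == b"ID3":
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

In use, the first few bytes of the TTS output would be passed in before deciding whether a conversion is needed, for example `sniff_audio_format(open(path, "rb").read(12))`.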

Test each step independently: TTS, conversion, upload, send. Meta delivery status as ground truth. Decoding common error codes (131053 and o…

Stripping WhatsApp markdown (/u flag required), expanding legal abbreviations for natural pronunciation, converting bullet lists to spoken c…