Series 6 — Part 6 of 10

Sending audio on WhatsApp takes five steps: generating speech with TTS, converting WAV to OGG/OPUS, uploading the file to Meta, sending a message that references the returned media_id, and monitoring the delivery status callback. Each step can fail silently. This article walks through the complete pipeline and identifies every failure point.

The Complete Audio Pipeline

function send_voice_response(string $recipientNumber, string $text): bool
{
    $tmpWav = null;
    $tmpOgg = null;

    try {
        // Step 1: Generate WAV with Kokoro TTS
        $wavBytes = generate_tts($text);
        $tmpWav   = save_tmp_bytes($wavBytes, '.wav');

        // Step 2: Convert WAV → OGG/OPUS via local converter service
        $tmpOgg = convert_wav_to_ogg($tmpWav);

        // Step 3: Upload to Meta and get media_id
        $mediaId = upload_to_meta($tmpOgg, 'audio/ogg; codecs=opus');

        // Step 4: Send via WhatsApp
        $result = send_whatsapp_audio($recipientNumber, $mediaId);

        // Step 5: The silent failure trap — API returns 200, but delivery may still fail
        // Delivery status comes via a separate webhook status callback (not here)
        log_msg('Audio sent. Delivery status pending webhook callback.', [
            'media_id'  => $mediaId,
            'recipient' => $recipientNumber,
        ]);

        return true;

    } catch (\Throwable $e) { // Throwable also covers \Error (e.g. TypeError), not just \Exception
        // Text fallback — never fail silently
        send_whatsapp_text($recipientNumber, $text);
        log_error('Audio pipeline failed, text fallback sent', ['error' => $e->getMessage()]);
        return false;

    } finally {
        // Always clean up temp files
        foreach ([$tmpWav, $tmpOgg] as $f) {
            if ($f && file_exists($f)) unlink($f);
        }
    }
}
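
The pipeline above calls upload_to_meta() and send_whatsapp_audio() without showing what goes over the wire. The following is a minimal sketch of the two requests, assuming the WhatsApp Cloud API's /media and /messages endpoints. The function names, the GRAPH_API_VERSION constant, and the phone number ID and token arguments are hypothetical and not taken from the original code. The sketch only builds the requests; sending them with cURL is left to your existing HTTP helper.

```php
<?php
// Assumption: pin whichever Graph API version your app is registered for.
const GRAPH_API_VERSION = 'v19.0';

// Step 3 (sketch): describe the multipart upload that returns a media_id.
function build_media_upload_request(string $phoneNumberId, string $token, string $oggPath): array
{
    return [
        'url'     => 'https://graph.facebook.com/' . GRAPH_API_VERSION . "/{$phoneNumberId}/media",
        'headers' => ["Authorization: Bearer {$token}"],
        // Multipart form fields; wrap 'file_path' in a \CURLFile when sending.
        'fields'  => [
            'messaging_product' => 'whatsapp',
            'type'              => 'audio/ogg; codecs=opus',
            'file_path'         => $oggPath,
        ],
    ];
}

// Step 4 (sketch): JSON body for an audio message that references the media_id.
function build_audio_message_payload(string $recipientNumber, string $mediaId): string
{
    return json_encode([
        'messaging_product' => 'whatsapp',
        'recipient_type'    => 'individual',
        'to'                => $recipientNumber,
        'type'              => 'audio',
        'audio'             => ['id' => $mediaId],
    ], JSON_THROW_ON_ERROR);
}
```

Keeping request construction separate from the HTTP call makes both payloads easy to log and to check in a unit test before anything is sent to Meta.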

WAV → OGG/OPUS Conversion

function convert_wav_to_ogg(string $wavPath): string
{
    $oggPath = sys_get_temp_dir() . '/' . uniqid('wa_ogg_') . '.ogg';

    // Use the local Python converter microservice (port 9012, matching the URL below)
    $ch = curl_init('http://localhost:9012/convert');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => ['file' => new \CURLFile($wavPath, 'audio/wav', 'audio.wav')],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 20,
    ]);
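    $oggBytes = curl_exec($ch);
    $code     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $curlErr  = curl_error($ch);
    curl_close($ch);

    // curl_exec() returns false on transport errors; an empty body is also a failure
    if ($oggBytes === false || $oggBytes === '' || $code !== 200) {
        throw new \RuntimeException("OGG conversion failed: HTTP {$code} {$curlErr}");
    }

    // file_put_contents() returns false when the write fails
    if (file_put_contents($oggPath, $oggBytes) === false) {
        throw new \RuntimeException("Could not write OGG temp file: {$oggPath}");
    }
    return $oggPath;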
}
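
An HTTP 200 from the converter does not prove that the body is Opus audio; it could be an error page or an Ogg file with a different codec. As a cheap guard, the bytes can be checked before upload. This is a minimal sketch with a hypothetical helper name, looks_like_ogg_opus(). It relies on two facts from the Ogg and Ogg Opus formats: every Ogg page starts with "OggS", and an Opus stream's first packet starts with "OpusHead". It is a sanity check, not a full validator.

```php
<?php
// Returns true when the bytes look like an Ogg container carrying Opus.
function looks_like_ogg_opus(string $bytes): bool
{
    // Every Ogg page begins with the capture pattern "OggS".
    if (strncmp($bytes, 'OggS', 4) !== 0) {
        return false;
    }
    // The Opus identification header sits in the first page, right after
    // the page header, so scanning a short prefix is enough.
    return strpos(substr($bytes, 0, 128), 'OpusHead') !== false;
}
```

In convert_wav_to_ogg() this check could run on $oggBytes before the file is written, throwing the same RuntimeException so the text fallback still fires.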

The Silent Failure Trap

Meta's send API returns 200 with a message ID even when delivery will ultimately fail. The actual delivery status (sent / delivered / read / failed) arrives via a separate status webhook callback with the message ID. Without processing those status callbacks, you have no way to know if the audio reached the user.
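
To match a later status callback to a send, the message ID has to be captured when the send call returns. The sketch below assumes the send response has the shape {"messages":[{"id":"wamid..."}]}. The helper name extract_message_id() is hypothetical, and how send_whatsapp_audio() exposes the raw response body is also an assumption.

```php
<?php
// Pulls the message ID out of a send response body, or returns null
// when the body is not JSON or carries no message ID (e.g. an error object).
function extract_message_id(string $responseBody): ?string
{
    $data = json_decode($responseBody, true);
    if (!is_array($data)) {
        return null;
    }
    $id = $data['messages'][0]['id'] ?? null;
    return (is_string($id) && $id !== '') ? $id : null;
}
```

Storing this ID together with the original text lets a failure handler send the text fallback when a 'failed' status arrives for that message.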

Handle status callbacks:

// In your webhook handler, check for statuses alongside messages
$statuses = $payload['entry'][0]['changes'][0]['value']['statuses'] ?? [];
foreach ($statuses as $status) {
    $msgId  = $status['id'];
    $state  = $status['status'];  // 'sent', 'delivered', 'read', 'failed'
    $errors = $status['errors']   ?? [];
    log_delivery_status($msgId, $state, $errors, $pdo);
    if ($state === 'failed') {
        handle_delivery_failure($msgId, $errors, $pdo);
    }
}
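
The handler above reads only entry[0] and changes[0]. The payload's entry and changes fields are arrays, so a more defensive variant walks all of them. This is a minimal sketch under that assumption; extract_statuses() is a hypothetical helper, not part of the original code.

```php
<?php
// Collects every status object from a webhook payload, skipping
// malformed ones, and normalizes each to id / status / errors.
function extract_statuses(array $payload): array
{
    $out = [];
    foreach ($payload['entry'] ?? [] as $entry) {
        foreach ($entry['changes'] ?? [] as $change) {
            foreach ($change['value']['statuses'] ?? [] as $status) {
                if (!isset($status['id'], $status['status'])) {
                    continue; // not a usable status object
                }
                $out[] = [
                    'id'     => (string) $status['id'],
                    'status' => (string) $status['status'],
                    'errors' => $status['errors'] ?? [],
                ];
            }
        }
    }
    return $out;
}
```

The loop in the webhook handler can then iterate over extract_statuses($payload) and call log_delivery_status() and handle_delivery_failure() as before.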

What to Watch For

  • MIME type for upload: Meta requires audio/ogg; codecs=opus exactly. Plain audio/ogg may be rejected, and an OGG file labelled audio/mpeg will be rejected.
  • Text fallback is not optional: Every audio send path must have a text fallback. A voice response that silently fails leaves the user with no reply at all.
  • Temp file cleanup in finally: Even if the OGG conversion throws, the WAV temp file must be cleaned up. Put the cleanup in finally rather than running it only on the success path.