Series 7 — Part 4 of 6

Kokoro TTS always outputs WAV. WhatsApp requires OGG/OPUS. FFmpeg bridges the gap. This article covers the exact FFmpeg command for voice audio, bitrate choices, codec availability checks, and temp file lifecycle management.

The Conversion Command

# Convert WAV to OGG/OPUS optimized for voice
ffmpeg \
  -i input.wav \
  -c:a libopus \
  -b:a 48k \
  -vbr on \
  -compression_level 10 \
  -frame_duration 20 \
  -ar 48000 \
  -ac 1 \
  -f ogg \
  output.ogg

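# Verify the output format
file output.ogg
# → typically: Ogg data, Opus audio, version 0.1, mono, 48000 Hz (Input Sample Rate)
# (exact wording varies with the installed version of file)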

Flag rationale:

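  • -c:a libopus: encode with the Opus codec, which WhatsApp requires for voice messages
  • -b:a 48k: 48 kbps is comfortable headroom for clear mono speech while keeping files small (about 6 KB per second of audio)
  • -vbr on: variable bitrate, so 48k is a target average; quieter passages use less, dense speech uses more
  • -compression_level 10: highest encoder effort (range 0 to 10); slowest to encode, best quality per bit. This is already the libopus default
  • -frame_duration 20: 20 ms Opus frames, also the libopus default
  • -ar 48000 -ac 1: 48 kHz mono; stereo would only waste bandwidth on a single voice
  • -f ogg: force the Ogg container regardless of the output file's extension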
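
To put numbers on the size side of that choice, the arithmetic can be sketched in shell. The function names are illustrative, and the 24 kHz, 16-bit mono WAV input is an assumption made for the example (check your own Kokoro output with file). Because VBR treats 48 kbps as a target, real OGG sizes will land near the estimate rather than exactly on it.

```shell
#!/bin/sh
# Rough size estimates; these helper names are illustrative, not part of FFmpeg.

# Estimated Opus payload in bytes: bitrate (kbps) * 1000 / 8 * seconds
estimate_opus_bytes() {
    kbps=$1; seconds=$2
    echo $(( kbps * 1000 / 8 * seconds ))
}

# Uncompressed PCM size in bytes: sample rate * bytes per sample * channels * seconds
estimate_wav_bytes() {
    rate=$1; bytes_per_sample=$2; channels=$3; seconds=$4
    echo $(( rate * bytes_per_sample * channels * seconds ))
}

# 30-second clip: ASSUMED 24 kHz 16-bit mono WAV in, 48 kbps Opus out
estimate_wav_bytes 24000 2 1 30   # prints 1440000
estimate_opus_bytes 48 30         # prints 180000
```

Under those assumptions a 30-second reply shrinks from roughly 1.4 MB to roughly 180 KB, before Ogg container overhead.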

Detecting Format with file vs Trusting the Extension

Kokoro outputs WAV regardless of the format you request. Never trust the extension. The file command checks the magic bytes:

file output.wav
# Could return: "RIFF (little-endian) data, WAVE audio" ← correct
# Or could return: "Ogg data, Opus audio" ← Kokoro version changed format
# Always check before passing to FFmpeg
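
When the check has to run inside a script rather than by eye, the same magic-byte test can be done without parsing file's human-readable output. This is a minimal sketch: detect_audio_format is a hypothetical helper name, and it only distinguishes the two formats discussed here.

```shell
#!/bin/sh
# detect_audio_format is a hypothetical helper, not part of Kokoro or FFmpeg.
# It prints "wav", "ogg", or "unknown" based on the file's leading bytes.
detect_audio_format() {
    magic=$(head -c 4 "$1" 2>/dev/null)
    case "$magic" in
        RIFF)
            # RIFF is a generic container; WAV files carry "WAVE" at byte offset 8
            form=$(head -c 12 "$1" | tail -c 4)
            if [ "$form" = "WAVE" ]; then echo wav; else echo unknown; fi
            ;;
        OggS)
            echo ogg
            ;;
        *)
            echo unknown
            ;;
    esac
}

# Usage:
#   [ "$(detect_audio_format "$wav_path")" = "wav" ] || echo "unexpected input format" >&2
```

An "ogg" result only identifies the Ogg container; confirming that the stream inside is Opus still needs file or ffprobe.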

Codec Availability Check

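# Verify the libopus encoder is compiled into this FFmpeg build
ffmpeg -hide_banner -encoders 2>/dev/null | grep libopus
# Should print a line naming libopus; no output means the encoder is missing
# (grepping -codecs for "opus" can also match FFmpeg's built-in Opus decoder)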

# If not available, install:
sudo apt install ffmpeg  # Usually includes libopus on Raspberry Pi OS
# Or compile FFmpeg with --enable-libopus
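
For a start-up or deploy check, the same test can be wrapped in a function that returns an exit status instead of relying on someone reading grep output. A minimal sketch, assuming the usual `ffmpeg -encoders` layout where the encoder name is the second whitespace-separated column; has_ffmpeg_encoder is a hypothetical helper name.

```shell
#!/bin/sh
# has_ffmpeg_encoder is a hypothetical helper name.
# It succeeds only if the exact encoder name appears in the second column
# of `ffmpeg -encoders` (the first column holds capability flags).
has_ffmpeg_encoder() {
    ffmpeg -hide_banner -encoders 2>/dev/null |
        awk -v enc="$1" '$2 == enc { found = 1 } END { exit !found }'
}

# Usage at service start-up:
#   has_ffmpeg_encoder libopus || { echo "libopus encoder missing" >&2; exit 1; }
```

Matching the whole column avoids false positives from other encoder names that merely contain "opus", and a missing ffmpeg binary also yields a non-zero status.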

Temp File Lifecycle in PHP

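function convert_wav_to_ogg_local(string $wavPath): string
{
    $oggPath = sys_get_temp_dir() . '/' . bin2hex(random_bytes(8)) . '.ogg';

    // -nostdin: never block waiting for input; -loglevel error with 2>&1
    // captures only real error messages in $output
    $cmd = sprintf(
        'ffmpeg -nostdin -loglevel error -i %s -c:a libopus -b:a 48k -vbr on -ar 48000 -ac 1 -f ogg %s 2>&1',
        escapeshellarg($wavPath),
        escapeshellarg($oggPath)
    );

    exec($cmd, $output, $exitCode);

    if ($exitCode !== 0 || !file_exists($oggPath) || filesize($oggPath) === 0) {
        // Remove any partial output: the caller never learns this path on failure
        if (file_exists($oggPath)) {
            unlink($oggPath);
        }
        throw new \RuntimeException(
            "FFmpeg conversion failed (exit: {$exitCode}): " . implode("\n", $output)
        );
    }

    return $oggPath;
}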

// Caller is responsible for cleanup — use try/finally
$oggPath = null;
try {
    $oggPath = convert_wav_to_ogg_local($wavPath);
    $mediaId = upload_to_meta($oggPath, 'audio/ogg; codecs=opus');
} finally {
    if ($oggPath && file_exists($oggPath)) unlink($oggPath);
    if (file_exists($wavPath))             unlink($wavPath);
}

What to Watch For

  • exec() vs shell_exec(): use exec() and check $exitCode. shell_exec() returns the command's output as a string but no exit code, so you cannot tell whether FFmpeg succeeded.
  • /tmp exhaustion: temp files that are never cleaned up accumulate in /tmp until it runs out of space or inodes, and new conversions start failing. The finally block is the correct pattern because it also runs when the upload throws; cleanup on success only leaks a file on every failure.
  • Predictable temp file names: never build temp names from time() or uniqid() alone. Use random_bytes(8) so the name cannot be guessed. On a shared /tmp, a guessable name lets another local user pre-create the path or plant a symlink there before your process writes to it.
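
The same two rules (unguessable names, cleanup on every exit path) carry over to any shell tooling around this pipeline. A minimal sketch: convert_with_cleanup is a hypothetical wrapper, and the work to run against the temp file is passed in as a command so the lifecycle logic stands on its own.

```shell
#!/bin/sh
# convert_with_cleanup is a hypothetical wrapper name.
# It creates a temp file, runs the given command with the temp path appended
# as its last argument, and removes the file however that command ends.
# The body is a subshell "( ... )" so the EXIT trap fires when the wrapper returns.
convert_with_cleanup() (
    # mktemp creates the file atomically with a random name, readable only by the owner
    tmp=$(mktemp "${TMPDIR:-/tmp}/voice.XXXXXXXX") || exit 1
    # The EXIT trap runs on success and on failure, like finally in PHP
    trap 'rm -f "$tmp"' EXIT
    "$@" "$tmp"
    status=$?
    exit "$status"
)

# Usage (my_convert_and_upload is a placeholder for your own function):
#   convert_with_cleanup my_convert_and_upload input.wav
```

Because mktemp has already created the file, an FFmpeg call inside the step needs -y to overwrite it, and -f ogg because the temp name has no .ogg extension.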