Series 7 — Part 4 of 6
Kokoro TTS always outputs WAV. WhatsApp requires OGG/OPUS. FFmpeg bridges the gap. This article covers the exact FFmpeg command for voice audio, bitrate choices, codec availability checks, and temp file lifecycle management.
The Conversion Command
# Convert WAV to OGG/OPUS optimized for voice
ffmpeg \
  -i input.wav \
  -c:a libopus \
  -b:a 48k \
  -vbr on \
  -compression_level 10 \
  -frame_duration 20 \
  -ar 48000 \
  -ac 1 \
  -f ogg \
  output.ogg
# Verify the output format
file output.ogg
# → Ogg data, Opus audio, mono, 48000 Hz
Flag rationale:
- -c:a libopus selects the Opus codec, required for WhatsApp voice compatibility.
- -b:a 48k sets 48 kbps, the sweet spot for voice quality vs file size.
- -vbr on enables variable bitrate: quieter passages use fewer bits, speech uses more.
- -ar 48000 -ac 1 gives 48 kHz mono, the standard for voice; stereo wastes bandwidth.
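To put the bitrate choice in numbers, file size grows roughly linearly with bitrate and duration. The sketch below uses a hypothetical helper, estimate_opus_bytes (not part of FFmpeg), and ignores Ogg container overhead and VBR drift, so treat the result as an approximation rather than an exact size.

```shell
# Hypothetical helper: approximate audio payload size in bytes.
# bytes ≈ kbps * 1000 * seconds / 8
estimate_opus_bytes() {
  kbps=$1
  seconds=$2
  echo $(( kbps * 1000 * seconds / 8 ))
}

estimate_opus_bytes 48 30   # 30 s voice note at 48 kbps -> 180000
estimate_opus_bytes 24 30   # same note at 24 kbps       -> 90000
```

Halving the bitrate halves the estimate, which is why the choice matters more for long voice notes than for short replies.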
Detecting Format with file vs Trusting the Extension
Kokoro currently outputs WAV regardless of the format you request, so the file extension tells you nothing about the actual contents. Never trust the extension. The file command inspects the magic bytes instead:
file output.wav
# Could return: "RIFF (little-endian) data, WAVE audio" ← correct
# Or could return: "Ogg data, Opus audio" ← Kokoro version changed format
# Always check before passing to FFmpeg
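If the file utility is not installed, the same check can be approximated by reading the leading bytes directly. This is a minimal sketch, assuming only the two containers discussed here matter; detect_audio_container is a hypothetical helper name.

```shell
# Hypothetical helper: classify a file by its leading magic bytes.
# Prints "wav", "ogg", or "unknown".
detect_audio_container() {
  # First four bytes; NUL bytes are stripped so the shell can hold the value
  magic=$(dd if="$1" bs=1 count=4 2>/dev/null | tr -d '\0')
  case "$magic" in
    RIFF)
      # A WAV file is a RIFF file whose form type at bytes 8-11 is "WAVE"
      form=$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null | tr -d '\0')
      if [ "$form" = "WAVE" ]; then echo wav; else echo unknown; fi
      ;;
    OggS)
      echo ogg
      ;;
    *)
      echo unknown
      ;;
  esac
}
```

Note that "OggS" only identifies the container. Unlike file, this sketch does not confirm that the stream inside is Opus.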
Codec Availability Check
# Verify the libopus encoder is compiled into this FFmpeg build
ffmpeg -encoders 2>/dev/null | grep libopus
# Should list an audio encoder named libopus
# (ffmpeg -codecs | grep opus is not enough: it also matches FFmpeg's built-in opus codec entry)
# If not available, install:
sudo apt install ffmpeg # Usually includes libopus on Raspberry Pi OS
# Or compile FFmpeg with --enable-libopus
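For a startup check in a script, the encoder list can be matched on the exact name rather than a loose grep. This is a sketch under the assumption that the encoder name sits in the second whitespace-separated column of the ffmpeg -encoders output; has_encoder is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: succeeds if the named encoder appears in
# `ffmpeg -encoders` output supplied on stdin.
has_encoder() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Assumes ffmpeg is on PATH; a missing binary simply reports "missing".
if ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder libopus; then
  echo "libopus encoder available"
else
  echo "libopus encoder missing" >&2
fi
```

Matching the whole column avoids a false positive from FFmpeg's native encoder, which is listed as plain opus.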
Temp File Lifecycle in PHP
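function convert_wav_to_ogg_local(string $wavPath): string
{
    // Unpredictable name so the path cannot be guessed in advance
    $oggPath = sys_get_temp_dir() . '/' . bin2hex(random_bytes(8)) . '.ogg';
    $cmd = sprintf(
        'ffmpeg -nostdin -i %s -c:a libopus -b:a 48k -vbr on -ar 48000 -ac 1 -f ogg %s 2>/dev/null',
        escapeshellarg($wavPath),
        escapeshellarg($oggPath)
    );
    exec($cmd, $output, $exitCode);
    if ($exitCode !== 0 || !file_exists($oggPath)) {
        // FFmpeg can leave a partial file behind on failure; remove it here,
        // because the caller never learns the path when we throw
        if (file_exists($oggPath)) {
            unlink($oggPath);
        }
        throw new \RuntimeException("FFmpeg conversion failed (exit: {$exitCode})");
    }
    return $oggPath;
}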
// Caller is responsible for cleanup: use try/finally
$oggPath = null;
try {
    $oggPath = convert_wav_to_ogg_local($wavPath);
    $mediaId = upload_to_meta($oggPath, 'audio/ogg; codecs=opus');
} finally {
    if ($oggPath && file_exists($oggPath)) unlink($oggPath);
    if (file_exists($wavPath)) unlink($wavPath);
}
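When the conversion is driven from a shell script instead of PHP, the same lifecycle can be sketched with mktemp and an EXIT trap. The helper name with_temp_file is hypothetical; it runs the given command with the temp path appended as the last argument and removes the file whether the command succeeds or fails.

```shell
# Hypothetical helper: run a command against a fresh temp file and
# always remove the file afterwards.
with_temp_file() {
  (
    # mktemp creates the file atomically with an unpredictable name
    tmp=$(mktemp "${TMPDIR:-/tmp}/voice.XXXXXXXX") || exit 1
    # The EXIT trap fires when this subshell ends, on success or failure
    trap 'rm -f "$tmp"' EXIT
    "$@" "$tmp"
  )
}

# Usage sketch: the command must finish all work with the file (convert
# and upload) before returning. -y is needed because mktemp has already
# created the output path.
# with_temp_file ffmpeg -nostdin -y -i input.wav -c:a libopus -b:a 48k -f ogg
```

The subshell keeps the trap local to one call, so it does not overwrite any EXIT trap the surrounding script has already set.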
What to Watch For
- exec() vs shell_exec(): Use exec() with an $exitCode check. shell_exec() returns output as a string but gives no exit code, so you can't tell whether FFmpeg succeeded.
- /tmp exhaustion: If temp files are never cleaned up, /tmp fills with orphaned files until it runs out of space or inodes. The finally block is the correct pattern, not cleanup on success only.
- Predictable temp file names: Never use time() or uniqid() alone for temp file names. Use random_bytes(8) for unpredictability; a predictable name lets another local user pre-create the path or plant a symlink there before FFmpeg writes to it.