Series 9 — Part 2 of 5

FFmpeg is the Swiss Army knife of audio and video processing. For voice AI pipelines, you need one core conversion (WAV → OGG/OPUS) and a handful of diagnostic techniques. This article covers the full command, why each flag matters, bitrate choices for voice, and codec availability checks.

The WAV → OGG/OPUS Command — Annotated

# Flags live in a Bash array: a comment placed after a trailing "\" would break the command
opus_flags=(
  -i input.wav            # Input file
  -c:a libopus            # Audio codec: Opus (requires libopus compiled in)
  -b:a 48k                # Target bitrate: 48 kbps, the voice sweet spot
  -vbr on                 # Variable bitrate: adapts to signal complexity
  -compression_level 10   # Encoding effort: 10 = maximum (slowest, best quality)
  -frame_duration 20      # Frame size in ms: 20 ms balances latency and quality
  -ar 48000               # Sample rate: 48000 Hz (Opus native rate)
  -ac 1                   # Channels: 1 (mono; voice needs no stereo)
  -application voip       # Codec optimisation: voip = tuned for speech
  -f ogg                  # Output container: Ogg (the usual container for Opus voice notes)
)
ffmpeg "${opus_flags[@]}" output.ogg

Bitrate Choices for Voice

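| Use case              | Codec    | Bitrate  | File (1 min) | Quality                      |
|-----------------------|----------|----------|--------------|------------------------------|
| WhatsApp voice note   | Opus/OGG | 48k VBR  | ~360 KB      | Excellent for voice          |
| Podcast quality audio | MP3      | 128k CBR | ~960 KB      | High, but overkill for voice |
| Voicemail quality     | Opus/OGG | 16k VBR  | ~120 KB      | Acceptable, intelligible     |
| Lowest viable voice   | Opus/OGG | 8k VBR   | ~60 KB       | Recognisable but degraded    |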
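
The file sizes in the table follow from simple arithmetic: bitrate in kilobits per second, times duration in seconds, divided by 8. Here is a minimal sketch of that calculation. The function name estimate_kb is made up for illustration, 1 KB is taken as 1000 bytes (as the table does), and container overhead is ignored. With VBR the real file will land near the estimate, not exactly on it.

```shell
#!/usr/bin/env bash
# estimate_kb BITRATE_KBPS SECONDS   (hypothetical helper, not part of FFmpeg)
# Rough encoded size in KB: kilobits per second * seconds / 8 bits per byte.
# Ignores Ogg container overhead; VBR output drifts around the target bitrate.
estimate_kb() {
    local kbps="$1" seconds="$2"
    echo $(( kbps * seconds / 8 ))
}

estimate_kb 48 60     # → 360  (one-minute voice note at 48 kbps)
estimate_kb 16 60     # → 120  (voicemail quality)
estimate_kb 48 150    # → 900  (a 2.5-minute message)
```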

Detecting Actual Format — Don't Trust the Extension

# The 'file' command reads magic bytes, not the extension
file input.wav
# → RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 22050 Hz
# → OR: Ogg data, Opus audio (Kokoro sometimes changes format without warning)

# In a script ($input holds the path of the file to check)
format=$(file --mime-type -b "$input")
if [[ "$format" != "audio/x-wav" && "$format" != "audio/wav" ]]; then
    echo "Unexpected format: $format, aborting" >&2
    exit 1
fi
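
If the file utility is not installed (minimal container images often omit it), the same idea can be approximated by reading the magic bytes directly. This is a sketch of a different technique from the one above, not a replacement for file: it only recognises the two formats this article cares about, and the function name sniff_audio is hypothetical. A WAV file starts with "RIFF" and carries "WAVE" at bytes 9 to 12; an Ogg stream starts with "OggS".

```shell
#!/usr/bin/env bash
# sniff_audio PATH   (hypothetical helper)
# Prints "wav", "ogg", or "unknown" based on the leading bytes of the file.
sniff_audio() {
    local path="$1" magic form
    # First four bytes; NULs are stripped so command substitution stays quiet
    magic=$(head -c 4 -- "$path" 2>/dev/null | tr -d '\0')
    case "$magic" in
        RIFF)
            # RIFF is a generic container: it is WAV only if bytes 9-12 say "WAVE"
            form=$(head -c 12 -- "$path" | tail -c 4 | tr -d '\0')
            if [ "$form" = "WAVE" ]; then echo wav; else echo unknown; fi
            ;;
        OggS) echo ogg ;;
        *)    echo unknown ;;
    esac
}

# Usage:
#   if [ "$(sniff_audio "$input")" != "wav" ]; then
#       echo "Unexpected format, aborting" >&2
#       exit 1
#   fi
```

Note that "ogg" here only identifies the container. Telling Opus apart from Vorbis inside it would need a look further into the first page, which file already does for you.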

Codec Availability Verification

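# Check libopus is available
ffmpeg -codecs 2>/dev/null | grep -i opus
# Should show something like: DEA.L. opus   Opus (Opus Interactive Audio Codec) (decoders: opus libopus) (encoders: opus libopus)
# libopus must appear in the encoders list; a decoder-only build cannot write Opus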

# Check libmp3lame (for MP3 output)
ffmpeg -codecs 2>/dev/null | grep -i mp3lame

# Test the conversion pipeline end-to-end before deploying
# (-y so a leftover /tmp/test.ogg cannot trigger an overwrite prompt)
ffmpeg -y -i /tmp/test.wav -c:a libopus -b:a 48k -f ogg /tmp/test.ogg && echo "Conversion OK"
file /tmp/test.ogg  # Confirm format
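
For a deployment script, eyeballing grep output is not enough, because grep -i opus also matches FFmpeg's built-in opus codec and any other line containing the word. One possible approach is to match the encoder name exactly. The sketch below assumes the layout of ffmpeg -encoders output, where the encoder name is the second whitespace-separated field on each line. The helper name has_encoder is hypothetical.

```shell
#!/usr/bin/env bash
# has_encoder NAME   (hypothetical helper)
# Reads an `ffmpeg -encoders` listing on stdin and succeeds only if NAME
# appears as an exact encoder name (second field of a listing line).
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage (assumes ffmpeg is on PATH):
#   if ! ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder libopus; then
#       echo "This FFmpeg build has no libopus encoder" >&2
#       exit 1
#   fi
```

Reading from stdin keeps the check separate from the FFmpeg call, so the same function works for libmp3lame or any other encoder name.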

What to Watch For

  • Input sample rate mismatch: Kokoro may output WAV at 22050 Hz, a rate Opus cannot encode directly (it works at 8, 12, 16, 24, or 48 kHz). The -ar 48000 flag resamples to 48 kHz explicitly. Without it, you may see a quality warning in FFmpeg output and playback issues on some devices.
  • The -y flag: -y overwrites the output file without asking. Always use it in non-interactive scripts, or FFmpeg can hang waiting for a confirmation that never comes, or exit without converting.
  • Stderr sent to /dev/null: FFmpeg is verbose and writes its log to stderr. In production you can silence it with ffmpeg ... 2>/dev/null, but capture stderr when debugging or you'll miss important error messages.
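
The three points above can be folded into one small wrapper. This is a sketch under stated assumptions, not a definitive implementation: the function name wav_to_opus and the FFMPEG_LOG environment variable are made up for illustration, while -y, -nostdin, and -hide_banner are standard FFmpeg options. By default stderr is discarded; setting FFMPEG_LOG to a file path keeps it for debugging.

```shell
#!/usr/bin/env bash
# wav_to_opus INPUT OUTPUT   (hypothetical wrapper)
# FFMPEG_LOG is an assumed convention of this sketch, not an FFmpeg feature:
# unset, stderr is discarded; set to a path, stderr is appended there.
wav_to_opus() {
    local src="$1" dst="$2"
    local log="${FFMPEG_LOG:-/dev/null}"
    # -y overwrites without prompting, -nostdin stops FFmpeg reading the terminal,
    # -ar 48000 makes the resampling target explicit for non-48 kHz input
    if ! ffmpeg -y -nostdin -hide_banner \
            -i "$src" \
            -c:a libopus -b:a 48k -ar 48000 -ac 1 -application voip \
            -f ogg "$dst" 2>>"$log"; then
        echo "ffmpeg failed for $src (set FFMPEG_LOG to a file to see why)" >&2
        return 1
    fi
}

# Usage:
#   wav_to_opus reply.wav reply.ogg                              # quiet, production
#   FFMPEG_LOG=/tmp/ffmpeg-debug.log wav_to_opus reply.wav reply.ogg   # keep the log
```

Appending to the log with 2>> keeps the output of earlier runs, which helps when a failure only shows up intermittently.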