Today, a short tale of fun mystery-solving. I've been working on a project that involves a server dynamically generating audio files and streaming them to a client via a WebRTC session.
The dynamic audio generation process works like this: first, a program generates a .wav file. Then the .wav is compressed into an .ogg file containing an Opus audio stream, which is delivered over the network. The WebRTC portion is handled by the awesome Pion library, a pure Go implementation of WebRTC which makes customizing it (for example, by streaming a dynamically generated audio file) super easy.
So here was the bug: certain audio files were delivered over the wire and played seamlessly in the browser. Others weren't. I knew there wasn't anything fundamentally wrong with the files that failed to transmit, because I could play them in any old audio player (Google Chrome, for example). All of the files were ogg + Opus, sampled at 48 kHz.
Since I could open the files locally, I figured there was probably an issue with the network. Chrome has a handy tool at chrome://webrtc-internals for inspecting WebRTC sessions. Sure enough, this tool revealed that the client never actually received any bytes of the problematic ogg files. But why?
Ogg is a container format which can hold multiple logical data streams, each with its own encoding. In this case, the ogg files held a single logical Opus stream. The opus-tools package from Xiph.Org includes a useful tool called opusinfo that inspects Opus streams. Here's what the opusinfo output looks like for one of the files which transmitted successfully:
And here's the output for one which didn't transmit:
Pretty similar, but with one obvious difference: the bad files had a much longer page duration than the good ones.
Could this make a difference? It could! When the ogg files are streamed over the WebRTC media channel, they are sent via RTP, which runs over UDP. Each RTP packet carried one whole page of the ogg file, so the WebRTC server was attempting to send 10 kB datagrams. Too big! (WebRTC stacks, Pion included, aim to keep each RTP packet to about 1200 bytes so it fits within a typical network path MTU.)
Why did the bad files have such large pages? They were generated by compressing a .wav with ffmpeg, invoked like so:
It turns out the ffmpeg ogg muxer has a -page_duration option (specified in microseconds) that controls how the stream is sliced into pages. I hadn't known about this option and wasn't using it. The default: 1000 ms. And so, the 19-character fix for my bug:
And all my files streamed happily ever after.