The past 24 hours have been intense, and good things are happening. Last night, I sketched out a variety of diagrams depicting how a distributed recode queueing system might work. I’ve settled on a general design, and did some preliminary process stream and network socket code testing in PHP to confirm it is possible. This morning, I felt like jumping into a core component of my project that I had yet to touch: the recode-managing code itself, which each machine in the recoding cluster will run. This code’s job will be to supervise the forked processes it creates to decode, encode, and upload media files, and periodically send job status updates back to the machine that gave it its current job. The tools I will be using to get the job done are MPlayer, which will decode to PCM audio and YUV video, and the reference encoder supplied as part of libtheora: I’m referring to examples/encoder_example.c. Its code is not especially resilient when faced with corrupt or malformed files, but there doesn’t seem to be much else in the way of terminal-based front ends to libtheora out there, and it is suitable since I can be certain its input (generated by MPlayer) will be properly constructed.

Or not.

Turns out, MPlayer’s PCM output code writes a malformed RIFF header at the front of the stream, then goes back and fixes it at the end, once it knows the file’s total size. (No Google searches to help me on that one, just a full day’s worth of examining wav files in a hex editor and sleuthing in MPlayer source code 🙂 ) This works great if you’re writing to a normal file, but is a problem if you’re writing to an unseekable stream. For my project, having this capability is essential: I DO want the recode boxes to start downloading the file to be recoded and write data right back to the file repository as fast as it can be downloaded and recoded, rather than downloading the entire thing before decoding can commence, and recoding the entire thing before uploading can commence. I don’t want to be writing and keeping track of extremely large temporary decompressed audio and video files on the local filesystem, either.

So this means the decompressed audio and video must be piped directly to the encoder, and these are unseekable streams. The result: sometimes encoder_example detects MPlayer’s bogus RIFF header and produces an error; otherwise it passes the audio on to libvorbis with a sample rate in the MHz (not kHz) range, and encoding dies there instead. (Hacking encoder_example to properly report the sample rate doesn’t fix it; libvorbis still gets confused by the malformed RIFF header too.) This seems to be a long-standing problem that nobody has quite figured out before, so here’s hoping this entry finds its way into relevant Google searches.

As of right now, I unfortunately haven’t come up with any great ideas for how to calculate a correct RIFF header at the beginning, before the decompression itself has happened. The key missing piece of information within ao_pcm.c is the total (chronological) length of the file. I tried adding a floating-point “length” option to the -ao pcm set of suboptions so that a length in seconds could be specified directly to this audio output plugin, but found it unrealistic to acquire a time in seconds and microseconds accurate enough for the program to produce a correct header. So, nothing worthy of an MPlayer patch just yet. HOWEVER, as it turns out, encoder_example and libvorbis have excellent tolerance for somewhat miscalculated file and data length fields in the RIFF header. In fact, I’ve experimented with values that underestimated by 50%, and the encoder still properly included all the audio, in perfect sync, in the output Ogg.

So I’m probably going to chalk this up as solved. And since it was the only missing link in a complete passthrough from the HTTP media download stream all the way to the Ogg Vorbis upload stream at the other end, I’m pretty happy to have gotten it figured out.