So I was a little too hasty in my previous 2 AM post of several days ago…I assumed the RIFF header problem was the only problem. Alas, once the encoder accepts the input data, it still doesn’t read the decompressed audio and video streams in parallel at all. In other words, it doesn’t just grab a few KB of video data, process it, and then grab a few KB of corresponding audio data. Instead, it reads a ton of video data or a ton of audio data before switching streams. This is the reason many other sites have pointed out that you can’t directly feed both the decompressed audio and video from mplayer to the encoder via two named pipes: current Linux kernels cap these fifo buffers at 64 KB, and the limit was even smaller in older kernels. So, it seemed time to build in some additional buffering.

I looked through encoder_example.c first — in theory, you could implement it there, by having it read and temporarily store data from one named pipe whenever the other was empty, until data started appearing on it again. Trouble is, my experience with C is limited to one course, and it was in C++, so I’d have to do a bit of self-teaching to implement the kind of data structure and memory allocation that is necessary.

I spent a while attempting that, and then decided I was in a bit over my head. I ran back home to PHP and created an external buffering script with it, mostly just to make absolutely sure that the approach would work. It reads in data from a fifo mplayer is writing to, buffers it up to 10 MB, and writes it out to another fifo that is being emptied by the encoder. I didn’t really expect it to operate with any semblance of efficiency, which it definitely does not, but it did succeed in proving to me that implementing a buffer directly within the encoder would really make it all work and thus be worth the effort.
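
For the curious, the guts of that script boil down to something like this (a simplified sketch rather than the real thing, with made-up fifo paths and the 10 MB cap hard-coded):

<?php
// fifo-to-fifo buffer: read whatever mplayer has managed to write, hold it
// in memory (up to a cap), and feed it to the encoder as fast as it will take it.
$in  = fopen( '/tmp/from_mplayer.fifo', 'rb' );   // mplayer writes here
$out = fopen( '/tmp/to_encoder.fifo', 'wb' );     // encoder_example reads here
stream_set_blocking( $in, false );
stream_set_blocking( $out, false );

$buffer = '';
$cap    = 10 * 1024 * 1024;                       // ~10 MB cushion

while ( true ) {
    if ( strlen( $buffer ) < $cap ) {
        $chunk = fread( $in, 65536 );
        if ( $chunk !== false && $chunk !== '' ) {
            $buffer .= $chunk;
        } elseif ( feof( $in ) && $buffer === '' ) {
            break;                                // producer finished, buffer drained
        }
    }
    if ( $buffer !== '' ) {
        $written = fwrite( $out, $buffer );
        if ( $written > 0 ) {
            $buffer = substr( $buffer, $written );
        }
    }
    usleep( 1000 );                               // don't spin while both ends are idle
}
fclose( $in );
fclose( $out );

Between the busy-wait loop and all the string copying, it is exactly as inefficient as you would expect from an interpreted stopgap, but it demonstrates the idea.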

And that’s where I’m at now…I didn’t expect this project would involve any coding outside of PHP…or contributions to other projects…but hey, this is where my SoC journey is heading. Plus, it will be a fun challenge and a good achievement for me to improve the encoder.

I think I should also quickly comment on why I am so studiously ignoring ffmpeg2theora. Basically, because it was built to do something else. Since I’m clearly going to have to make modifications to encoder_example too, that reason may not be so valid, but sticking with mplayer provides a few extra benefits: obviously, people will be able to upload content in a few additional formats thanks to mplayer’s codec packs, and it also provides a tidy way to retain a copy of the decompressed output that the encoder can reuse for, say, producing Ogg Vorbis files at both low and high bitrates, without having to decompress twice.

I’ll still need to work on how to get a single instance of mplayer to decode at faster than normal playback speed…I’m not sure how robust the -speed 100 option is, and it also presents more RIFF header problems.

The past 24 hours have been intense, and good things are happening. Last night, I literally sketched out a variety of diagrams depicting how a distributed recode queueing system might work. I’ve settled on a general design, and did some preliminary process stream and network socket code testing in PHP to confirm it is possible. This morning, I felt like just jumping into a core component of my project that I had yet to touch: the recode-managing code itself, which each machine in the recoding cluster will run. This code’s job will be to supervise the forked processes it creates to decode, encode, and upload media files, and periodically send job status updates back to the machine that gave it its current job. The tools I will be using to get the job done are mplayer, whose job will be decoding to PCM audio and YUV video, and the reference implementation of an encoder supplied as part of libtheora. I’m referring to examples/encoder_example.c. Its code is not especially resilient when faced with corrupt or malformed files, but there doesn’t really seem to be much else in the way of terminal-based front ends to libtheora out there, and it is suitable since I can be certain its input (generated by MPlayer) will be properly constructed.
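
In rough PHP, the supervision side of that manager might look like the following. This is only a sketch of the shape of it; the command lines, fifo paths, and the report_status() helper are placeholders, not code from my branch:

<?php
// Sketch of the per-job supervisor: decoder and encoder run as child
// processes wired together through named pipes, with periodic status
// reports sent back to the dispatcher.
$jobId = 42;                                     // assigned by the queue manager
$audio = "/tmp/job$jobId-audio.wav";
$video = "/tmp/job$jobId-video.yuv";
posix_mkfifo( $audio, 0600 );
posix_mkfifo( $video, 0600 );

$decoder = proc_open(
    "mplayer -really-quiet -ao pcm:file=$audio -vo yuv4mpeg:file=$video input.avi",
    array(), $dPipes );
$encoder = proc_open(
    "encoder_example -o /tmp/job$jobId.ogg $audio $video",
    array(), $ePipes );

while ( true ) {
    $d = proc_get_status( $decoder );
    $e = proc_get_status( $encoder );
    report_status( $jobId, $d['running'], $e['running'] );  // hypothetical: phone home
    if ( !$d['running'] && !$e['running'] ) {
        break;                                   // both children are done
    }
    sleep( 5 );
}
unlink( $audio );
unlink( $video );

The real thing also needs an upload child and sane error handling, of course.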

Or not.

Turns out, MPlayer’s PCM output code writes malformed RIFF headers on the front of the stream, and then goes back and fixes them at the end when it knows the file’s total size. (No Google searches to help me on that one, just a full day’s worth of examining wav files in a hex editor and sleuthing in MPlayer source code 🙂 ) This works great if you’re going to write to a normal file, but is a problem if you’re writing to an unseekable stream. For my project, having this capability is essential: I DO want the recode boxes to be able to start downloading the file to be recoded, and to be writing data right back to the file repository as fast as it can be downloaded or recoded, rather than downloading the entire thing before decoding can commence, and recoding the entire thing before uploading can commence. I don’t want to be writing & keeping track of extremely large temporary decompressed audio and video files on the local filesystem, either.
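
For anyone playing along at home, a quick snippet like this will dump the fields in question from the canonical 44-byte PCM WAV header; the two size fields are the ones MPlayer can’t fill in correctly until the stream ends:

<?php
// Peek at the RIFF/WAVE header MPlayer writes at the front of its PCM output.
$f = fopen( 'dump.wav', 'rb' );
$h = unpack(
    'a4riff/VriffSize/a4wave/a4fmt/VfmtSize/vformat/vchannels/Vrate/' .
    'VbyteRate/vblockAlign/vbits/a4data/VdataSize',
    fread( $f, 44 )
);
fclose( $f );

printf( "RIFF chunk size: %u (should be file size - 8)\n", $h['riffSize'] );
printf( "sample rate:     %u Hz\n", $h['rate'] );
printf( "data chunk size: %u (bogus until MPlayer seeks back)\n", $h['dataSize'] );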

So this means the decompressed audio and video must be piped directly to the encoder – and these are unseekable streams. The result is that sometimes encoder_example detects MPlayer’s bogus RIFF header and produces an error; otherwise it passes the audio on to libvorbis as audio with a sample rate in the MHz (not kHz) range — and encoding dies there instead. (Hacking encoder_example to properly report the sample rate doesn’t fix it; libvorbis still gets confused by the malformed RIFF header too.) This seems to be a long-standing problem that nobody has quite figured out before, so here’s hoping this entry finds its way to relevant Google searches.

As of right now, I unfortunately haven’t come up with any great ideas for how to calculate a correct RIFF header at the beginning, before the decompression itself has happened. The key missing piece of information within ao_pcm.c is the total (chronological) length of the file. I tried adding a (floating-point) “length” option to the -ao pcm set of suboptions so that the length in seconds could be specified directly to this audio output plugin, but found it unrealistic to acquire a sufficiently accurate time in seconds and microseconds for the program to produce a correct header. So, nothing worthy of an MPlayer patch just yet. HOWEVER, as it turns out, encoder_example and libvorbis have excellent tolerance for somewhat miscalculated file and data length fields in the RIFF header. In fact, I’ve experimented with values that underestimated by 50%, and the encoder still properly included all the audio, in perfect sync, in the output Ogg.
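
Put differently, a header built from a rough duration estimate is good enough in practice. A sketch of what I mean, assuming plain 16-bit PCM (the estimate just has to be in the right ballpark):

<?php
// Build a RIFF/WAVE header from an *estimated* duration; encoder_example
// and libvorbis tolerate size fields that are noticeably off.
function estimated_wav_header( $rate, $channels, $estSeconds ) {
    $bytesPerSample = 2;                                    // 16-bit PCM
    $dataLen = (int)( $rate * $channels * $bytesPerSample * $estSeconds );
    return pack( 'a4Va4a4VvvVVvva4V',
        'RIFF', 36 + $dataLen, 'WAVE',
        'fmt ', 16,                                         // PCM fmt chunk size
        1, $channels, $rate,
        $rate * $channels * $bytesPerSample,                // byte rate
        $channels * $bytesPerSample,                        // block align
        8 * $bytesPerSample,                                // bits per sample
        'data', $dataLen );
}

// e.g. 44.1 kHz stereo, guessed at roughly two minutes, written to whatever
// stream the encoder is reading from:
fwrite( $encoderFifo, estimated_wav_header( 44100, 2, 120 ) );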

So I’m probably going to chalk this up as being solved. And as it was the only missing link in a complete passthru from the http media download stream all the way to the ogg vorbis upload stream at the other end, I’m pretty happy to have gotten it figured out.

So it’s been a bit since I’ve posted…and since then, a lot has happened. Perhaps the most significant thing is my getting schooled pretty good on #mediawiki about the lack of merits of a pluggable mime detection system. Basically, it’s not that a pluggable media validation system is a bad idea, it’s that there’s no need to make any use of the existing MimeMagic module at all. My approach had been to start with it, and then use plugins to fine-tune its results as necessary. This is nice in that it doesn’t require a complete refactoring of uploading in MediaWiki, but it is overly complicated and has a few big drawbacks, one being that such a design wouldn’t really provide for extracting and caching file metadata, so that would have to be done at a later stage, requiring an in-depth analysis of each file twice.

A better design is to just have upload validation handlers register themselves for file extensions that they can analyze, and have no generic mime detection at all. This is a part of Daniel’s proposed design expressed in his blog. Another nice, not-yet-mentioned detail of this design is that it provides a clean way to tailor the recommended maximum upload size to different media types. The obvious drawback is that it requires the development of plugins specializing in reading & extracting metadata from every file type you want to support before this design could be deployed.
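
As I picture it, registration could be as simple as a map from extensions to handler callbacks, which also gives the per-type size recommendation a natural home. Everything in this sketch is hypothetical naming, just to illustrate the shape:

<?php
// Hypothetical registry: handlers claim the extensions they understand and
// advertise a sensible maximum upload size for that kind of media.
$wgUploadHandlers = array(
    'ogg' => array( 'handler' => 'AudioVideoUploadHandler::verify',
                    'maxSize' => 200 * 1024 * 1024 ),
    'avi' => array( 'handler' => 'AudioVideoUploadHandler::verify',
                    'maxSize' => 200 * 1024 * 1024 ),
    'png' => array( 'handler' => 'BitmapUploadHandler::verify',
                    'maxSize' => 20 * 1024 * 1024 ),
);

function verifyUpload( $name, $tempPath ) {
    global $wgUploadHandlers;
    $ext = strtolower( substr( strrchr( $name, '.' ), 1 ) );
    if ( !isset( $wgUploadHandlers[$ext] ) ) {
        return false;                        // nothing registered: reject outright
    }
    if ( filesize( $tempPath ) > $wgUploadHandlers[$ext]['maxSize'] ) {
        return false;                        // over the recommended cap for this type
    }
    return call_user_func( $wgUploadHandlers[$ext]['handler'], $tempPath );
}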

Perhaps that wouldn’t actually be as much work as it sounds like. In the past 2 days, I’ve essentially created such a plugin that covers the entire audio and video arena, and I think it does a really good job, too. Currently, it can use the ffmpeg-php API, MPlayer’s little-known companion script midentify, or both. Adding additional analysis mechanisms would be a very straightforward process, but between those two you get pretty good overarching support for validating audio and video types. Actually, since MPlayer gives you ffmpeg’s abilities and then some, you can get along fine on midentify only, but I wrote support for ffmpeg-php for two reasons. The main one is that in my tests it sometimes finishes quicker, and sometimes by a lot. (Total runtimes for validation using a composite ffmpeg/mplayer solution are running between < .1 and ~.8 seconds on my test machine, depending on whether both end up needing to be invoked, so it’s in the ballpark.) Additionally, if MediaWiki moves to validating uploads using a number of dedicated plugins like this, each requiring some external utility or other, I can just hear the rumblings from private MediaWiki users. MPlayer is at least a truckload of RPM downloads, and at most a troublesome build/install, so for those who aren’t looking to do media recoding too, the PHP extension might make their installation experience easier.
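
Stripped of its error handling and output caching, the core of the plugin is roughly this. (The ffmpeg-php calls are written from memory of that extension’s API, so treat them as approximate; midentify’s ID_* output lines are the real deal.)

<?php
// Try the cheap in-process check first, then fall back to MPlayer's midentify.
function isDecodableAV( $path ) {
    if ( extension_loaded( 'ffmpeg' ) ) {
        $movie = @new ffmpeg_movie( $path );
        if ( $movie && ( $movie->hasVideo() || $movie->hasAudio() ) ) {
            return true;
        }
    }
    // midentify prints shell-style ID_KEY=value lines we can scan.
    $out  = shell_exec( 'midentify ' . escapeshellarg( $path ) );
    $info = array();
    foreach ( explode( "\n", (string)$out ) as $line ) {
        if ( strpos( $line, '=' ) !== false ) {
            list( $k, $v ) = explode( '=', $line, 2 );
            $info[$k] = $v;
        }
    }
    return isset( $info['ID_VIDEO_FORMAT'] ) || isset( $info['ID_AUDIO_FORMAT'] );
}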

And that brings you to where I am at this moment. Currently I’m invoking this code through the existing UploadVerification hook, which operates with no regard to the uploaded extension/type. It’d be nice if my code only got called on uploads of extensions it had registered with the upload validator as audio or video (a long list, I know…and probably some work to properly compile), but for now, I think I’ll just emulate this by providing a list of extensions that it can verify as an array or sommat, and immediately return execution to the calling script if the upload isn’t on the list. Hopefully, at some point in the future, that can easily be adapted to register my code as the handler for those audio and video types.
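
In practice the stopgap looks something like this (note that I’m writing the hook’s parameter list from memory, so it’s a sketch, not gospel):

<?php
// Stopgap: hook into UploadVerification, but bail out immediately unless the
// extension is one this code claims to understand.
$wgHooks['UploadVerification'][] = 'avUploadVerification';

function avUploadVerification( $saveName, $tempPath, &$error ) {
    $avExtensions = array( 'ogg', 'ogm', 'avi', 'mpg', 'mpeg', 'mov',
                           'wmv', 'asf', 'mp4', '3gp', 'flv' /* etc. */ );
    $ext = strtolower( substr( strrchr( $saveName, '.' ), 1 ) );
    if ( !in_array( $ext, $avExtensions ) ) {
        return true;                     // not ours; let other checks carry on
    }
    if ( !isDecodableAV( $tempPath ) ) { // from the earlier sketch
        $error = 'This file does not appear to be decodable audio or video.';
        return false;
    }
    return true;
}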

And, if things start being done that way, it might also provide a good replacement for $wgFileExtensions, which otherwise is at risk of becoming tediously long as more file types get properly supported.

You can experiment with my code at http://mikeb.servehttp.com:8080/wiki/phase3/ — though this is the machine I’m working on, so it will be occasionally broken, have weird debugging output, etc.

MediaWiki’s mime detection system, which is used as a core component of upload verification as well as to direct media files to the appropriate media handler (for proper transformations, extraction of metadata, etc.), works okay for the limited number of formats that are currently officially supported. Basically we’re talking images, some vector graphics formats, DjVu files, and Ogg A/V. Unfortunately, reliably identifying even this small subset of media types has already required many sprinklings of code specific to a single content type to tweak incorrect magic mime guesses, etc.

Try feeding the existing MimeMagic module an .asf video file or a lossless/uncompressed YUV video stream. (Hint: it no workie.) Since many more media types will be added in the near future through my project and others, I decided something definitive needed to be done. Rather than continuing to add more and more special-case code to the one (already bloated) class every time a new problem comes up, I’m creating a framework for the main module to make use of plugins, as necessary, that specialize in particular content types. Without getting too technical about a plugin’s abilities, they will function as self-contained classes capable of supplying the main module with information about the content type(s) they target, both in general and in the context of a specific given file.
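
In sketch form, a plugin might look something like the class below. The names are placeholders I’m still playing with, not a settled API; the .asf check is just there to show the kind of per-format knowledge a plugin would encapsulate:

<?php
// Rough shape of a content-type plugin for the MimeMagic framework.
abstract class MimeDetectionPlugin {
    // Mime types this plugin can confirm or refine.
    abstract function getHandledMimeTypes();
    // Extensions that should be routed to this plugin before generic detection.
    abstract function getHandledExtensions();
    // Inspect an actual file: return a refined mime type, or null to fall
    // back to the generic MimeMagic guess.
    abstract function improveTypeGuess( $path, $genericGuess );
}

class AsfPlugin extends MimeDetectionPlugin {
    function getHandledMimeTypes()  { return array( 'video/x-ms-asf' ); }
    function getHandledExtensions() { return array( 'asf', 'wmv', 'wma' ); }
    function improveTypeGuess( $path, $genericGuess ) {
        // ASF files open with a fixed header GUID; its first four bytes are
        // enough to recognize the container.
        $magic = file_get_contents( $path, false, null, 0, 4 );
        return ( $magic === "\x30\x26\xB2\x75" ) ? 'video/x-ms-asf' : null;
    }
}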

A number of TODOs in the main module highlight other shortcomings that I’m shooting to address, or at least to design the plugin framework so it’s easier to address them all later.

This wasn’t something I foresaw myself working on per se, but it is something that really should be done before I get into the heart of my proposed project. I’ll make my case for it on wikitech-l and post my code to my branch on SVN in the coming days. (That’s right, I’m too chicken to ask first; I’m just writing it and hoping it will guilt-trip its way into mainline code 🙂) Actually, I think conceptually my idea is perfectly valid, so the only thing to worry about is whether people don’t like my implementation…but I’m putting much care into that too. Stay tuned to hear how it goes…

I’ve been unusually inspired to be productive over the past few days…but mostly that’s turned into a discovery of how hard it is to jump into things. There’s just so much content – not only in code but also in ongoing discussions and documentation – to keep track of that I don’t know where to start, or whether I’m missing key considerations because I haven’t reviewed absolutely everything out there. For example, in hopes that video contributions become popular, I want to implement this so that it can scale to multiple systems that do recoding. But how to do that most easily, considering the underlying technology/protocols in use on Wikimedia’s private networks? The only real trick is preventing two recoding systems from simultaneously grabbing the same job from the queue before it is updated. One solution would be to just let this happen, and detect it by having all recode systems dump their output directly to a central file store using a predictable naming scheme. Then open new files with O_EXCL|O_CREAT, and if the open fails, someone else has already claimed that job, so go get another one. But this requires a shared filesystem that supports that…currently, afaik, Wikimedia is using NFS, but does an open call with O_EXCL|O_CREAT work right under NFS? Heck if I know. And there’s discussion about removing the use of NFS anyway and switching to an API for private server-to-server file transfer in development by Tim Starling. I’m afraid that if that route is taken, I won’t even be able to encode directly to the central file store (instead I’d have to encode locally, then copy, then delete…which takes a bit longer and is more complicated).
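
The claim itself is trivial, since PHP’s fopen() mode 'x' maps straight onto O_CREAT|O_EXCL; the only open question is whether NFS honours the exclusivity. A sketch, with a made-up naming scheme:

<?php
// Claim a job by exclusively creating its output file on the shared store.
// fopen mode 'x' translates to open(..., O_CREAT|O_EXCL) underneath.
function claimJob( $jobId ) {
    $path = "/mnt/filestore/recode/job-$jobId.ogg";
    $fh = @fopen( $path, 'x' );
    if ( $fh === false ) {
        return false;    // someone beat us to it; go grab another job
    }
    return $fh;          // we own this job; stream encoder output here
}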

Then there’s this whole concept of “media handlers” (which Tim’s also at work on) – from the looks of older posts to wikitech-l, they’re supposed to be relevant. I haven’t found formal documentation, though, or any mention of them in includes/SpecialUpload.php, where uploads are handled. That makes me think they’re for the display side of things only, but wikitech-l messages make it look otherwise. I could scour lots more code to figure out what’s going on, but I’m waiting for a chance to talk with Tim about this stuff right now (hence the blog entry), which hopefully will quickly straighten a lot of things out.

I have gotten a bit done…on the media compatibility front, for example, I’ve found at least one common codec in use today that I wasn’t able to decode with my previously discussed MPlayer/ffmpeg or VLC combinations. The bad news is that there’s nothing I can do about it: the reason none of these can decode it is that there is no open-source decoder at all. The good news is that I am getting pretty convinced that MPlayer will be an easy-to-use tool that will “just work” for just about anything the open-source world can decode. I suspected this all along from personal usage, but wanted to test a bit more extensively and systematically for this project. For those interested, the undecodable codec was AMR, an audio (speech-optimized) codec developed for use on GSM and 3G cell networks. It’s relevant because some phones with cams stick it into their video streams…presumably because they have hardware that optimizes AMR encoding and that’s all they can handle when recording live video. Interestingly, if you feed it into Windows Media Player, it works just fine. Guess Micro$oft licensed it. I’d be curious to know how Facebook, which actively encourages cellphone videos to be uploaded to their video sharing service, got around this. Considering Wiki*’s different usage/audience, I don’t think I’ll continue to pursue it, though.

 That’s all for now.

I’m surveying the decoding and rapid file identification capabilities of MPlayer, vlc, ffmpeg, etc to determine which are most worth utilizing and supporting. After the research and testing I’ve already conducted, I’m leaning towards using MPlayer as the decoder of choice. It has extensive codec support via its use of binary codec packages, has a nice (and usually very speedy) wrapper script that outputs parsable information suitable for verifying a file is decodable at upload time, and provides flexible encoding options (either through its own MEncoder or by outputting in YUV.)

Nevertheless, I still plan to make a framework that lends itself to adding multiple additional software suites. Here’s how I imagine it’ll work: the MediaWiki administrator will be able to specify in a configuration file an ordered list of the software packages available, for example MPlayer, ffmpeg. Then, when a file is uploaded that appears to be a video, MPlayer’s inspector will first get a chance to analyze the file and declare it decodable. If it cannot, ffmpeg gets a chance to inspect the file (it looks like it would be easy to get this functionality from ffmpeg’s APIs if there isn’t some little proggy out there that does it already). The first software suite that claims to be able to decode the file will get the job, and this will be stored as part of the job’s info in the recoding queue. If nothing can decode it, the user gets an error and the upload is discarded.
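
The dispatch itself would then be a short loop, something like this sketch (none of these inspector classes exist yet; they’re just names for the idea):

<?php
// Walk the admin-configured decoder list in order; the first inspector that
// says it can handle the file wins the recode job.
$wgDecoderPreference = array( 'MPlayerInspector', 'FfmpegInspector' );

function pickDecoder( $path ) {
    global $wgDecoderPreference;
    foreach ( $wgDecoderPreference as $class ) {
        $inspector = new $class();
        if ( $inspector->canDecode( $path ) ) {
            return $class;            // stored with the job in the recode queue
        }
    }
    return false;                     // nothing can decode it: reject the upload
}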

To continue my testing now and as I develop the entire project, it would be helpful to have a diverse collection of video files to test with. The more I can see MPlayer fail myself, the better I can write scripts that detect it, so the more obscure the container format/codecs the better. I’m particularly interested in files made by digicams, cell-phones, and consumer encoding hardware. So, to my oh so many readers, if you have any of the above named devices, please send your masterpieces to me via anonymous ftp to mikeb.servehttp.com. Upload to the pub directory. Thank you for your help.

 Also, I’m excited about the increasing amount of news surfacing about efforts to get Theora files playing natively in various browsers, especially Firefox 3, and also the efforts to achieve this functionality in existing browsers by mapping the new HTML5 stuff to existing browser plugins with javascript. It’s awesome to see Wikimedia is smoothing out the playing of Theora files in-browser too, thanks gmaxwell!

I first set foot in my new rental house on Saturday, and am still in the process of unpacking, acquiring additional furniture, etc. Already I have been relieved of two digital flat-panel LCDs from inside my own living room. The burglary happened at some point in the past few days, but probably yesterday. I hadn’t even unpacked them yet. The monitors taken included a Samsung SyncMaster 244T. How entry to the house was gained is unknown.