So it’s been a bit since I’ve posted…and since then, a lot has happened. Perhaps the most significant thing is my getting schooled pretty good on #mediawiki about the lack of merits of a pluggable mime detection system. Basically, its not that a pluggable media validation system is a bad idea, it’s that there’s no need to make any use of the existing MimeMagic module at all. My approach had been to start with it, and then use plugins to fine-tune its results as necessary. This is nice in that it doesn’t require a complete refactoring of uploading in MediaWiki, but it is overly complicated and has a few big drawbacks, one being that such a design wouldn’t really provide for extracting and caching file metadata, so that would have to be done at a later stage, requiring an in-depth analysis of the files twice.

 A better design is to just have upload validation handlers register themselves for file extensions that they can analyze, and have no generic mime detection at all. This is a part of Daniel’s proposed design expressed in his blog. Another nice not-yet mentioned detail of this design is that it provides for a clean way to taylor the recommended maximum upload size to different media types. The obvious drawback to this is that it requires the development of plugins specializing in reading & extracting metadata from every file type you want to support before this design could be deployed.

 Perhaps that wouldn’t actually be as much work as it sounds like. In the past 2 days, I’ve essentially created such a plugin that covers the entire audio and video arena, and I think it does a really good job, too. Currently, it can use the ffmpeg-php api, MPlayer’s little-known companion script midentify, or both. Adding additional analysis mechanisms would be a very straightforward process, but between those two you obtain pretty good overarching support for validating audio and video types. Actually, since MPlayer gives you ffmpeg’s abilities and then some, you can get along fine on midentify only, but I wrote support for ffmpeg-php for two reasons. The main one is that in my tests it sometimes finishes quicker, and sometimes by a lot. (Total runtimes for validation using a composite ffmpeg/mplayer solution are running between < .1 and ~.8 seconds on my test machine, depending on whether both end up needing to be invoked, so its in the ballpark.) Additionally, if MediaWiki moves to validating uploads using a number of dedicated plugins like this requiring some external utility or other, I can just hear the rumblings from private MediaWiki users. MPlayer is at least a truckload of RPM downloads, and at most a troublesome build/install, so for some that aren’t looking to do media recoding too, the php extension might make their installation experience easier.

 And that brings you to where I am at this moment. Currently I’m invoking this code through the existing UploadVerification hook, which operates with no regard to the uploaded extension/type. It’d be nice if my code only got called on uploads of extensions it had registered with the upload validator as audio or video (a long list, I know…and probably some work to properly compile) but for now, I think I’ll just emulate this by providing a list of extensions that it can verify as an array or sommat, and immediately return execution to the calling script if the upload isn’t on the list. Hopefully, at some point in the future, that can easily be adapted to register my code as the handler for those audio and video types.

And, if things start being done that way, it might also provide a good replacement for $wgFileExtensions, which otherwise is at risk of becoming tediously long as more file types get properly supported.

You can experiment with my code at — though this is the machine I’m working on, so it will be occasionally broken, have wierd debugging output, etc.