Since it’s been so long, I thought I’d let everyone know I haven’t abandoned my work, died, or anything of the like. Blogging just doesn’t get the code written, so I haven’t been spending as much time on it.

As it’s now a week into August, I can say with certainty how the queue will be implemented. I did end up going with the idea that surfaced while authoring my last post: a daemon implemented in PHP running on each recode node. The primary means of interfacing with it is an accompanying script that must run on the webserver of a recode node (yes, they must run Apache), taking input via POST, forwarding it as commands to the daemon, and returning the outcome in an easily machine-processable format. This is accomplished over named pipes, the simplest means of interprocess communication that works for this task. All requests to and responses from the daemon are prefixed with their length to ensure complete transmission of messages.

Notification of new jobs, as well as cancellation of the currently running job, is then achieved by simply contacting the node (selected with the aid of the DB) over HTTP. The daemon expediently provides the notify script with the results of a command so that the entire process can occur at upload time, specifically as manifested in an UploadComplete hook. (Cancellation of the current job is used if a re-upload occurs while the job generated by the old file is still running.) The daemon uses a persistent instance of MPlayer for all assigned jobs, via MPlayer’s slave mode.
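To make the framing concrete, here’s a minimal sketch of the length-prefixed protocol in PHP. The pipe paths, the newline terminating the length prefix, and the function names are all illustrative choices on my part, not the final wire format:

```php
<?php
// Sketch of length-prefixed framing over a named pipe.

function pipeSend( $pipePath, $msg ) {
	$fp = fopen( $pipePath, 'w' );
	// Prefix the payload with its byte length so the reader knows
	// exactly how much to expect.
	fwrite( $fp, strlen( $msg ) . "\n" . $msg );
	fclose( $fp );
}

function pipeReceive( $pipePath ) {
	$fp = fopen( $pipePath, 'r' );
	// Read the length prefix first...
	$len = (int)fgets( $fp );
	$msg = '';
	// ...then keep reading until the whole message has arrived,
	// since a single fread() may return only a partial payload.
	while ( strlen( $msg ) < $len && !feof( $fp ) ) {
		$msg .= fread( $fp, $len - strlen( $msg ) );
	}
	fclose( $fp );
	return $msg;
}
```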
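On the wiki side, the notification piece might look roughly like the following. The UploadComplete hook is MediaWiki’s; the node-selection helper and the POST fields are hypothetical stand-ins:

```php
<?php
// Rough sketch of notifying a recode node at upload time.
$wgHooks['UploadComplete'][] = 'recodeNotifyNode';

function recodeNotifyNode( &$upload ) {
	$url = pickRecodeNodeUrl(); // hypothetical: consult the node table in the DB
	$ctx = stream_context_create( array( 'http' => array(
		'method'  => 'POST',
		'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
		'content' => http_build_query( array( 'cmd' => 'newjob' ) ),
	) ) );
	// The notify script relays the command to the daemon over the
	// named pipes and returns the daemon's answer in the response body.
	$result = file_get_contents( $url, false, $ctx );
	return true; // always let other hooks run
}
```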
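And for the persistent MPlayer instance, slave mode just means starting MPlayer once and feeding it commands on stdin. A bare-bones version, with any flags beyond -slave/-idle left out:

```php
<?php
// Keeping one MPlayer around for all jobs: -slave makes it read
// commands from stdin, -idle keeps it alive between files.
$descriptors = array(
	0 => array( 'pipe', 'r' ), // we write slave-mode commands here
	1 => array( 'pipe', 'w' ), // MPlayer's responses come back here
);
$proc = proc_open( 'mplayer -slave -idle -quiet', $descriptors, $pipes );

// Standard slave-mode commands; the file path is just an example.
fwrite( $pipes[0], "loadfile /tmp/example-upload.ogg\n" );
// ... later, when shutting the daemon down:
fwrite( $pipes[0], "quit\n" );
proc_close( $proc );
```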

Although this isn’t quite as firm yet, I expect the following database design changes will facilitate operations:

  1. The image table will grow a new ENUM indicating whether the recoded editions of a particular file are available, in queue, or failed; i.e., a file is marked available only when all configured recoded versions of it have finished processing.
  2. A new table will serve as the queue, and will contain only the job order and the file’s database key (see the sketch after this list). (I did considerable performance testing here and concluded that both columns should be indexed: retrievals will be by both order and name, and individual inserts/deletes seem to take constant time regardless of the number of records. *Very* rarely, the order could be reindexed just to keep the integers from running off to infinity, perhaps as a task through the existing job queue. This all only matters if the queue grows to significant length, but that might happen if a wiki decides it doesn’t like its existing resample quality or size and wants to redo them all.)
  3. A new table will contain the current status of each recode node, including the address of its notify script and the current job, if there is one.
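As a sketch of what the queue operations in item 2 might look like through MediaWiki’s database abstraction; the table and column names here are placeholders, not the final schema:

```php
<?php
// Hypothetical enqueue/dequeue against the proposed queue table;
// 'recode_queue', 'rq_order', and 'rq_image' are invented names.
$dbw = wfGetDB( DB_MASTER );

// Enqueue: append the file's database key at the end of the order.
$max = $dbw->selectField( 'recode_queue', 'MAX(rq_order)', '', __METHOD__ );
$dbw->insert( 'recode_queue',
	array( 'rq_order' => $max + 1, 'rq_image' => $title->getDBkey() ),
	__METHOD__ );

// Dequeue: grab the lowest-ordered row, then delete it. With both
// columns indexed, each operation stays fast regardless of queue length.
$row = $dbw->selectRow( 'recode_queue', array( 'rq_order', 'rq_image' ),
	'', __METHOD__, array( 'ORDER BY' => 'rq_order', 'LIMIT' => 1 ) );
$dbw->delete( 'recode_queue', array( 'rq_order' => $row->rq_order ), __METHOD__ );
```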

One of the recode daemon’s first tasks upon starting a new job is establishing which recoded versions of a given file are missing. I prefer this to storing such job details in the queue, as it greatly simplifies both the queue and the addition of new formats in the future. As a result, the daemon does need to interact with FileRepo and friends, so a working MediaWiki install must exist on recode nodes as well. In fact, much of the MW environment is actually loaded in the daemon at all times.
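The check itself might look something like this; the configured-formats global and the derivative naming scheme are pure invention on my part, since those details aren’t settled:

```php
<?php
// Sketch of the daemon working out which recoded versions are
// missing. $wgRecodeFormats and the derivative naming are invented.
$repo = RepoGroup::singleton()->getLocalRepo();
$file = $repo->newFile( $title );

$missing = array();
foreach ( $wgRecodeFormats as $format ) {
	// Assume each recoded version sits beside the original with a
	// format suffix; the real layout may well differ.
	if ( !file_exists( $file->getPath() . '.' . $format ) ) {
		$missing[] = $format;
	}
}
// $missing now lists the formats this job still has to produce.
```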

I’ve still got plenty to keep me busy here as the time starts getting tight.
