Complexity, where does it come from?

People often wonder why even the simplest of things seem to take long to implement. Often this is accompanied by uttering the phrase made famous by Jeremy Clarkson: how hard can it be.

Well let’s find out. As an example let’s look into a very simple case of creating a shared library that grabs a screen shot from a video file. The problem description is simplicity itself: open the file with GStreamer, seek to a random location and grab the pixels from the buffer. All in all, ten lines of code, should take a few hours to implement including unit tests.

Right?

Well, no. The very first problem is selecting a proper screenshot location. It can’t be in the latter half of the video, for instance. The simple reason for this is that it may then contain spoilers and the mere task of displaying the image might ruin the video file for viewers. So let’s instead select some suitable point, like 2/7:ths of the way in the video clip.

But in order to do that you need to first determine the length of the clip. Fortunately GStreamer provides functionality for this. Less fortunately some codec/muxer/platform/whatever combinations do not implement it. So now we have the problem of trying to determine a proper clip location for a file whose duration we don’t know. In order to save time and effort let’s just grab the screen shot at ten seconds in these cases.

The question now becomes what happens if the clip is less than ten seconds long? Then GStreamer would (probably) seek to the end of the file and grab a screenshot there. Videos often end in black so this might lead to black thumbnails every now and then. Come to think of it, that 2/7:th location might accidentally land on a fade so it might be all black, too. What we need is an image analyzer that detects whether the chosen frame is “interesting” or not.

This rabbit hole goes down quite deep so let’s not go there and instead focus on the other part of the problem.

There are mutually incompatible versions of GStreamer currently in use: 0.10 and 1.0. These two can not be in the same process at the same time due interesting technical issues. No matter which we pick, some client application might be using the other one. So we can’t actually link against GStreamer but instead we need to factor this functionality out to a separate executable. We also need to change the system’s global security profile so that every app is allowed to execute this binary.

Having all this functionality we can just fork/exec the binary and wait for it to finish, right?

In theory yes, but multimedia codecs are tricky beasts, especially hardware accelerated ones on mobile platforms. They have a tendency to freeze at any time. So we need to write functionality that spawns the process, monitors its progress and then kills it if it is not making progress.

A question we have not asked is how does the helper process provide its output to the library? The simple solution is to write the image to a file in the file system. But the question then becomes where should it go? Different applications have different security policies and can access different parts of the file system, so we need a system state parser for that. Or we can do something fancier such as creating a socket pair connection between the library and the client executable and have the client push the results through that. Which means that process spawning just got more complicated and you need to define the serialization protocol for this ad-hoc network transfer.

I could go on but I think the point has been made abundantly clear.