
Cocoa Worker Thread

For Kompressor.app, I’ve written a group of Foundation classes that implement a worker-thread paradigm in Objective-C. The general idea: you have some non-trivial amount of processing you want done, and you (optionally) want to be notified when it is finished.

This implementation has a number of advantages compared to others:

  • Threads are created once and reused; there is no expensive thread creation for each work unit.
  • Non-polling (using NSConditionLock).
  • Low communication overhead (e.g. no Distributed Objects).
  • Small (although the code is spread out over 3 classes / files).

How do you use it?

  1. Create an instance of SIWorkManager (usually there should be only one); by default it creates as many worker threads as there are CPU cores.
  2. For each unit of work, create an SIWorkUnit object. It contains the target object and the selector where the actual computation is done, as well as an optional didEndSelector, which is called on the main thread of your application with either the result of calling [target selector] or the originally given argument. The target and argument are retained by the work unit.
  3. Tell the work manager object about the work unit (via addWorkUnit:).
  4. Done! (A short sketch of these steps follows below.)
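
Roughly, and with an illustrative initialiser (check SIWorkUnit.h for the actual signature), the steps look like this:

    // Step 1: one manager for the whole application.
    SIWorkManager *manager = [[SIWorkManager alloc] init];

    // Step 2: wrap the computation in a work unit.  'image' stands for
    // whatever object implements the work selector; the initialiser
    // shown here is illustrative, not necessarily the real one.
    SIWorkUnit *unit = [[SIWorkUnit alloc]
        initWithTarget:image
              selector:@selector(compressedData)     // runs on a worker thread
              argument:nil
        didEndSelector:@selector(compressionDone:)]; // runs on the main thread

    // Step 3: hand it over; target and argument stay retained by the unit.
    [manager addWorkUnit:unit];
    [unit release];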

If you want to wait for a batch of N work units to be done, create an NSConditionLock L, and at the end of the work selector do [L lock]; [L unlockWithCondition:[L condition] + 1];. After dispatching all the work units from the main thread (or wherever), wait with [L lockWhenCondition:N]; [L unlockWithCondition:0];.
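
Spelled out (with illustrative variable names), the pattern looks like this:

    // One lock per batch; its condition counts the finished work units.
    const int N = 10; // batch size
    NSConditionLock *batchLock = [[NSConditionLock alloc] initWithCondition:0];

    // At the end of each work selector, i.e. once per finished unit:
    [batchLock lock];
    [batchLock unlockWithCondition:[batchLock condition] + 1];

    // After dispatching all N work units:
    [batchLock lockWhenCondition:N];    // blocks until every unit is done
    [batchLock unlockWithCondition:0];  // reset for the next batch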

An example demonstrating both simple usage and waiting for a batch to finish can be found in BatchExample.m (untested, as I am only on a Linux machine at the moment).

Remember to access shared data structures (or your user interface) only in the didEndSelector. SIWorkManager also has messages for removing pending work that operates on, or refers to, a given object; pass nil to remove all pending work units.

Here is the source code with all required files. Feedback / fixes appreciated.

Kompressor.app and Wavelet Image Compression Library 3.3.3

Shortly before the year is out (and as a result of my vacation), there is some fresh software to be had… 🙂
I’ve written a Mac OS X 10.4 application called Kompressor.app to compress, inspect and display WKO images. This release goes hand in hand with version 3.3.3 of the wavelet image compression library itself.
Because this is the first (semi-)proper Mac application I’ve written, I would welcome any form of testing or feedback people can provide, especially on the user-interface side. The application is a universal binary and thus should work on both PPC and Intel Macs.
Here’s a bit (all of it, actually) of the supplied online help to get you started…

Low-Level Libraries and High-Level Languages

If you want to write a library that could relatively easily be used in embedded systems (say Xbox 360 / PS3) as well as being generally portable (Windows, Linux, Mac OS), then you end up in a bit of a bind. You can use C and be on the safe side, but for some types of programming C is rather cumbersome (see for example the Io language). You can use C++ (or one of its subsets), but now any client of the library has to understand the C++ ABI (which AFAIK only C++ does). Another option is to use whichever language is best suited to solving the problem you’re addressing in the library, but that usually imposes constraints on what can use or bridge to the library (e.g. if I were to write a library in Python, how do I use that from an application written in C++?).
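
The usual escape hatch is to keep the public boundary plain C, whatever lives behind it. A minimal sketch, with entirely hypothetical names:

    /* A flat C API as the portable boundary (all names hypothetical):
     * the implementation behind it can be C++ or anything else that
     * can export C symbols. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct wk_codec wk_codec;   /* opaque handle; layout stays hidden */

    wk_codec *wk_create(int width, int height);
    int       wk_compress(wk_codec *c, const unsigned char *pixels,
                          unsigned char *out, int out_capacity);
    void      wk_destroy(wk_codec *c);

    #ifdef __cplusplus
    }
    #endif

Practically every language can call into a C ABI, which is exactly what makes it the lowest common denominator.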

Now, if whatever you’re coding is a low-level component, then this restriction becomes even more imposing because many higher-level pieces are now depending on and interacting with the library itself.
If you restrict yourself to running on Windows only (as that is the only platform where you can realistically rely on the CLR run-time being available) and don’t mind its overhead, then using .NET and the CLR will solve this problem for you nicely. For running on anything else, you’re pretty much screwed… 🙁

Rate-Distortion Graph

I’ve invested a bit of time in getting some nice rate-distortion graphs out of my wavelet image compression library. Now that the coder is embedded, the process is fairly easy: compress once into a single file, then decompress only as many bits of that file as are needed to reach each desired rate. The graph therefore also shows the quality over the progress of the file.
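
In pseudo-C the measurement loop is little more than the following; wk_decode() and mean_square_error() are hypothetical stand-ins for the library’s real entry points:

    /* Sketch: compress once, then decode ever-longer prefixes of the
     * same stream and record the error at each rate. */
    #include <math.h>
    #include <stdio.h>

    extern void   wk_decode(const unsigned char *stream, int bytes, float *out);
    extern double mean_square_error(const float *a, const float *b, int n);

    void rate_distortion_curve(const unsigned char *stream, int stream_size,
                               const float *original, float *decoded,
                               int pixel_count)
    {
        int step = stream_size / 64;              /* 64 sample points */
        if (step == 0)
            step = 1;
        for (int bytes = step; bytes <= stream_size; bytes += step) {
            wk_decode(stream, bytes, decoded);    /* prefix decode */
            double mse = mean_square_error(original, decoded, pixel_count);
            printf("%f %f\n", 8.0 * bytes / pixel_count,  /* bits per pixel */
                   sqrt(mse));                    /* RMSE */
        }
    }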

Here is an example of one such graph, for a 708×1024 image of Mena Suvari (which unfortunately comes from a JPEG source and thus has block noise). I’ve chosen a new test image because the Lena image (notice the name similarity? ;)) is past its prime IMO; the colour version in particular has plenty of noise in the blue channel. The image was encoded in the YCoCg colour-space, which explains the slightly higher quality of the green channel and why even the highest rate is not lossless. Compared to the RGB version, the YCoCg representation incurs a mean-square error of ~0.25 (hence the root-mean-square error of 0.5 shown in the graph).

Rate / Distortion curve for a colour image of Mena Suvari
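
For reference, the forward transform looks like this (the textbook floating-point form; the library might well use a reversible, lifting-based variant such as YCoCg-R instead). Green contributes half of the luma, which is why that channel fares slightly better:

    /* Standard RGB -> YCoCg forward transform. */
    void rgb_to_ycocg(float r, float g, float b,
                      float *y, float *co, float *cg)
    {
        *y  =  0.25f * r + 0.50f * g + 0.25f * b;  /* luma: half green */
        *co =  0.50f * r             - 0.50f * b;  /* orange chroma    */
        *cg = -0.25f * r + 0.50f * g - 0.25f * b;  /* green chroma     */
    }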

In theory, the rate/distortion graph should have a negative but monotonically increasing derivative (which in plain language means something like “the bigger improvements come closer to the beginning of the file”). We try to achieve this by scheduling data units (which are essentially bitplanes of blocks) by how much they reduce the error in the coded image per bit of their size. There are two reasons the slope is close-but-not-quite monotonically increasing. One is that the error (distortion) is not computed exactly; instead I use an approximation based on the wavelet transform used. The other is that we deal with packets / data units / blocks of finite size: as soon as more than one channel is written to a single file, a single packet will only ever improve a single channel, which means the other channels see a stagnating improvement (zero derivative) for the size (or duration) of that packet.
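
The scheduling itself boils down to a greedy sort by slope. A rough sketch (types and names are illustrative, not the library’s actual ones):

    /* Greedy scheduling sketch: order data units by distortion removed
     * per bit (steepest slope first).  The real code works on bitplanes
     * of blocks and uses fixed-point estimates. */
    #include <stdlib.h>

    typedef struct {
        double distortion_reduction;  /* estimated MSE decrease */
        int    size_in_bits;
    } data_unit;

    static int by_slope(const void *pa, const void *pb)
    {
        const data_unit *a = pa, *b = pb;
        double sa = a->distortion_reduction / a->size_in_bits;
        double sb = b->distortion_reduction / b->size_in_bits;
        return (sa < sb) - (sa > sb);  /* descending slope */
    }

    void schedule_units(data_unit *units, int count)
    {
        qsort(units, count, sizeof *units, by_slope);
    }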

These plots also make it easier to spot differences (usually either improvements or regressions) caused by changes to the code, so I’ve created graphs for all my test images, which will make such comparisons easier in the future.

I’ve also taken an old pre-3.1 version (thank you, darcs!) that still used the recursive quantizer selection and produced similar plots with it (which took much longer, as the image had to be compressed anew for each rate). The result was pretty much a draw, in spite of the embedded version having to store a bit more sideband data (the number of bitplanes for each block in the header, and which block is coded next in the bitstream itself). All of which makes the embedded version the “better choice”, due to its other advantages such as simpler code and “compress once, decompress at any rate”.

Patenting CPU instruction sets and $150 PCs

Via a blurb on ArsTechnica about a Chinese $150 PC, I’ve stumbled upon the Story of Lexra, a now out-of-business semiconductor IP provider. A few things in that article struck me as interesting, namely

  • You cannot patent an instruction set.
  • “You can patent designs and methods that are necessary to implement a particular unusual instruction that is part of the instruction set.”
  • Guess which company holds patents on just such methods for the MMX instructions… Although this point is now moot, as AMD and Intel have cross-licensing agreements, it was an interesting way to stop AMD from offering “fully compatible” CPUs (which resulted in AMD offering 3DNow! instead).
  • “One interesting problem with the patent system is that one institution, the PTO, determines whether a patent is valid but a different institution, the courts, determine whether one infringes. It is entirely possible for the two institutions to interpret a patent differently and for either a non-infringer to be wrongly convicted or an infringer to get away with their crime.”

Scary, isn’t it?

Size-optimising Code

I try to keep my image compression code fairly simple; it doesn’t do the best lossy compression, it doesn’t do the best lossless compression, but it does both fairly well, fairly fast, and without introducing much code bloat.

As a measure of this, I simply look at the size of the resulting executable of my simple test application, wako. This can of course be very misleading, but it is some measure of complexity, and I try to keep its size below 40,000 bytes. Generally speaking, though, I want to achieve this by simplifying the code and making it more reusable, not by applying complicated tricks that make the code harder to read but result in a smaller binary.

So after fixing a problem in the implementation (a fixed factor was used to convert the floating-point estimate of the mean-square error to a fixed-point representation, which could then overflow), the code had grown above the “magic 40,000” mark. The growth was due to the fact that many channels can be interleaved into a single file, and to do that properly each channel needs a comparable error estimate.
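
The pitfall, in a nutshell (the names and the scale factor here are made up for illustration):

    /* Converting a float MSE estimate to fixed point with one global
     * factor can exceed the integer range for large or noisy channels;
     * clamping (or a per-file scale) avoids the silent wrap-around. */
    #include <stdint.h>

    #define FIXED_SHIFT 16

    static uint32_t mse_to_fixed(double mse)
    {
        double scaled = mse * (double)(1u << FIXED_SHIFT);
        if (scaled >= (double)UINT32_MAX)   /* would overflow: clamp */
            return UINT32_MAX;
        return (uint32_t)scaled;
    }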

One obvious place to reduce the size is the very basic command-line argument parser (which is not part of the library itself, but of the sample application): commenting it out shrinks the binary by 8 kB! Rewriting this explicit parser to be data-/table-driven did not reduce the size as much as hoped, but it was enough.
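
For the curious, the rough shape of such a table-driven parser (the option names and types here are illustrative, not wako’s actual set):

    /* One generic loop over a table replaces per-option parsing code. */
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        const char *name;   /* e.g. "-bpp" */
        enum { OPT_FLAG, OPT_DOUBLE, OPT_INT } type;
        void       *value;  /* where to store the parsed result */
    } option_desc;

    static double bpp   = 1.0;
    static int    lossy = 0;

    static const option_desc options[] = {
        { "-bpp",   OPT_DOUBLE, &bpp   },
        { "-lossy", OPT_FLAG,   &lossy },
    };

    static int parse_args(int argc, char **argv)
    {
        size_t n = sizeof options / sizeof options[0];
        for (int i = 1; i < argc; i++) {
            size_t j;
            for (j = 0; j < n; j++) {
                if (strcmp(argv[i], options[j].name) != 0)
                    continue;
                if (options[j].type == OPT_FLAG) {
                    *(int *)options[j].value = 1;
                } else {
                    if (i + 1 >= argc)
                        return -1;              /* missing value */
                    if (options[j].type == OPT_DOUBLE)
                        *(double *)options[j].value = atof(argv[++i]);
                    else
                        *(int *)options[j].value = atoi(argv[++i]);
                }
                break;
            }
            if (j == n)
                return -1;                      /* unknown option */
        }
        return 0;
    }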

39,849

As an aside, did you know the order of declarations within a structure has a non-negligible effect on code-size? 😉
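
Part of the effect comes from padding (and the rest from instruction encoding: on x86, field offsets that fit in a signed byte produce shorter instructions). The padding part is easy to demonstrate:

    /* Reordering fields changes padding and thus offsets and sizes. */
    #include <stdio.h>

    struct before { char a; double b; char c; };  /* typically 24 bytes */
    struct after  { double b; char a; char c; };  /* typically 16 bytes */

    int main(void)
    {
        printf("%zu vs. %zu bytes\n",
               sizeof(struct before), sizeof(struct after));
        return 0;
    }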

Wavelet 3.2

As I’ve taken a two-day vacation pre-easter, I’ve gotten some more work done on my Wavelet Image Compression Library (and not played games as some of my colleagues were led to believe ;)).
Before this version, all the subbands of an octave (i.e. the HD, VD, and DD coefficients) were written as a single block in an interleaved manner. The changes I’ve made give each subband its own block, which increases the number of blocks and thus allows for better, more fine-grained scheduling (which improves compression performance). Unfortunately, this made the error estimation three times slower, which I then cunningly avoided by computing the estimates for 3 blocks at a time.
Other changes include a more consistent behaviour for the -bpp command line switch, bug fixes to the scheduling where my accumulated fixed-point error estimates would overflow, and quite a few other things.
An experimental change was an exact error estimation, which I’ve since removed again, as the code became unreadable, it was slow as hell, and it didn’t actually help that much.
There is now a Darcs repository here (don’t try browsing there) from which you can get the current version and against which you can send me patches. You can also read my totally unfunny changelog entries if you need more information on the changes between versions.