This transformed (and optionally quantized) data is then written using a coding scheme that is often specifically designed to represent the transformed data in a small amount of space.
To recover a reconstruction of the input data, the above process is applied in reverse (for our algorithm this reconstruction can be lossless, as its components were chosen to perform well in both the lossy and the lossless case).
Combining the two yields a very useful property: a progressive embedded bitstream can be truncated at any point and still yields the best available representation for that rate.
Then, for each block, we estimate by how much each bit-plane of the coefficients would improve the reconstructed image. We also compute how many bits the compressed bit-plane (of the block, not the whole image) takes up.
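To make the per-bit-plane estimate concrete, here is a minimal sketch (function name and quantization model are assumptions, not the codec's actual code) of how the improvement from one bit-plane of a single coefficient can be computed under a squared-error metric: before plane p is sent, the decoder knows the coefficient only to a step of 2^(p+1); afterwards, to a step of 2^p.

```c
/* Sketch: estimated squared-error reduction from transmitting bit-plane p
 * of one (non-negative) coefficient. A per-block estimate would sum this
 * over all coefficients in the block. Names are hypothetical. */
double plane_benefit(int coeff, int p)
{
    int step_before = 1 << (p + 1);   /* decoder's step before plane p  */
    int step_after  = 1 << p;         /* decoder's step after plane p   */

    /* Reconstructions by truncation to the known step size. */
    double rec_before = (double)(coeff / step_before) * step_before;
    double rec_after  = (double)(coeff / step_after)  * step_after;

    double e_before = (double)coeff - rec_before;
    double e_after  = (double)coeff - rec_after;

    /* How much the squared error shrinks thanks to this plane. */
    return e_before * e_before - e_after * e_after;
}
```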
This gives us a good approximation of the rate/distortion (R/D) curve. We can thus estimate by how much each bit-plane of a block improves the image per bit needed to encode it, and then order the data to be encoded so that this improvement per bit is decreasing. The information about which block the next bit-plane is taken from is stored in what we call a "schedule" for each t_wv_channel.
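The ordering step can be sketched as a plain sort by benefit-per-bit (struct and function names below are hypothetical; the real work happens inside wv_channel_compact()):

```c
#include <stdlib.h>

/* One schedule entry: a bit-plane of one block, with its estimated
 * distortion reduction and its compressed size. Names hypothetical. */
typedef struct {
    int    block;    /* which block this bit-plane belongs to          */
    int    plane;    /* bit-plane index                                */
    double benefit;  /* estimated distortion reduction of this plane   */
    double bits;     /* compressed size of this plane, in bits         */
} sched_entry;

static int by_benefit_per_bit(const void *a, const void *b)
{
    const sched_entry *x = a, *y = b;
    double rx = x->benefit / x->bits;
    double ry = y->benefit / y->bits;
    return (rx < ry) - (rx > ry);   /* descending benefit per bit */
}

/* Order the entries so that any truncation point keeps the prefix with
 * the highest improvement per bit. */
void build_schedule(sched_entry *e, size_t n)
{
    qsort(e, n, sizeof *e, by_benefit_per_bit);
}
```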
All of the above happens in wv_channel_compact().
This is slightly complicated by the fact that a file can contain more than one channel (e.g. for a color image or an alpha channel). We then need a way to decide which channel's schedule the next block should be taken from (i.e. a schedule of schedules). The approach adopted here, which works well in practice, is to code the next scheduled entry from the channel whose quality is currently the lowest, while taking the channels' relative importance (deduced from their requested final qualities) into account.
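A minimal sketch of this channel selection, assuming each channel has a current quality and an allowed headroom derived from its requested final quality (names and the additive-headroom model are assumptions for illustration):

```c
/* Sketch: pick the channel whose headroom-adjusted quality is currently
 * the worst; it contributes the next scheduled bit-plane. A chroma
 * channel with headroom 4.0 is only refined once it drifts more than
 * 4 quality units below a luma channel with headroom 0.0. */
int next_channel(const double *quality, const double *headroom, int n)
{
    int worst = 0;
    for (int i = 1; i < n; i++)
        if (quality[i] + headroom[i] < quality[worst] + headroom[worst])
            worst = i;
    return worst;
}
```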
This makes it possible, for example, to have chrominance channels in the YCoCg colorspace whose peak signal-to-noise ratio at any point in the bitstream is 4 dB lower than that of the luminance channel. One can specify this in one of three ways (the examples below use a difference of 4.0 mean-square-error units, not dB):

- as absolute errors, [0.0 4.0 4.0];
- with a relative mse value, [-4.0 4.0 4.0] (read as: 4.0 mse lower / less than the 2nd channel). Both this and the previous variant terminate once the given values are reached;
- if that is not desired, as all relative / negative values, e.g. [-5.0 -1.0 -1.0]. This gives the same importance distribution, but now data is written until the bit budget is completely used up or all the blocks have been written.
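The distinction between the terminating and the non-terminating notations can be sketched with a tiny helper (name hypothetical): a target list containing at least one absolute (non-negative) entry defines a hard stopping point, while an all-negative list only fixes the importance distribution and codes until the bit budget runs out.

```c
#include <stdbool.h>

/* Sketch: does this per-channel error-target list define a stopping
 * point? Non-negative entries are absolute targets; negative entries
 * are offsets relative to another channel. */
bool has_stop_target(const double *err, int n)
{
    for (int i = 0; i < n; i++)
        if (err[i] >= 0.0)
            return true;   /* at least one absolute target: terminate */
    return false;          /* all relative: code until budget is spent */
}
```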