Compression Primer for PNG Review
Version 1.2
10 Mar 96
Updated: Minor corrections to email addresses and URLs
Version 1.2.1
24 Jul 99
Version 1.2.2
03 Aug 00
Vincent Sabio,
U.S. Army Research Laboratory (formerly)
Return Path, Inc. (currently)
Copyright 1996, 1999, 2000 Vincent Sabio

This document can be retrieved as follows:
via FTP:
via HTTP:
There is also an accompanying review of the Portable Network Graphics
(PNG) format:
via FTP:
via HTTP:

As promised, I have put together a "primer" on compression. Hopefully,
it will be clear to most of you. It delves into some moderately
technical areas, but I've tried to either keep the math simple or
eliminate it altogether (wherever possible).
Please note that this is intended to be a primer for the layman -- it is
NOT a rigorous treatment of the technical aspects of compression!
Therefore, please do not send me messages like, "that assumption can
only be made if the noise process is statistically white, so the signal
values are diagonalized in the matrix decomposition, and all the
off-diagonal values are identically zero." First of all, you'll be speaking
about a topic that isn't even discussed in this primer, plus you'll be
boring me. I have tried to present a very high-level introduction to the
topic, and have hopefully geared it to the non-technical reader.
"Okay, so what's the purpose of this thing?"
The purpose of this primer is to give you some background in compression
theory, without delving into all the information theory required to
prove the concepts. This background should enable you to comprehend some
of the more technical aspects of the forthcoming PNG review. It will
also allow me to present a straightforward review of the PNG spec -- and
provide JPEG & GIF comparisons -- without having to digress into the
topics covered here (i.e., I can simply assume that all of this
information is known when I write the PNG review).
"What if I don't understand some of it?"
Don't panic. Most of the PNG review will probably be in nice,
easy-to-understand terminology. I've packed most of the really technical
information in here, which frees me to just discuss the specification
when I get around to writing the review.
"What if I don't understand ANY of it?"
Well, then you'll probably understand less than half of the PNG review
when it comes out. Not a cause for alarm -- if none of it makes any sense
to you, just press the DELETE key. :)
"I really don't care about all this technical stuffcan you just give
me a sneak preview?"
Sure. PNG is a remarkably well-developed spec that is targeted toward
the WWW (although other applications are certainly not out of the
question). It is a lossless compression scheme that incorporates gamma
correction, an optional alpha channel with full-range transparency
(unlike GIF's on/off transparency), two-dimensional interlacing (*very*
nice, BTW), some very good (and very expandable) filtering schemes, and
a public-domain compression engine that has been around for a while (and
has thus proven itself). The scheme is designed to be flexible for
individual purposes, provides for a limited form of forward
compatibility (Very impressive, BTW. Years down the road, this will mean
that an application using an "older format" PNG engine can *possibly*
open and display a "newer format" PNG file -- and will be able to evaluate
for itself, on the fly, whether it will be able to do so correctly), and
even provides for embedded text strings. How does that sound?
"Why are you so long winded?"
Beats me. It's certainly not because I've got the *time* to be long
winded ... it just happens that way.
"Okay, when can we get started on the compression primer?
Right now.

... I assume that what you are asking is, "can we design a compression
algorithm that can do some extra crunching and get the file extra small,
so it transfers faster?"
The answer is a resounding, unqualified "Well, yes and no -- maybe.
Depends on the file (and the compression algorithm)." And herein lies
the key to file compression: not all data are created equal. Unlike
encryption, where one can simply process the heck out of the file, there
are real limiting factors in file compression. All this is neatly
bundled in some grad-level courses on information theory, but the lay
version is thus (with apologies to the information theorists for the
grossly oversimplified treatment):
1. Information content:
No matter what, the resultant file must always contain enough
information to decompress it. The decompression algorithm also contains
information -- and this will decrease the amount of information that must
be contained within the compressed file to expand it.
2. Correlatedness of data:
If the data in the file are highly "correlated," then it can
*potentially* be highly compressed. As it becomes less correlated, you
begin to lose compressibility. And the "correlatedness" of the data in a
file is partly a function of the compression algorithm being used.
Take an extreme example:
Suppose we have a compression program that uses what we'll call a
"contiguousstring" compression algorithmi.e., it looks for contiguous
strings of "1"s and "0"s.
Next, suppose we have a 100byte file containing all zeros; we'll call
this "Case A":
Case A: 0000000000000000 ...
Our contiguous-string compression algorithm can compress this file to a
single command instructing the expander to insert 100 zeros in place of
the command. Clearly, we have achieved very high compression in this
example.
Now, if we insert a "1" somewhere around the middle of the file, then we
have to break the single command into three commands -- one that states
"insert 42 zeros" (e.g.), one that states "insert a one," and finally
one that says "insert 57 zeros" -- where 42 + 57 + 1 = 100. Inserting the
"1" has decorrelated the data (however slightly), and results in a
larger compressed file.
Continuing in this manner (i.e., adding 1s to the file) will generally
continue to increase the resultant compressed-file size, until we reach
the worst-case situation -- when the file to be compressed contains only
alternating 1s and 0s; we'll call this "Case B":
Case B: 1010101010101010 ...
Now, using our contiguous-string algorithm, the compressed file will be
larger than the original. (Insert "!" here.)
Suppose, instead, that we select an alternate compression algorithm -- one
that looks for *alternating* 1s and 0s. Now, this "alternating-string"
algorithm will compress the "Case B" file to a single command that
instructs the expander to insert 50 occurrences of "10" (not "ten," but
"one-zero"). We have once again achieved very high compression -- in a
situation where the previous compression algorithm resulted in a larger
file than the original. But if we were to apply the alternating-string
compression algorithm to the "Case A" file, it would perform quite
poorly (in fact, the "compressed" file would be larger than the
original -- the same result we had for the "Case B" file with the
contiguous-string compressor).
Basically, we have seen a simple example of why different compression
algorithms are best suited to specific data sets -- i.e., "correlatedness"
is defined in terms of the data that is in the file AND the algorithm
being used to compress the data.
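If it helps to see that in code, here is a minimal sketch of the
contiguous-string compressor (in Python, chosen purely for illustration;
the function name and the (bit, count) "command" representation are mine,
not taken from any actual compressor):

    # Toy run-length ("contiguous-string") compressor: each run of identical
    # characters becomes one (character, run_length) command.
    def rle_compress(bits):
        commands = []
        run_bit, run_len = bits[0], 1
        for b in bits[1:]:
            if b == run_bit:
                run_len += 1
            else:
                commands.append((run_bit, run_len))
                run_bit, run_len = b, 1
        commands.append((run_bit, run_len))
        return commands

    case_a   = "0" * 100                  # Case A: all zeros
    case_mid = "0" * 42 + "1" + "0" * 57  # Case A with a single "1" inserted
    case_b   = "10" * 50                  # Case B: alternating ones and zeros

    print(len(rle_compress(case_a)))      # 1 command    -> compresses very well
    print(len(rle_compress(case_mid)))    # 3 commands   -> slightly larger
    print(len(rle_compress(case_b)))      # 100 commands -> larger than the original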
3. Linear decompositions and lossy compression:
A "lossy" compression schemesuch as JPEGattempts to trade image
quality for disk space (or communications channel bandwidth). One
approach is to try to determine, in a sense, what are the "major"
contributors to the image and what are the "minor" contributors. Then
the minor contributors can presumably be removed without substantially
affecting the quality of the image.
Well, that sounds nice, but how do we determine what qualifies as a
"major contributor" or a "minor contributor"? This gets a little
technical if you're not familiar with Fourier series and power spectral
densities, but I'll give it a try ...
We're all familiar with sound, and with what a high-frequency sound (in
lay terms, a "high-pitched" sound) or a low-frequency ("low-pitched")
sound. Music is generally a compilation of many different frequencies
over the range of human hearing (20Hz to 20KHz). A graphic equalizer
sorts these frequency components into different frequency "bins" where
each bin represents a range of frequencies (say, 20-100Hz, 100-500Hz,
500-2500Hz, 2500-10000Hz, 10KHz-20KHz). In signal processing, we refer
to the "full range"in this case, 20Hz to 20KHzas a "band" or a "full
band," and the divisions are referred to as "subbands." Now, if we had a
spectrum analyzer (this is becoming one really nice stereo system ...),
we could look at the display and see the amount of "power" (being
delivered to the speakers) within each subband. Suppose, as the music
plays, we notice there is a specific subband that stays at a very low
power level throughout the song. Theoretically, we could eliminate this
subband from the music altogether, without suffering any substantial
degradation in the sound quality -- possibly, the difference would not
even be noticeable. (There are audiophiles wincing right now.)
This is one of the methods used by compressors such as JPEG. A Fourier
transform (on which JPEG is based) (for those of you who are about to
say, "No, it's based on a DCT!", note that the discrete cosine transform
is a modification of the Fourier transform) [start over] A Fourier
transform is a "mathematical" spectrum analyzer, and splits the signal --
in this case, an image -- into many different subbands. Now, the operator
(that's you) is generally asked how much quality he's willing to
sacrifice for compression, and this sets an adaptive threshold in the
compression algorithm -- any subbands with more power than the threshold
are contributing substantial power to the image, and are kept. Anything
below the threshold is eliminated.
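To make that concrete, here is a rough sketch of the thresholding step in
Python with NumPy. This is not JPEG itself (JPEG uses a DCT on 8x8 blocks,
quantization tables, and entropy coding); the test signal, the 10%
threshold, and the use of an FFT are all my own choices for illustration:

    # Keep the "major contributors" (strong subbands), drop the "minor" ones.
    import numpy as np

    n = 256
    t = np.linspace(0.0, 1.0, n)
    signal = np.sin(2 * np.pi * 3 * t) + 0.05 * np.random.randn(n)

    coeffs = np.fft.rfft(signal)              # the "mathematical spectrum analyzer"
    threshold = 0.10 * np.abs(coeffs).max()   # set by the user's quality choice
    kept = np.where(np.abs(coeffs) >= threshold, coeffs, 0)

    reconstructed = np.fft.irfft(kept, n=n)   # lossy, but close to the original
    print(np.count_nonzero(kept), "of", len(coeffs), "subband coefficients kept")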
There are several shortcomings of the Fourier transform (FT) family (to
include the DCT) where their application to compression is concerned;
we'll briefly discuss two of them here. Note that these shortcomings are
inherent in JPEG compression, which relies upon these transforms. Note,
also, that I am referring to the short-time Fourier transform (STFT) and
its derivatives; the first shortcoming does not hold for the infinite
Fourier transform -- but then, I've never seen an infinitely large image,
either. :) And, finally, a technical note: the thing that we generally
refer to as the Fourier transform (FT) is really the Fourier series
(FS); I will use the more common "Fourier transform" nomenclature.
i. The STFT and DCT are not orthogonal decompositions, even though many
people (including many engineers!) believe that they are. So, what does
this mean to you? Well, "orthogonal" = "lossless" -- so, even the
"lossless JPEG" isn't truly lossless! It will, in fact, introduce
artifacts into the image. The severity of the artifacts is generally
minimal, and is a function of the image bandwidth (i.e., the size of the
"full band"). For those of you who are familiar with the FT and the FFT,
note that the FFT is a time-localized FT, where the localization has
been accomplished by square windowing the FT basis functions. This
windowing process leads to the introduction of Gibbs-phenomenon noise
(recall that a finite number of sine waves cannot accurately reproduce a
square wave). In any event, this is really a technicality -- the noise
introduced is typically very, very small, and can be below the threshold
at which it can even begin toggling bits -- so "lossless JPEG" can still
be considered lossless, for all intents and purposes.
ii. The FT and derivative transforms are not "constant Q" across
subbands -- i.e., the ratio of their center frequencies to their bandwidths
is not constant; it changes from subband to subband. In
particular, the *bandwidth* of each subband is fixed across all subbands
for a particular transform. The problem this introduces is one of
measuring the power spectral density (PSD) accurately for the purposes
of compression -- namely, the higher subbands (larger Q) are relatively
"narrow," and will thus have artificially smaller PSD values; vice versa
for the lower subbands. When analyzing the PSD for compression purposes,
it's relatively easy to impose a nonlinear threshold-weighting
function, but this is still not ideal.
Note that both shortcomings discussed can be overcome with the wavelet
transform. Just figured I'd mention that; I won't digress any further by
launching into a discussion of the WT.
4. Dynamic range:
"Dynamic range" is commonly referred to as "bit depth." Simply put, a
greyscale file in which each pixel is represented by 24 bits has
substantially greater dynamic range than a greyscale file in which each
pixel is represented by 8 bits.
4a. Excess dynamic range
Suppose we have pixels that are represented as 3-byte unsigned
integer data -- i.e., 24 bits per pixel. And let's say that the image
pixels only "use" 16 out of each of those 24 bits (for now, we'll take
the simplest case, and assume that the high-order 8 bits are always
zero). Then, we can eliminate 8 bits from each 24-bit pixel -- resulting
in an immediate compression of 33%.
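A tiny Python/NumPy sketch of 4a, assuming (as above) that the high-order
byte of every 24-bit pixel happens to be zero; the pixel values themselves
are invented:

    import numpy as np

    # Three 24-bit pixel values whose high-order 8 bits are all zero.
    pixels_24bit = np.array([0x0000FF, 0x001234, 0x00ABCD], dtype=np.uint32)

    assert np.all(pixels_24bit >> 16 == 0)         # the top byte really is unused
    pixels_16bit = pixels_24bit.astype(np.uint16)  # 2 bytes per pixel instead of 3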
4b. Frequency-domain filtering
Now, suppose the PSD of the image pixels looks like this (view with a
monospaced font, like Courier):

 PSD
  |      *
  |     * *
  |    *   *
  |   *     *
  |  *       *
  | *         *
  |*           *
  +--------------+--------------+---  f
  0              f1/2           f1

... where f1 is the bandwidth of the image, and f1/2 is the half-band point.
Given such a distribution (which is not uncommon in imagery, among other
applications), we can half-band filter the image data, effectively
splitting the information into its high-band and low-band components.
Note, now, that the upper half of the band has much lower overall power --
and thus requires less dynamic range to represent it -- than the lower
half. So, we can compress the data in the upper half of the band into
smaller words (smaller, that is, than the word size required to
represent the full-band data), in a manner similar to that discussed in
4a. This can also result in substantial savings, with no loss of
information content.
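Here is a rough Python/NumPy sketch of the same idea: split the band at
the half-band point, then store the low-power upper half in smaller
words. The test signal, the split, and the 8-bit target word size are
assumptions made purely for illustration:

    import numpy as np

    x = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 512))   # power sits in the low band
    spectrum = np.fft.rfft(x)
    half = len(spectrum) // 2

    low_band  = spectrum[:half]   # large coefficients: keep the full word size
    high_band = spectrum[half:]   # small coefficients: narrow words will do

    scale = max(np.abs(high_band).max(), 1e-12)         # guard against zeros
    high_band_8bit = np.round(high_band / scale * 127)  # squeeze into ~8-bit words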
4c. Time-domain filtering
(Most of the claims made in this section should be preceded by the
caveat, "Statistically speaking ....")
Because any given pixel is often well correlated with its neighbors, we
can use the neighboring pixel values to "predict" the value of the pixel
in question. If we simply subtract the predicted value from the actual
value, we decorrelate the pixel from its neighbors. But because the
correlation was initially quite high, the "residue" is generally pretty
small. For example, suppose the pixel values in a small neighborhood of
8-bit greyscale data are as follows:
100 120 110
130 105 100
120 100 130
Suppose, now, that we wish to "estimate" the center pixel by simply
averaging the surrounding pixels and rounding to the nearest integer:
predicted = round((100+120+110+130+100+120+100+130) / 8) = round(113.75) = 114
Subtracting the actual value from the predicted value gives the residue:
filtered = 114 - 105 = 9
This is just one example, but suppose that the data in the image is
correlated enough that this scheme yields a maximum filtered pixel value
of (say) 31. Then every filtered pixel in the image can be represented
by 5 bits (5 = log2(32)) instead of 8 bits -- again, a substantial savings.
If our assumption of correlated data holds over the entire image, we can
reduce the dynamic range of the majority of the image (say, all but the
first scan line in the image, or all but the first and last lines, or all
but the outermost pixels, etc.) by this technique.
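For the curious, here is the 3x3 example above written out in plain
Python. The averaging predictor is just the one used in this primer;
PNG's own filters predict from fewer neighbors and work scanline by
scanline, so treat this strictly as an illustration of the residue idea:

    # The 3x3 neighborhood of 8-bit greyscale values from the text.
    neighborhood = [[100, 120, 110],
                    [130, 105, 100],
                    [120, 100, 130]]

    center = neighborhood[1][1]                       # actual value: 105
    neighbors = [neighborhood[r][c]
                 for r in range(3) for c in range(3)
                 if (r, c) != (1, 1)]

    predicted = round(sum(neighbors) / len(neighbors))  # 910 / 8 = 113.75 -> 114
    residue = predicted - center                        # 114 - 105 = 9

    # If the largest residue anywhere in the image were, say, 31, then
    # log2(32) = 5 bits per filtered pixel would suffice instead of 8.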

This has been a very brief discussion of some of the basic concepts
involved in compression. Certainly, those of you who are very familiar
with compression would have a lot to add to this -- and a lot of
qualifications to make -- that have been left out of the text in the
course of this treatment. My goal here was to get the concepts across,
not provide a rigorous treatment of the subject matter -- which would
surely have lost the majority of the audience.
For the rest of you, I hope this was reasonably clear. I will follow up
sometime soon with the review of the PNG spec*. Meanwhile, if you have
any questions about the content of this primer, please feel free to drop
me a note.
And, if you would, please let me know if this was useful and (or?)
informative to you, or if it was too technical (or too elementary), etc.
Thanks!
-- Vince Sabio
vincepng@vjs.org
*UPDATE 24 Jul 99: The PNG Review can be found here:
via FTP:
via HTTP: