          Review of the Portable Network Graphics (PNG)
                    Specification Version 1.2

   29 Apr 96
   Updated:  Minor corrections to e-mail addresses and URLs
             Version 1.2.1   24 Jul 99
             Version 1.2.2   03 Aug 00

   Vincent Sabio
   U.S. Army Research Laboratory (formerly)
   Return Path, Inc. (currently)

   Copyright 1996,1999,2000 Vincent Sabio

--------------------------------------------------------

This document can be retrieved as follows:

   via FTP:
   via HTTP:

There is also an accompanying compression primer for those of you who need a quick, layman's review of data compression:

   via FTP:
   via HTTP:

--------------------------------------------------------

      PORTABLE NETWORK GRAPHICS FORMAT -- The Long-Awaited Review

Welcome to PNG -- the "Portable Network Graphics" format. From the looks of it, we'll be speaking of GIF in the past tense rather soon. (Portions of this discussion are excerpted from the PNG (Portable Network Graphics) specification, tenth draft, 5 May 1995.)

NOTE: This review is intended for a non-technical audience; it is not intended to be a rigorous, technical treatment of the subject matter. This document is a summary and description of the PNG spec, and is in no way intended as a replacement or substitute for the complete specification. For more information on PNG, refer to these URLs:

   PNG spec:
   Greg Roelofs's PNG Home Page (including some very nice descriptions and demonstrations of 2-D interlacing):

(I'd also like to thank Greg for the invaluable review, suggestions, commentary, and topical knowledge he provided on the first draft of this document.)

To start off, PNG sports several features that make it a very attractive format for 'Net graphics; to wit (from my earlier compression primer):

   "It is a lossless compression scheme that incorporates gamma correction, an optional alpha channel with full-range transparency (unlike GIF's on/off transparency), two-dimensional interlacing (*very* nice, BTW), some very good (and very expandable) filtering schemes, and a public-domain compression algorithm that has been around for a while (and has thus proven itself). The scheme is designed to be flexible for individual purposes, provides for a limited form of forward compatibility (Very impressive, BTW. Years down the road, this will mean that an application using an "older format" PNG engine can *possibly* open and display a "newer format" PNG file -- and will be able to evaluate for itself, on the fly, whether it will be able to do so correctly), and even provides for embedded text strings."

It is worth mentioning at this point that I had (and continue to have) nothing to do with the development of the PNG specification, nor am I likely to be associated with any future development of the spec. I am an electrical engineer (BSEE/MSEE Univ. of Maryland) working for the U.S. Army Research Laboratory in Adelphi, MD, and my area of specialization happens to be signal processing -- thus, I have a great personal interest in things like this. I have tried to keep the summary of the spec as objective as possible, but my own impressions (and comments) are littered throughout; I have identified them as such whenever they have worked their way into the discussion.

While I'm on the subject of signal processing, it's also worth noting that some of the terminology used in this review might be different from what is used by most graphic artists. Specifically, what graphic artists refer to as "aliasing," we signal processors refer to as "spatial quantization noise."
To us, "aliasing" refers to the low-frequency artifacts produced by digitally sampling a signal (an image is just a type of signal) at a sampling rate below the minimum rate as specified by Shannon's Sampling Theorem (typically, this threshold is twice the highest frequency component in the signal). These low-frequency artifacts can be seen in high-resolution images that are scanned at low resolutions; graphic artists typically refer to these artifacts as "moire patterns." Offhand, those are the only terms I can think of that are used differently between the graphic-arts and signal-processing worlds. Some more terms that will be used throughout the discussion, and thus are best defined at the outset, are as follows: decoder: any application that can read (input) and interpret a PNG file for display or manipulation. encoder: any application that can write (output) an image in the PNG format -- that is, a format consistent with the requirements of the PNG specification. scan line ("scanline"): a "line" (usually a row) of image data. In an interlaced file, the entire image is divided into several sub-images; in this case, a scanline is defined as a line in the sub-image. (See Section 1.6.1 for a more detailed explanation of interlaced scanlines.) max_bitdepth: the maximum value that can be represented by a single pixel's data word; if the pixel contains (bitdepth) bits, then the max_bitdepth value is {(2^bitdepth)-1}. So, for example, an 8-bit image has a max_bitdepth value of (2^8)-1 = 255. Okay, let's go through the PNG spec, section by section. I will leave out the boring parts (hopefully), and will focus on those areas that should be of interest to Photoshoppers (and graphic artists, in general). I will proceed through the spec in a non-linear fashion, presenting information in an order that supports the discussion as efficiently as possible -- thus, my section numbers do not parallel the PNG spec's. SECTION I: Summary ---------------------------- 1.1: File Structure A PNG file consists of a "signature" followed by a series of "chunks." 1.1.1: PNG Signature: The first eight bytes of a PNG file comprise the PNG signature; they are fixed, and identify the file as a single PNG image. There is no version information contained in the signature; this information can be contained in one or more of the ensuing chunks. The signature is thus invariant, and (for your reference) is composed of the following decimal values: 137 80 78 71 13 10 26 10 Every PNG file must start with these eight bytes. 1.1.2: Chunks: A "chunk" is a well-defined section of data; the data can be image pixel values, text, gamma information, an alpha channel, etc. Virtually all information in a PNG file is stored in one or more chunks; the typical PNG file will be composed of several chunks. Each chunk has a four-character name consisting of upper- and lower-case ASCII letters. Each chunk ends with a "cyclical redundancy check" (CRC) calculated on the preceding bytes in the chunk; the CRC verifies the integrity of the data in the chunk. The purpose of the chunk name is to identify the chunk and provide information about the chunk to the decoder. Judging strictly by the name of the chunk (assuming that it follows the PNG chunk-naming convention, discussed *very* briefly below), a decoder can tell if it can decode the PNG file or not. 
1.1.2: Chunks

A "chunk" is a well-defined section of data; the data can be image pixel values, text, gamma information, an alpha channel, etc. Virtually all information in a PNG file is stored in one or more chunks; the typical PNG file will be composed of several chunks. Each chunk has a four-character name consisting of upper- and lower-case ASCII letters. Each chunk ends with a "cyclic redundancy check" (CRC) calculated on the preceding bytes in the chunk; the CRC verifies the integrity of the data in the chunk.

The purpose of the chunk name is to identify the chunk and provide information about the chunk to the decoder. Judging strictly by the name of the chunk (assuming that it follows the PNG chunk-naming convention, discussed *very* briefly below), a decoder can tell if it can decode the PNG file or not. (An example of a case in which a decoder might not be able to decode the file would be if the file is of a format developed *after* the decoder was written, and contains data that cannot be interpreted by the older decoder. A current parallel is the new P-JPEG (progressive JPEG) format: older decoders that do not recognize this format would know that they cannot successfully decode the file.)

Since I will be referring to some common chunk types in the course of this discussion, it is worth briefly mentioning the chunk-naming convention. Recall that chunk names are four characters; each character has a different meaning, and that meaning is further specified by the "case" (uppercase or lowercase) of the character:

   First character:
      uppercase := the chunk is critical to the display of the file's contents ("critical" chunk)
      lowercase := the chunk is not strictly necessary in order to meaningfully display the file's contents ("ancillary" chunk)

   Second character:
      uppercase := the chunk is part of the public specification ("public" chunk)
      lowercase := the chunk is not a part of the formal PNG spec ("proprietary" chunk)

   Third character:
      (Uppercase always.) Reserved for future use; currently, all chunk names must have uppercase third characters.

   Fourth character: (of interest to PNG editors, not necessarily PNG decoders)
      uppercase := this chunk has been denoted as "unsafe to copy" under certain circumstances ("unsafe-to-copy" chunk)
      lowercase := this chunk is safe to copy ("safe-to-copy" chunk)
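Since the case of an ASCII letter is just one bit, a decoder can test all of these properties with trivial checks. A minimal sketch in C (the function names are mine; only the case assignments come from the spec):

    #include <ctype.h>

    /* Each property is encoded in the case of one character of the
     * four-character chunk name (lowercase = property present). */
    int is_ancillary(const unsigned char name[4])    { return islower(name[0]); }
    int is_private(const unsigned char name[4])      { return islower(name[1]); }
    int is_safe_to_copy(const unsigned char name[4]) { return islower(name[3]); }

A decoder that encounters an unknown chunk can then do the right thing: skip it if it is ancillary, or give up (gracefully) if it is critical.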
Since a PNG file consists solely of a signature followed by a series of chunks, the image data, itself, is contained in one or more chunks -- these image data chunks use the chunk name "IDAT," and are thus referred to (in the spec, and surely in other PNG-related text as well) as "IDAT chunks."

The PNG spec supports future expansion through the creation and public registration of new "public" chunk types. This means that a type of data, character set, etc., that is not supported in the current revision of the spec can be incorporated into future revisions via the creation and registration of new chunks to define and represent those structures. And that's just the "public" route; if there isn't sufficient demand to justify public registration of a new chunk, PNG still allows for the creation of *private* chunks -- that is, chunks that are intended to be recognized only by custom decoders designed to support them.

1.2: Pixel Structure

1.2.1: Data Representation

All data words composed of more than a single byte will be in "network byte order"; the upshot of this is that Macs, Suns, VAXen, IBM mainframes, DECstations, PCs, etc., will be able to exchange PNG files directly, without reordering bytes (within words) for each platform's preferred byte orientation.

Three types of pixels are supported:

   1. Palette-mapped pixels are represented by a single value that is an index into a supplied palette. The bit depth determines the maximum number of palette entries, not the color precision within the palette.

   2. Greyscale pixels are represented by a single value that is a greyscale level, where zero is black and the largest value for the bit depth is white.

   3. True-color pixels are represented by three-value sequences: red (zero = black, max = red) appears first, then green (zero = black, max = green), then blue (zero = black, max = blue). The bit depth specifies the size of each value, not the total pixel size.

1.2.2: Pixel Dimensions

Non-square pixels can be represented, but the PNG spec does *not* require viewers to account for them -- i.e., a viewer may present any PNG image as though its pixels are square. However, the spec *does* strongly encourage viewers running on display hardware with non-square pixels to rescale images for proper display.

1.3: Alpha Channels

Optionally, greyscale and true-color pixels can also include an alpha value. An alpha value of zero represents a fully-transparent pixel; a max_bitdepth value represents a fully-opaque pixel. Intermediate values represent partially-transparent pixels. In toto, the alpha "channel" (i.e., the grouping of all the alpha values) can yield an image whose pixels range from fully opaque to fully transparent, which can then be overlaid on a background image to form a "composite" image.

The actual color or greyscale pixel value, as stored in the PNG file, is not affected by the alpha-channel values. The values of the displayed pixels, including "transparency" effects, are calculated by the decoder. (This is referred to as "unassociated" or "non-premultiplied" alpha.) This requires that the file contain the full alpha-channel information; the alternative is to premultiply the pixels by the alpha fraction, but this results in an irretrievable loss of information (and a reduction in the level of creative control wielded by the artist). The selection of non-premultiplied alpha is, in this author's opinion, well worth the extra storage (and resultant bandwidth).

Transparency control is possible without the storage cost of a full alpha channel, although the level of control is substantially reduced: in a palette image, an alpha value may be defined for each palette entry; in greyscale and true-color images, a single pixel value may be identified as being "transparent." Information for these transparency techniques is stored in a separate chunk.

1.3.1: Alpha-Channel Caveats

1.3.1.1. Alpha channels may not be included with images that have fewer than 8 bits per sample.

1.3.1.2. Decoders are not required to support transparency control! This means that your cleverly-composited image (on, say, one of your Web pages) does not have to be displayed by the decoder as a composite image -- in this case, the decoder will most likely display the second ("topmost") image directly over the first ("bottom") in a fully-opaque fashion. I would expect that the lion's share of even marginally-decent decoders will support full transparency control, but this is only an opinion.

1.3.1.3. PNG requires the image to be rectangular in area. Non-rectangular images are filled out (rectangularized?) by adding "fully transparent" pixels. To enhance compression, the spec suggests that encoders use the same color value for all fully-transparent pixels in the filled-out area. From the paragraph above, however, recall that decoders are not required to support transparency control -- thus, the PNG specification recommends to encoders that "the colors assigned to transparent pixels should be reasonable background colors whenever feasible." If the image already has a known background color, the PNG encoder can write this information into the bKGD chunk; thus, even decoders that ignore transparency can fill the unused screen area with the bKGD color.
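Returning to the full alpha channel for a moment: the compositing calculation the decoder performs reduces to a weighted average of foreground and background, computed per sample. A sketch in C, assuming 8-bit samples (so max_bitdepth = 255); this illustrates the general technique, and is not code from the spec:

    /* Composite a foreground sample over a background sample, using a
     * non-premultiplied alpha value (0 = fully transparent, 255 = opaque):
     *     out = (alpha * fg + (255 - alpha) * bg) / 255
     * The +127 term rounds to the nearest integer. */
    unsigned char composite(unsigned char fg, unsigned char bg,
                            unsigned char alpha)
    {
        return (unsigned char)((alpha * fg + (255 - alpha) * bg + 127) / 255);
    }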
1.4: Gamma Correction

Gamma is a way of defining the brightness reproduction curve of a camera or display device. Failure to correct for image gamma can lead to a too-dark or too-light display; thus, gamma correction is very desirable whenever possible.

If you are a math-o-phobe, you should skip to the next section now; I've tried to keep the gamma discussion straightforward, but it might get just a little hairy for some of you (sorry). If you're still with us, but only tentatively, then you might benefit from noting that the "^" character means "raised to the [power of]..."

When brightness levels are expressed as fractions in the range 0 to 1, a device produces an output brightness level "obright" from an input brightness level "ibright" according to the equation:

   obright = ibright ^ gamma

An overall gamma of 1.0 gives correct tone reproduction; thus, a decoder should use the gamma values of both the display device and the image file (when they are available) to accurately reproduce the image on the target device. Reversing the above equation, we can determine the input brightness level from the output brightness level and the gamma value:

   ibright = obright ^ (1.0 / gamma)

Thus, if the gamma values for the image file (call it "file_gamma") and the display device (call it "display_gamma") are known, we can calculate the gamma-corrected value of a "normalized" pixel (call it "norm_pix") as:

   corr_gamma = norm_pix ^ (1.0 / (file_gamma * display_gamma))

The "norm_pix" value must be pre-normalized (as a floating-point value in the range 0.0 to 1.0) from the pixel value (call it "pixval") in the image file by the following equation:

   norm_pix = pixval / max_bitdepth

Defining a new value (constant for each PNG file), "corr_factor," as

   corr_factor = 1.0 / (file_gamma * display_gamma)

yields the following calculation for each gamma-corrected pixel:

   corr_gamma = (pixval / max_bitdepth) ^ corr_factor

(There is one more step required to calculate the final pixel value for the display unit's frame buffer, but that equation -- though quite simple -- is not necessary to complete this discussion.)

Since the type of calculation required to compute corr_gamma for each pixel can require a great deal of computing power (and, thus, computing *time*), the decoder can simply calculate the corrected pixel values *once* to form a lookup table, and then use the "pixval" value as an index into the table. This process can be optimized still further to yield substantial reductions in processing (albeit at the cost of a minor amount of "scratch pad" storage to create the table).
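For 8-bit samples, such a table has only 256 entries. A minimal sketch in C, using the variable names defined above (the function name is mine):

    #include <math.h>

    /* Build a 256-entry gamma-correction lookup table for 8-bit samples;
     * corr_factor = 1.0 / (file_gamma * display_gamma), per the equations
     * above. The decoder then indexes the table with each pixval. */
    void build_gamma_table(unsigned char table[256],
                           double file_gamma, double display_gamma)
    {
        double corr_factor = 1.0 / (file_gamma * display_gamma);
        int i;
        for (i = 0; i < 256; i++) {
            double norm_pix = i / 255.0;      /* pixval / max_bitdepth */
            table[i] = (unsigned char)(255.0 * pow(norm_pix, corr_factor) + 0.5);
        }
    }

Once the table is built, correcting a pixel is a single array lookup instead of a call to pow() -- that is the substantial reduction in processing mentioned above.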
PNG expects encoders to record the gamma (if known), and it expects decoders to correct the image gamma if necessary for proper display on their display hardware. If no information on the display device is available, an assumed value of 2.2 is recommended by the PNG spec. This represents a contrast boost, since real CRTs are generally around 2.8; apparently, this is a common practice in T.V. and film. To preserve this contrast boost, the PNG spec suggests that the display gamma be divided by 1.25, though this is not a requirement for PNG compliance.

Gamma correction is not applied to the alpha channel, if any. Alpha values always represent a linear fraction of full opacity. This is a good method, since pre-computing the gamma correction for the alpha channel would result in an irretrievable loss of information -- information that could not be recovered at the decoder.

Finally, the PNG spec specifically states that encoders should *not* pre-correct gamma in the image. As in the alpha-channel case, gamma correction is an inherently lossy operation; thus, pre-correcting image gamma would not be a good idea. Also, it is this author's opinion that the computational price paid for gamma correction in the decoder is small enough to easily justify non-pre-corrected gamma.

Section 1.5: Text Strings

PNG supports the capability of storing full text strings along with the image; the text strings are stored in text "chunks" (see Section 1.1). The character set selected for (and, thus, supported by) PNG is ISO 8859-1 (Latin-1), which represents a "widely useful and reasonably portable character set," according to the PNG spec. Since PNG *does* provide for the creation and public registration of new ancillary chunks, it is both permissible and quite probable that other character sets will be supported in the future.

Note that this provides a very flexible utility for storing alphanumeric information along with the image. For example, the medical community can store patient information with PNG-compressed X-rays; similarly, key radar parameters and collection data can be stored with their associated SAR imagery. It is not necessary to create new chunk types for specific types of text data; instead, the data can be stored in the already-defined tEXt chunk, along with reasonably self-explanatory keywords. (To this end, the PNG spec recommends using keywords that are fully spelled out, instead of abbreviations.)
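For example, the body of a tEXt chunk is simply a keyword, a zero-byte separator, and the text itself. A sketch in C of assembling such a payload (the helper function is mine; only the layout comes from the spec):

    #include <string.h>

    /* Assemble the data portion of a tEXt chunk: keyword, a zero byte
     * as separator, then the text (which is not null-terminated).
     * Returns the total payload length. The caller supplies 'out' with
     * room for strlen(keyword) + 1 + strlen(text) bytes. */
    size_t build_text_payload(unsigned char *out,
                              const char *keyword, const char *text)
    {
        size_t klen = strlen(keyword);    /* keywords are 1-79 characters */
        size_t tlen = strlen(text);
        memcpy(out, keyword, klen);
        out[klen] = 0;                    /* null separator */
        memcpy(out + klen + 1, text, tlen);
        return klen + 1 + tlen;
    }

A call such as build_text_payload(buf, "Title", "Chest X-ray, patient 42") would produce the body of a perfectly reasonable tEXt chunk.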
Section 1.6: Image Processing

Although image processing ("signal" processing, really) lies at the heart of the PNG spec (or any compression specification), I will not delve too deeply into the topic area because of its mathematical nature. Plus, many of the topics to be discussed below were already covered in the compression primer distributed earlier (and also available in this directory). If you do not have the compression primer, it can be found here:

   via FTP:
   via HTTP:

I will discuss data interlacing first, and then go on to discuss PNG filtering and compression. This is somewhat out of order (from an intuitive perspective), but the interlacing scheme factors into the filtering and compression sections, so it needs to be discussed first.

1.6.1: Interlacing

"Interlacing" is a means of storing (and re-generating/displaying) an image such that, when decoded, it is initially displayed as a low-quality (spatially downsampled) version of the original, and finer details are "filled in" in progressive steps until the entire, fully-detailed image is displayed. Images of this nature are also referred to as "progressive" images, since the image seems to progress from a low-quality to a high-quality image as the load completes. ("Interlacing" actually refers to the format in which the image data is stored in the file.)

There is a subtle -- but important -- distinction that needs to be made where interlaced/progressive images are concerned: Specifically, image loads (for the *full* image) are *not* faster in the "interlaced" case than they are in the "straight-laced" :-) case. In fact, the (very slight) overhead associated with loading and displaying an interlaced image will, if anything, result in a slightly *longer* load time than for a similar, non-interlaced image.

The advantage of interlacing lies in the time required to get an initial, low-resolution image onto the screen; while non-interlaced images are drawn in a linear, top-to-bottom fashion (and can thus require a substantial amount of time to draw), interlaced images present a downsampled version of the entire image rather quickly -- so the human operator can begin to view the image almost immediately -- and then fill in the image in a progressive fashion as the rest of the data loads. Thus, there is no savings -- either in time or bandwidth -- from the start of the image load to the completion of the image load in interlaced vs. non-interlaced imagery (in fact, there is a slight penalty in the progressive-display case). Interlacing is performed entirely and exclusively for the benefit of the human operator.

Typically, images are interlaced in one dimension (e.g., GIF); i.e., whole scanlines are reordered, yielding a "venetian blind" effect when the image is displayed. In the PNG case, however, the interlacing is performed in *both* dimensions, creating a sense of the image "fading in" (rather than "wiping down"). To some degree, the distinction is purely aesthetic; however, from the perspective of the purpose of interlacing -- providing a meaningful display for the user as quickly as possible -- the goal is achieved somewhat faster. Specifically, the "excess" bandwidth required to transmit a full-band scanline (per the GIF approach) is used more efficiently by PNG to transmit *several* partial scanlines. Depending on your age, the effect is either more pleasing to the eye, or just a whole lot cooler.

In the PNG approach, the entire image is displayed in seven distinct steps, or "passes." (The format is referred to as "Adam7.") Note that while PNG decoders are required to be able to read interlaced images, they are not required to actually perform progressive display. Also, it is worth mentioning that interlacing slightly expands the file size (on the average).
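For the curious, the seven passes visit the pixels of each 8x8 block of the image in a fixed pattern. A sketch in C of the pass geometry (the table layout and function are mine; the offsets and increments are those of Adam7):

    /* Starting offsets and increments for each of the seven Adam7
     * passes, measured within each 8x8 block of the image. */
    static const int x_start[7] = {0, 4, 0, 2, 0, 1, 0};
    static const int y_start[7] = {0, 0, 4, 0, 2, 0, 1};
    static const int x_step[7]  = {8, 8, 4, 4, 2, 2, 1};
    static const int y_step[7]  = {8, 8, 8, 4, 4, 2, 2};

    /* Visit every pixel belonging to interlace pass 'pass' (0 through 6). */
    void walk_pass(int pass, int width, int height,
                   void (*visit)(int x, int y))
    {
        int x, y;
        for (y = y_start[pass]; y < height; y += y_step[pass])
            for (x = x_start[pass]; x < width; x += x_step[pass])
                visit(x, y);
    }

Note that the first pass touches only one pixel in every 64, which is why a recognizable (if blocky) rendition of the image appears after only about 1/64 of the data has arrived.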
1.6.2: Filtering

The image is filtered before it is compressed. (Recall from the compression primer that certain spatial filters permit improved compression performance.) Note that the image is effectively interlaced before it is filtered; thus, we have the following flow for these steps:

   (image) --> interlace --> filter --> compress

This permits the most efficient data flow (for these three steps) through the decoder (i.e., the decoder does not have to wait for the full image to load before it can begin decoding interlaced images).

There are currently four filter options -- plus "None," which means that the scanline in question is not filtered -- although there is the capability for up to 255 individual filters, plus "None," within each set of filters. In addition to that, the presence of the overall filter-type byte (similar to the compression-type byte) means that there can be up to 256 *sets* of 256 filters. The following filters are currently supported:

   Code  Name
    0    None
    1    Sub
    2    Up
    3    Average
    4    Paeth

Brief descriptions of these filters are as follows:

   0 (None):    Self explanatory.

   1 (Sub):     Compute the difference between the current byte and the value of the corresponding byte of the preceding pixel.

   2 (Up):      Compute the difference between the current byte and the value of the corresponding byte of the pixel directly above the current pixel.

   3 (Average): Compute the difference between the current byte and the average of the corresponding bytes in the pixels directly above and directly to the left of the current pixel.

   4 (Paeth):   Compute the difference between the current byte and a predictor computed from the values of the bytes directly above, directly to the left, and in the above-left corner of the current pixel.

For palette-based color and for images with bitdepths less than 8, filter type 0 ("None") is recommended in the spec.

An important point where PNG is concerned is that the filters are (optionally) applied on a scanline-by-scanline basis; thus, the encoder can select the most appropriate filter for each scanline, depending on the data present. One effective (but inefficient) way to do this is to run *each* filter on *each* scanline, and select -- for each scanline -- the filter that performs the best on that line. Unfortunately, this is a very computationally-intensive approach, especially for large images, so more efficient techniques will most likely be employed in the filter-selection process.

Finally, note that filter optimization is optional; if an encoder so desires, it can default to any one of the filters for every scanline, or employ no filtering at all. I think it's unlikely that even the laziest programmers would employ a "no filters at all" option, since implementation of any one of the filters shown above is pretty simple (even Paeth), and will most likely yield better performance (than no filter at all) in the majority of cases.
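For the curious, the Paeth predictor really is only a few lines. A sketch in C, following the algorithm as the spec defines it:

    #include <stdlib.h>

    /* Paeth predictor: a = byte to the left, b = byte above, c = byte
     * above-left. Start from the initial estimate p = a + b - c, then
     * return whichever of a, b, c is closest to p (ties favor a, then b).
     * The filter encodes (current byte - predictor). */
    int paeth_predictor(int a, int b, int c)
    {
        int p  = a + b - c;
        int pa = abs(p - a);
        int pb = abs(p - b);
        int pc = abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc) return b;
        return c;
    }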
1.6.3: Compression

PNG provides the capability to support multiple compression formats, although "deflate" compression (referenced in the PNG spec as "PNG compression type 0") is the only method currently defined for PNG. The upshot of the provision to support multiple formats is that better compression engines -- e.g., wavelet-based methods (albeit lossy), once they make their way from the blackboards to the commercial applications -- can be incorporated with relative ease. Although it requires updating the encoders and decoders (a straightforward process -- we're all used to DLing the latest upgrades of various apps and plug-ins), it provides the capability to continuously update the specification as better algorithms become available. In short, PNG is a spec that is designed for long-range supportability.

The compressed data in the PNG format are referred to as a "zlib" datastream; a single zlib datastream comprises one or more IDAT chunks, or a single zTXt chunk (i.e., user-readable text can also be compressed). Zlib datastreams in IDAT chunks decompress to filtered data as described above; datastreams in zTXt chunks decompress to readable text (assuming, of course, that they *contain* readable text).

Since PNG supports "deflate" compression, I'll give a brief overview of the method. Deflate compression is an LZ77 derivative used in zip, gzip, pkzip, etc. (It is referred to as "LZ77" since it was first published by Ziv and Lempel in 1977 ("IEEE Transactions on Information Theory," vol. 23, no. 3).) According to the spec, "Extensive research has been done supporting its patent-free status. Portable C implementations are freely available." This is good news for us, but it's *really* good news for companies like Adobe, which certainly would prefer to avoid the types of hassles created by the GIF fiasco.

"Deflate" uses a combination of LZ77 and Huffman encoding. Since very few (most likely none) of you are likely to be interested in the guts of either LZ77 or Huffman coding (or Huffman trees, which are also employed in the algorithm), I will not discuss them here; however, the deflate specification is currently at draft 1.3, and is available from:

UPDATE 24 Jul 99: The deflate document is no longer available, though the /doc directory contains a wealth of information on the Zip format. Note that there is now a zlib home page, from which you can find the most current version of the specification:

The deflate algorithm has the following properties:

   * It is independent of CPU type, operating system, file system, and character set.

   * It compresses data with an efficiency comparable to the best currently-available general-purpose compression methods.

   * It can be implemented readily in a manner not covered by patents, and hence can be practiced freely.

The deflate algorithm does *not* attempt to compress "specialized" data, such as raster graphics, as well as the best currently-available algorithms that have been optimized for those tasks. However, it is this author's interpretation of the spec that a custom-designed encoder/decoder pair can implement an optimized algorithm if so desired; naturally, if the compression engine is not incorporated into the public-domain PNG spec, few to none (most likely none) of the standard PNG decoders will be able to decode the data.

Recall from the compression primer that there are cases in which a lossless compression algorithm can yield a "compressed" file that is *larger* than the original; it is easy to show that no lossless compression algorithm can compress every data set (although it's relatively easy to compare the compressed data-set size with the input data-set size, and forgo emitting a compressed version if it is not smaller than the input data set by some margin, where the margin is selected to justify the time required to decompress the data). For the deflate algorithm, the worst-case size increase -- for large data sets (which represent the only meaningful case) -- is about 0.015%. (This is a very reasonable figure.) English text usually compresses by a factor of 2.5 to 3, executable files typically compress somewhat less, and graphical data (e.g., raster images or CT data) may compress much more. Again, as demonstrated in the compression primer, the compression performance of any given algorithm depends heavily on the input data set.
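Before leaving compression: the best-known of those freely-available portable C implementations is the zlib library, and deflate-compressing a buffer through its one-call convenience wrapper looks like the sketch below. (The wrapper function is mine; compress() is zlib's own API.)

    #include <zlib.h>

    /* Deflate-compress src (src_len bytes) into dst (capacity dst_cap).
     * Returns the compressed length, or 0 on failure (e.g., dst too small). */
    unsigned long deflate_buffer(unsigned char *dst, unsigned long dst_cap,
                                 const unsigned char *src,
                                 unsigned long src_len)
    {
        uLongf out_len = dst_cap;
        if (compress(dst, &out_len, src, src_len) != Z_OK)
            return 0;
        return out_len;
    }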
SECTION II: Comparison with Other Specs
----------------------------------------------------

2.1: PNG vs. GIF

The leading lossless compression format currently in use on the 'Net is GIF. It is this author's opinion that (1) PNG will probably soon replace GIF's position as the leading lossless format, and (2) "soon" is probably not soon enough. :-)

Compared to GIF, PNG wins in virtually every category conceivable. Sure, "deflate" is rumored to provide 10-30% better compression than GIF, but that's only part of the story. Here's a quick summary for you:

   * PNG handles bitdepths greater than 8 bits -- including truecolor images up to 48 bits per pixel and greyscale images up to 16 bits per pixel.

   * PNG sports a very nice 2-D interlace format, as compared to GIF's 1-D format.

   * PNG provides for incorporation of textual data. (Well, this one's really more of a tie, since GIF purportedly sports the capability to incorporate text.)

   * PNG provides a full alpha channel.

   * PNG provides the capability to correct/account for display gamma.

   * PNG incorporates a pre-compression filtering stage with the capability to add new filters in the future.

   * The PNG compression algorithm ("deflate" compression) is firmly established in the public domain.

   * Finally, as mentioned already, PNG purportedly provides (ah, alliteration) 10-30% better compression than GIF -- plus the capability to incorporate improved compression engines down the road.

In short, GIF has been outclassed. End of comparison. Well, almost the end -- it is worth mentioning that apps supporting the full PNG spec are still hard to find, especially for the Mac; there are, however, several apps providing PNG capability for Windows and other platforms. For a more-or-less complete list, check out:

Also, it is worth noting that GIF can do multi-image sequences, which is explicitly disallowed in PNG. There has been sporadic discussion of a multi-image (or multimedia) PNG variant, possibly to be named MPNG or MNG.

2.2: PNG vs. JPEG

Comparing the PNG spec to the JPEG spec is an apples-and-oranges comparison; PNG is lossless, while JPEG is lossy. However, it's worth mentioning that JPEG does not provide support for an alpha channel, gamma correction, or the modularity and long-term supportability of PNG. Moreover, JPEG does not perform well when decomposing high-bandwidth images (see the previous compression primer for a discussion of JPEG and its shortcomings in this area); a high-bandwidth image is one that has "sharp" transitions in it, such as line art. (Most "sharp edges" are high-bandwidth transitions; thus, a sharp photograph probably has a high bandwidth.) In these cases, JPEG decompositions will result in Gibbs-phenomenon effects, which many graphic artists mistake for moires. (This mistake sends many graphic artists back to their scanners to re-scan at a higher resolution, which only compounds the problem.) A simple answer to this problem is to compress the image via a lossless format, such as PNG. Thus, for high-bandwidth images ("sharp images," for all intents and purposes), PNG compression will provide a much more pleasing result than JPEG.

SECTION III: Applications of PNG
-------------------------------------------

As a lossless compression format, PNG is best suited to high-bandwidth images. (For the signal-processing layman, "high-bandwidth" is typically synonymous with high-resolution images that contain sharp transitions -- e.g., line art, rasterized text, etc.) For high-bandwidth images, JPEG compression -- based on an inherently narrowband transform -- aliases the image areas that contain sharp transitions (a sort of 2-D Gibbs phenomenon). For low-bandwidth images, JPEG still provides a very high compression rate (at a cost), and is probably better suited to applications where alpha channels and gamma correction aren't useful or required. (Medical X-rays, for example -- although the medical community is currently pressing pretty hard for implementation of wavelet-based techniques, which are very likely to make the current JPEG formats obsolete.)

SECTION IV: Review
---------------------------

I'm a signal processor, so let me get my own narrow-minded signal-processing issues out of the way first: I think that, in the mid term, wavelet-based techniques will supplant the current JPEG and, most likely (at least where imagery is concerned), the current PNG definition.
The advantage of PNG is that it is well designed to be able to *incorporate* a wavelet-based compression engine once it comes along -- moreover, most of the wavelet bases currently in wide use are well established in the public domain. Although wavelets mean compression loss, I think that the capability of PNG to provide an *option*, at compression time, of lossless vs. lossy formats will be key to its survival in the long run. Wavelet compression also provides a 2-D progressive-display capability that is much more "natural" than the Adam7 format, since a wavelet-based progressive display presents a true low-bandwidth representation of the image initially, and progressively backfills the image with higher-band information. (In a sense, the image goes from blurry to sharp instead of from "pixellated" to sharp.) Moreover, wavelet-based progression doesn't require interlacing -- so there is *no* penalty in terms of storage, transfer time, or encode/decode processing time.

Okay, now that I've gotten that out of the way ...

PNG is the first compression spec -- that I know of -- to support gamma correction. In fact, before I started reviewing the PNG spec, I didn't even know what gamma *was*. (Now I are an expert. :-) The full alpha channel is a very nice touch, and the progressive-display/interlacing scheme is also quite nice. But you know that by now. Point is, PNG incorporates some very nice features that -- to my knowledge -- are not to be found elsewhere; in particular, they do not exist in either of the current standards -- JPEG and GIF.

Another very attractive feature is PNG's adaptive scanline filtering, which allows the filtering scheme to be optimized for each scanline, if the encoder chooses to do so. This can potentially yield much greater compression than a single-filter approach -- and the cost of a filter byte at the start of each scanline is minimal. As stated in the spec, "The potential benefits of adaptive filtering are too great to ignore" -- and they are.

In summary, PNG incorporates features that range from "Gee, that's nice" to "Awesome!" (Which features fall into which categories depends on your particular application.) PNG is a long-range supportable format, and I think that's critical for two reasons: (1) it provides the capability to adapt the specification to incorporate new and useful techniques down the road, and (2) I think it's going to be around for a long time.

If you have comments on this review or information to add, please feel free to contact me.

- Vince Sabio
  vince-png@vjs.org