          Review of the Portable Network Graphics (PNG)
                    Specification Version 1.2

   29 Apr 96
   Updated:  Minor corrections to e-mail addresses and URLs
             Version 1.2.1   24 Jul 99
             Version 1.2.2   03 Aug 00

   Vincent Sabio
   U.S. Army Research Laboratory (formerly)
   Return Path, Inc. (currently)

   Copyright 1996,1999,2000 Vincent Sabio

--------------------------------------------------------

This document can be retrieved as follows:

   via FTP:
   via HTTP:

There is also an accompanying compression primer for those of you who need a quick, layman's review of data compression:

   via FTP:
   via HTTP:

--------------------------------------------------------

      PORTABLE NETWORK GRAPHICS FORMAT -- The Long-Awaited Review

Welcome to PNG -- the "Portable Network Graphics" format. From the looks of it, we'll be speaking of GIF in the past tense rather soon. (Portions of this discussion are excerpted from the PNG (Portable Network Graphics) specification, tenth draft, 5 May 1995.)

NOTE: This review is intended for a non-technical audience; it is not intended to be a rigorous, technical treatment of the subject matter. This document is a summary and description of the PNG spec, and is in no way intended as a replacement or substitute for the complete specification. For more information on PNG, refer to these URLs:

   PNG spec:
   Greg Roelofs's PNG Home Page (including some very nice descriptions and demonstrations of 2-D interlacing):

(I'd also like to thank Greg for the invaluable review, suggestions, commentary, and topical knowledge he provided on the first draft of this document.)

To start off, PNG sports several features that make it a very attractive format for 'Net graphics; to wit (from my earlier compression primer):

   "It is a lossless compression scheme that incorporates gamma correction, an optional alpha channel with full-range transparency (unlike GIF's on/off transparency), two-dimensional interlacing (*very* nice, BTW), some very good (and very expandable) filtering schemes, and a public-domain compression algorithm that has been around for a while (and has thus proven itself). The scheme is designed to be flexible for individual purposes, provides for a limited form of forward compatibility (Very impressive, BTW. Years down the road, this will mean that an application using an "older format" PNG engine can *possibly* open and display a "newer format" PNG file -- and will be able to evaluate for itself, on the fly, whether it will be able to do so correctly), and even provides for embedded text strings."

It is worth mentioning at this point that I had (and continue to have) nothing to do with the development of the PNG specification, nor am I likely to be associated with any future development of the spec. I am an electrical engineer (BSEE/MSEE Univ. of Maryland) working for the U.S. Army Research Laboratory in Adelphi, MD, and my area of specialization happens to be signal processing -- thus, I have a great personal interest in things like this. I have tried to keep the summary of the spec as objective as possible, but my own impressions (and comments) are littered throughout; I have identified them as such whenever they have worked their way into the discussion.

While I'm on the subject of signal processing, it's also worth noting that some of the terminology used in this review might be different from what is used by most graphic artists. Specifically, what graphic artists refer to as "aliasing," we signal processors refer to as "spatial quantization noise."
To us, "aliasing" refers to the low-frequency artifacts produced by digitally sampling a signal (an image is just a type of signal) at a sampling rate below the minimum rate as specified by Shannon's Sampling Theorem (typically, this threshold is twice the highest frequency component in the signal). These low-frequency artifacts can be seen in high-resolution images that are scanned at low resolutions; graphic artists typically refer to these artifacts as "moire patterns." Offhand, those are the only terms I can think of that are used differently between the graphic-arts and signal-processing worlds. Some more terms that will be used throughout the discussion, and thus are best defined at the outset, are as follows: decoder: any application that can read (input) and interpret a PNG file for display or manipulation. encoder: any application that can write (output) an image in the PNG format -- that is, a format consistent with the requirements of the PNG specification. scan line ("scanline"): a "line" (usually a row) of image data. In an interlaced file, the entire image is divided into several sub-images; in this case, a scanline is defined as a line in the sub-image. (See Section 1.6.1 for a more detailed explanation of interlaced scanlines.) max_bitdepth: the maximum value that can be represented by a single pixel's data word; if the pixel contains (bitdepth) bits, then the max_bitdepth value is {(2^bitdepth)-1}. So, for example, an 8-bit image has a max_bitdepth value of (2^8)-1 = 255. Okay, let's go through the PNG spec, section by section. I will leave out the boring parts (hopefully), and will focus on those areas that should be of interest to Photoshoppers (and graphic artists, in general). I will proceed through the spec in a non-linear fashion, presenting information in an order that supports the discussion as efficiently as possible -- thus, my section numbers do not parallel the PNG spec's. SECTION I: Summary ---------------------------- 1.1: File Structure A PNG file consists of a "signature" followed by a series of "chunks." 1.1.1: PNG Signature: The first eight bytes of a PNG file comprise the PNG signature; they are fixed, and identify the file as a single PNG image. There is no version information contained in the signature; this information can be contained in one or more of the ensuing chunks. The signature is thus invariant, and (for your reference) is composed of the following decimal values: 137 80 78 71 13 10 26 10 Every PNG file must start with these eight bytes. 1.1.2: Chunks: A "chunk" is a well-defined section of data; the data can be image pixel values, text, gamma information, an alpha channel, etc. Virtually all information in a PNG file is stored in one or more chunks; the typical PNG file will be composed of several chunks. Each chunk has a four-character name consisting of upper- and lower-case ASCII letters. Each chunk ends with a "cyclical redundancy check" (CRC) calculated on the preceding bytes in the chunk; the CRC verifies the integrity of the data in the chunk. The purpose of the chunk name is to identify the chunk and provide information about the chunk to the decoder. Judging strictly by the name of the chunk (assuming that it follows the PNG chunk-naming convention, discussed *very* briefly below), a decoder can tell if it can decode the PNG file or not. 
1.1.2: Chunks

A "chunk" is a well-defined section of data; the data can be image pixel values, text, gamma information, an alpha channel, etc. Virtually all information in a PNG file is stored in one or more chunks; the typical PNG file will be composed of several chunks. Each chunk has a four-character name consisting of upper- and lower-case ASCII letters. Each chunk ends with a "cyclic redundancy check" (CRC) calculated on the preceding bytes in the chunk; the CRC verifies the integrity of the data in the chunk.

The purpose of the chunk name is to identify the chunk and provide information about the chunk to the decoder. Judging strictly by the name of the chunk (assuming that it follows the PNG chunk-naming convention, discussed *very* briefly below), a decoder can tell if it can decode the PNG file or not. (An example of a case in which a decoder might not be able to decode the file would be if the file is of a format developed *after* the decoder was written, and contains data that cannot be interpreted by the older decoder. A current parallel is the new P-JPEG (progressive JPEG) format: older decoders that do not recognize this format would know that they cannot successfully decode the file.)

Since I will be referring to some common chunk types in the course of this discussion, it is worth briefly mentioning the chunk-naming convention. Recall that chunk names are four characters; each character has a different meaning, and that meaning is further specified by the "case" (uppercase or lowercase) of the character:

   First character:
      uppercase := the chunk is critical to the display of the file's contents ("critical" chunk)
      lowercase := the chunk is not strictly necessary in order to meaningfully display the file's contents ("ancillary" chunk)

   Second character:
      uppercase := the chunk is part of the public specification ("public" chunk)
      lowercase := the chunk is not a part of the formal PNG spec ("proprietary" chunk)

   Third character:
      (Uppercase always.) Reserved for future use; currently, all chunk names must have uppercase third characters.

   Fourth character: (of interest to PNG editors, not necessarily PNG decoders)
      uppercase := this chunk has been denoted as "unsafe to copy" under certain circumstances ("unsafe-to-copy" chunk)
      lowercase := this chunk is safe to copy ("safe-to-copy" chunk)
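Since the case of an ASCII letter is just one bit, a decoder can test all of these properties with trivial checks. A minimal sketch in C (the function names are mine; only the case assignments come from the spec):

    #include <ctype.h>

    /* Each property is encoded in the case of one character of the
     * four-character chunk name (lowercase = property present). */
    int is_ancillary(const unsigned char name[4])    { return islower(name[0]); }
    int is_private(const unsigned char name[4])      { return islower(name[1]); }
    int is_safe_to_copy(const unsigned char name[4]) { return islower(name[3]); }

A decoder that encounters an unknown chunk can then do the right thing: skip it if it is ancillary, or give up (gracefully) if it is critical.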
Since a PNG file consists solely of a signature followed by a series of chunks, the image data, itself, is contained in one or more chunks -- these image data chunks use the chunk name "IDAT," and are thus referred to (in the spec, and surely in other PNG-related text as well) as "IDAT chunks."

The PNG spec supports future expansion through the creation and public registration of new "public" chunk types. This means that a type of data, character set, etc., that is not supported in the current revision of the spec can be incorporated into future revisions via the creation and registration of new chunks to define and represent those structures. And that's just the "public" route; if there isn't sufficient demand to justify public registration of a new chunk, PNG still allows for the creation of *private* chunks -- that is, chunks that are intended to be recognized only by custom decoders designed to support them.

1.2: Pixel Structure

1.2.1: Data Representation

All data words composed of more than a single byte will be in "network byte order"; the upshot of this is that Macs, Suns, VAXen, IBM mainframes, DECstations, PCs, etc., will be able to exchange PNG files directly, without reordering bytes (within words) for each platform's preferred byte orientation.

Three types of pixels are supported:

   1. Palette-mapped pixels are represented by a single value that is an index into a supplied palette. The bit depth determines the maximum number of palette entries, not the color precision within the palette.

   2. Greyscale pixels are represented by a single value that is a greyscale level, where zero is black and the largest value for the bit depth is white.

   3. True-color pixels are represented by three-value sequences: red (zero = black, max = red) appears first, then green (zero = black, max = green), then blue (zero = black, max = blue). The bit depth specifies the size of each value, not the total pixel size.

1.2.2: Pixel Dimensions

Non-square pixels can be represented, but the PNG spec does *not* require viewers to account for them -- i.e., a viewer may present any PNG image as though its pixels are square. However, the spec *does* strongly encourage viewers running on display hardware with non-square pixels to rescale images for proper display.

1.3: Alpha Channels

Optionally, greyscale and true-color pixels can also include an alpha value. An alpha value of zero represents a fully-transparent pixel; a max_bitdepth value represents a fully-opaque pixel. Intermediate values represent partially-transparent pixels. In toto, the alpha "channel" (i.e., the grouping of all the alpha values) can yield an image whose pixels range from fully opaque to fully transparent, which can then be overlaid on a background image to form a "composite" image.

The actual color or greyscale pixel value, as stored in the PNG file, is not affected by the alpha-channel values. The values of the displayed pixels, including "transparency" effects, are calculated by the decoder. (This is referred to as "unassociated" or "non-premultiplied" alpha.) This requires that the file contain the full alpha-channel information; the alternative is to premultiply the pixels by the alpha fraction, but this results in an irretrievable loss of information (and a reduction in the level of creative control wielded by the artist). The selection of non-premultiplied alpha is, in this author's opinion, well worth the extra storage (and resultant bandwidth).

Transparency control is possible without the storage cost of a full alpha channel, although the level of control is substantially reduced: in a palette image, an alpha value may be defined for each palette entry; in greyscale and true-color images, a single pixel value may be identified as being "transparent." Information for these transparency techniques is stored in a separate chunk.

1.3.1: Alpha-Channel Caveats

1.3.1.1. Alpha channels may not be included with images that have fewer than 8 bits per sample.

1.3.1.2. Decoders are not required to support transparency control! This means that your cleverly-composited image (on, say, one of your Web pages) does not have to be displayed by the decoder as a composite image -- in this case, the decoder will most likely display the second ("topmost") image directly over the first ("bottom") in a fully-opaque fashion. I would expect that the lion's share of even marginally-decent decoders will support full transparency control, but this is only an opinion.

1.3.1.3. PNG requires the image to be rectangular in area. Non-rectangular images are filled out (rectangularized?) by adding "fully transparent" pixels. To enhance compression, the spec suggests that encoders use the same color value for all fully-transparent pixels in the filled-out area. From the paragraph above, however, recall that decoders are not required to support transparency control -- thus, the PNG specification recommends to encoders that "the colors assigned to transparent pixels should be reasonable background colors whenever feasible." If the image already has a known background color, the PNG encoder can write this information into the bKGD chunk; thus, even decoders that ignore transparency can fill the unused screen area with the bKGD color.
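Returning to the full alpha channel for a moment: the compositing calculation the decoder performs reduces to a weighted average of foreground and background, computed per sample. A sketch in C, assuming 8-bit samples (so max_bitdepth = 255); this illustrates the general technique, and is not code from the spec:

    /* Composite a foreground sample over a background sample, using a
     * non-premultiplied alpha value (0 = fully transparent, 255 = opaque):
     *     out = (alpha * fg + (255 - alpha) * bg) / 255
     * The +127 term rounds to the nearest integer. */
    unsigned char composite(unsigned char fg, unsigned char bg,
                            unsigned char alpha)
    {
        return (unsigned char)((alpha * fg + (255 - alpha) * bg + 127) / 255);
    }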
1.4: Gamma Correction

Gamma is a way of defining the brightness reproduction curve of a camera or display device. Failure to correct for image gamma can lead to a too-dark or too-light display; thus, gamma correction is very desirable whenever possible.

If you are a math-o-phobe, you should skip to the next section now; I've tried to keep the gamma discussion straightforward, but it might get just a little hairy for some of you (sorry). If you're still with us, but only tentatively, then you might benefit from noting that the "^" character means "raised to the [power of]..."

When brightness levels are expressed as fractions in the range 0 to 1, a device produces an output brightness level "obright" from an input brightness level "ibright" according to the equation:

   obright = ibright ^ gamma

An overall gamma of 1.0 gives correct tone reproduction; thus, a decoder should use the gamma values of both the display device and the image file (when they are available) to accurately reproduce the image on the target device. Reversing the above equation, we can determine the input brightness level from the output brightness level and the gamma value:

   ibright = obright ^ (1.0 / gamma)

Thus, if the gamma values for the image file (call it "file_gamma") and the display device (call it "display_gamma") are known, we can calculate the gamma-corrected value of a "normalized" pixel (call it "norm_pix") as:

   corr_gamma = norm_pix ^ (1.0 / (file_gamma * display_gamma))

The "norm_pix" value must be pre-normalized (as a floating-point value in the range 0.0 to 1.0) from the pixel value (call it "pixval") in the image file by the following equation:

   norm_pix = pixval / max_bitdepth

Defining a new value (constant for each PNG file), "corr_factor," as

   corr_factor = 1.0 / (file_gamma * display_gamma)

yields the following calculation for each gamma-corrected pixel:

   corr_gamma = (pixval / max_bitdepth) ^ corr_factor

(There is one more step required to calculate the final pixel value for the display unit's frame buffer, but that equation -- though quite simple -- is not necessary to complete this discussion.)

Since the type of calculation required to compute corr_gamma for each pixel can require a great deal of computing power (and, thus, computing *time*), the decoder can simply calculate the corrected pixel values *once* to form a lookup table, and then use the "pixval" value as an index into the table. This process can be optimized still further to yield substantial reductions in processing (albeit at the cost of a minor amount of "scratch pad" storage to create the table).
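For 8-bit samples, such a table has only 256 entries. A minimal sketch in C, using the variable names defined above (the function name is mine):

    #include <math.h>

    /* Build a 256-entry gamma-correction lookup table for 8-bit samples;
     * corr_factor = 1.0 / (file_gamma * display_gamma), per the equations
     * above. The decoder then indexes the table with each pixval. */
    void build_gamma_table(unsigned char table[256],
                           double file_gamma, double display_gamma)
    {
        double corr_factor = 1.0 / (file_gamma * display_gamma);
        int i;
        for (i = 0; i < 256; i++) {
            double norm_pix = i / 255.0;      /* pixval / max_bitdepth */
            table[i] = (unsigned char)(255.0 * pow(norm_pix, corr_factor) + 0.5);
        }
    }

Once the table is built, correcting a pixel is a single array lookup instead of a call to pow() -- that is the substantial reduction in processing mentioned above.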
PNG expects encoders to record the gamma (if known), and it expects decoders to correct the image gamma if necessary for proper display on their display hardware. If no information on the display device is available, an assumed value of 2.2 is recommended by the PNG spec. This represents a contrast boost, since real CRTs are generally around 2.8; apparently, this is a common practice in T.V. and film. To preserve this contrast boost, the PNG spec suggests that the display gamma be divided by 1.25, though this is not a requirement for PNG compliance.

Gamma correction is not applied to the alpha channel, if any. Alpha values always represent a linear fraction of full opacity. This is a good method, since pre-computing the gamma correction for the alpha channel would result in an irretrievable loss of information -- information that could not be recovered at the decoder.

Finally, the PNG spec specifically states that encoders should *not* pre-correct gamma in the image. As in the alpha-channel case, gamma correction is an inherently lossy operation; thus, pre-correcting image gamma would not be a good idea. Also, it is this author's opinion that the computational price paid for gamma correction in the decoder is small enough to easily justify non-pre-corrected gamma.

Section 1.5: Text Strings

PNG supports the capability of storing full text strings along with the image; the text strings are stored in text "chunks" (see Section 1.1). The character set selected for (and, thus, supported by) PNG is ISO 8859-1 (Latin-1), which represents a "widely useful and reasonably portable character set," according to the PNG spec. Since PNG *does* provide for the creation and public registration of new ancillary chunks, it is both permissible and quite probable that other character sets will be supported in the future.

Note that this provides a very flexible utility for storing alphanumeric information along with the image. For example, the medical community can store patient information with PNG-compressed X-rays; similarly, key radar parameters and collection data can be stored with their associated SAR imagery. It is not necessary to create new chunk types for specific types of text data; instead, the data can be stored in the already-defined tEXt chunk, along with reasonably self-explanatory keywords. (To this end, the PNG spec recommends using keywords that are fully spelled out, instead of abbreviations.)
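For example, the body of a tEXt chunk is simply a keyword, a zero-byte separator, and the text itself. A sketch in C of assembling such a payload (the helper function is mine; only the layout comes from the spec):

    #include <string.h>

    /* Assemble the data portion of a tEXt chunk: keyword, a zero byte
     * as separator, then the text (which is not null-terminated).
     * Returns the total payload length. The caller supplies 'out' with
     * room for strlen(keyword) + 1 + strlen(text) bytes. */
    size_t build_text_payload(unsigned char *out,
                              const char *keyword, const char *text)
    {
        size_t klen = strlen(keyword);    /* keywords are 1-79 characters */
        size_t tlen = strlen(text);
        memcpy(out, keyword, klen);
        out[klen] = 0;                    /* null separator */
        memcpy(out + klen + 1, text, tlen);
        return klen + 1 + tlen;
    }

A call such as build_text_payload(buf, "Title", "Chest X-ray, patient 42") would produce the body of a perfectly reasonable tEXt chunk.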
Section 1.6: Image Processing

Although image processing ("signal" processing, really) lies at the heart of the PNG spec (or any compression specification), I will not delve too deeply into the topic area because of its mathematical nature. Plus, many of the topics to be discussed below were already covered in the compression primer distributed earlier (and also available in this directory). If you do not have the compression primer, it can be found here:

   via FTP:
   via HTTP:

I will discuss data interlacing first, and then go on to discuss PNG filtering and compression. This is somewhat out of order (from an intuitive perspective), but the interlacing scheme factors into the filtering and compression sections, so it needs to be discussed first.

1.6.1: Interlacing

"Interlacing" is a means of storing (and re-generating/displaying) an image such that, when decoded, it is initially displayed as a low-quality (spatially downsampled) version of the original, and finer details are "filled in" in progressive steps until the entire, fully-detailed image is displayed. Images of this nature are also referred to as "progressive" images, since the image seems to progress from a low-quality to a high-quality image as the load completes. ("Interlacing" actually refers to the format in which the image data is stored in the file.)

There is a subtle -- but important -- distinction that needs to be made where interlaced/progressive images are concerned: Specifically, image loads (for the *full* image) are *not* faster in the "interlaced" case than they are in the "straight-laced" :-) case. In fact, the (very slight) overhead associated with loading and displaying an interlaced image will, if anything, result in a slightly *longer* load time than for a similar, non-interlaced image.

The advantage of interlacing lies in the time required to get an initial, low-resolution image onto the screen; while non-interlaced images are drawn in a linear, top-to-bottom fashion (and can thus require a substantial amount of time to draw), interlaced images present a downsampled version of the entire image rather quickly -- so the human operator can begin to view the image almost immediately -- and then fill in the image in a progressive fashion as the rest of the data loads. Thus, there is no savings -- either in time or bandwidth -- from the start of the image load to the completion of the image load in interlaced vs. non-interlaced imagery (in fact, there is a slight penalty in the progressive-display case). Interlacing is performed entirely and exclusively for the benefit of the human operator.

Typically, images are interlaced in one dimension (e.g., GIF); i.e., whole scanlines are reordered, yielding a "venetian blind" effect when the image is displayed. In the PNG case, however, the interlacing is performed in *both* dimensions, creating a sense of the image "fading in" (rather than "wiping down"). To some degree, the distinction is purely aesthetic; however, from the perspective of the purpose of interlacing -- providing a meaningful display for the user as quickly as possible -- the goal is achieved somewhat faster. Specifically, the "excess" bandwidth required to transmit a full-band scanline (per the GIF approach) is used more efficiently by PNG to transmit *several* partial scanlines. Depending on your age, the effect is either more pleasing to the eye, or just a whole lot cooler.

In the PNG approach, the entire image is displayed in seven distinct steps, or "passes." (The format is referred to as "Adam7.") Note that while PNG decoders are required to be able to read interlaced images, they are not required to actually perform progressive display. Also, it is worth mentioning that interlacing slightly expands the file size (on the average).
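For the curious, the seven passes visit the pixels of each 8x8 block of the image in a fixed pattern. A sketch in C of the pass geometry (the table layout and function are mine; the offsets and increments are those of Adam7):

    /* Starting offsets and increments for each of the seven Adam7
     * passes, measured within each 8x8 block of the image. */
    static const int x_start[7] = {0, 4, 0, 2, 0, 1, 0};
    static const int y_start[7] = {0, 0, 4, 0, 2, 0, 1};
    static const int x_step[7]  = {8, 8, 4, 4, 2, 2, 1};
    static const int y_step[7]  = {8, 8, 8, 4, 4, 2, 2};

    /* Visit every pixel belonging to interlace pass 'pass' (0 through 6). */
    void walk_pass(int pass, int width, int height,
                   void (*visit)(int x, int y))
    {
        int x, y;
        for (y = y_start[pass]; y < height; y += y_step[pass])
            for (x = x_start[pass]; x < width; x += x_step[pass])
                visit(x, y);
    }

Note that the first pass touches only one pixel in every 64, which is why a recognizable (if blocky) rendition of the image appears after only about 1/64 of the data has arrived.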
1.6.2: Filtering

The image is filtered before it is compressed. (Recall from the compression primer that certain spatial filters permit improved compression performance.) Note that the image is effectively interlaced before it is filtered; thus, we have the following flow for these steps:

   (image) --> interlace --> filter --> compress

This permits the most efficient data flow (for these three steps) through the decoder (i.e., the decoder does not have to wait for the full image to load before it can begin decoding interlaced images).

There are currently four filter options -- plus "None," which means that the scanline in question is not filtered -- although there is the capability for up to 255 individual filters, plus "None," within each set of filters. In addition to that, the presence of the overall filter-type byte (similar to the compression-type byte) means that there can be up to 256 *sets* of 256 filters. The following filters are currently supported:

   Code  Name
    0    None
    1    Sub
    2    Up
    3    Average
    4    Paeth

Brief descriptions of these filters are as follows:

   0 (None):    Self explanatory.

   1 (Sub):     Compute the difference between the current byte and the value of the corresponding byte of the preceding pixel.

   2 (Up):      Compute the difference between the current byte and the value of the corresponding byte of the pixel directly above the current pixel.

   3 (Average): Compute the difference between the current byte and the average of the corresponding bytes in the pixels directly above and directly to the left of the current pixel.

   4 (Paeth):   Compute the difference between the current byte and a predictor computed from the values of the bytes directly above, directly to the left, and in the above-left corner of the current pixel.

For palette-based color and for images with bitdepths less than 8, filter type 0 ("None") is recommended in the spec.

An important point where PNG is concerned is that the filters are (optionally) applied on a scanline-by-scanline basis; thus, the encoder can select the most appropriate filter for each scanline, depending on the data present. One effective (but inefficient) way to do this is to run *each* filter on *each* scanline, and select -- for each scanline -- the filter that performs the best on that line. Unfortunately, this is a very computationally-intensive approach, especially for large images, so more efficient techniques will most likely be employed in the filter-selection process.

Finally, note that filter optimization is optional; if an encoder so desires, it can default to any one of the filters for every scanline, or employ no filtering at all. I think it's unlikely that even the laziest programmers would employ a "no filters at all" option, since implementation of any one of the filters shown above is pretty simple (even Paeth), and will most likely yield better performance (than no filter at all) in the majority of cases.
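For the curious, the Paeth predictor really is only a few lines. A sketch in C, following the algorithm as the spec defines it:

    #include <stdlib.h>

    /* Paeth predictor: a = byte to the left, b = byte above, c = byte
     * above-left. Start from the initial estimate p = a + b - c, then
     * return whichever of a, b, c is closest to p (ties favor a, then b).
     * The filter encodes (current byte - predictor). */
    int paeth_predictor(int a, int b, int c)
    {
        int p  = a + b - c;
        int pa = abs(p - a);
        int pb = abs(p - b);
        int pc = abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc) return b;
        return c;
    }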
1.6.3: Compression

PNG provides the capability to support multiple compression formats, although "deflate" compression (referenced in the PNG spec as "PNG compression type 0") is the only method currently defined for PNG. The upshot of the provision to support multiple formats is that better compression engines -- e.g., wavelet-based methods (albeit lossy), once they make their way from the blackboards to the commercial applications -- can be incorporated with relative ease. Although it requires updating the encoders and decoders (a straightforward process -- we're all used to DLing the latest upgrades of various apps and plug-ins), it provides the capability to continuously update the specification as better algorithms become available. In short, PNG is a spec that is designed for long-range supportability.

The compressed data in the PNG format are referred to as a "zlib" datastream; a single zlib datastream comprises one or more IDAT chunks, or a single zTXt chunk (i.e., user-readable text can also be compressed). Zlib datastreams in IDAT chunks decompress to filtered data as described above; datastreams in zTXt chunks decompress to readable text (assuming, of course, that they *contain* readable text).

Since PNG supports "deflate" compression, I'll give a brief overview of the method. Deflate compression is an LZ77 derivative used in zip, gzip, pkzip, etc. (It is referred to as "LZ77" since it was first published by Ziv and Lempel in 1977 ("IEEE Transactions on Information Theory," vol. 23, no. 3).) According to the spec, "Extensive research has been done supporting its patent-free status. Portable C implementations are freely available." This is good news for us, but it's *really* good news for companies like Adobe, which certainly would prefer to avoid the types of hassles created by the GIF fiasco.

"Deflate" uses a combination of LZ77 and Huffman encoding. Since very few (most likely none) of you are likely to be interested in the guts of either LZ77 or Huffman coding (or Huffman trees, which are also employed in the algorithm), I will not discuss them here; however, the deflate specification is currently at draft 1.3, and is available from:

UPDATE 24 Jul 99: The deflate document is no longer available, though the /doc directory contains a wealth of information on the Zip format. Note that there is now a zlib home page, from which you can find the most current version of the specification:

The deflate algorithm has the following properties:

   * It is independent of CPU type, operating system, file system, and character set.

   * It compresses data with an efficiency comparable to the best currently-available general-purpose compression methods.

   * It can be implemented readily in a manner not covered by patents, and hence can be practiced freely.

The deflate algorithm does *not* attempt to compress "specialized" data, such as raster graphics, as well as the best currently-available algorithms that have been optimized for those tasks. However, it is this author's interpretation of the spec that a custom-designed encoder/decoder pair can implement an optimized algorithm if so desired; naturally, if the compression engine is not incorporated into the public-domain PNG spec, few to none (most likely none) of the standard PNG decoders will be able to decode the data.

Recall from the compression primer that there are cases in which a lossless compression algorithm can yield a "compressed" file that is *larger* than the original; it is easy to show that no lossless compression algorithm can compress every data set (although it's relatively easy to compare the compressed data-set size with the input data-set size, and forgo emitting a compressed version if it is not smaller than the input data set by some margin, where the margin is selected to justify the time required to decompress the data). For the deflate algorithm, the worst-case size increase -- for large data sets (which represent the only meaningful case) -- is about 0.015%. (This is a very reasonable figure.) English text usually compresses by a factor of 2.5 to 3, executable files typically compress somewhat less, and graphical data (e.g., raster images or CT data) may compress much more. Again, as demonstrated in the compression primer, the compression performance of any given algorithm depends heavily on the input data set.
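Before leaving compression: the best-known of those freely-available portable C implementations is the zlib library, and deflate-compressing a buffer through its one-call convenience wrapper looks like the sketch below. (The wrapper function is mine; compress() is zlib's own API.)

    #include <zlib.h>

    /* Deflate-compress src (src_len bytes) into dst (capacity dst_cap).
     * Returns the compressed length, or 0 on failure (e.g., dst too small). */
    unsigned long deflate_buffer(unsigned char *dst, unsigned long dst_cap,
                                 const unsigned char *src,
                                 unsigned long src_len)
    {
        uLongf out_len = dst_cap;
        if (compress(dst, &out_len, src, src_len) != Z_OK)
            return 0;
        return out_len;
    }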
SECTION II: Comparison with Other Specs
----------------------------------------------------

2.1: PNG vs. GIF

The leading lossless compression format currently in use on the 'Net is GIF. It is this author's opinion that (1) PNG will probably soon replace GIF's position as the leading lossless format, and (2) "soon" is probably not soon enough. :-)

Compared to GIF, PNG wins in virtually every category conceivable. Sure, "deflate" is rumored to provide 10-30% better compression than GIF, but that's only part of the story. Here's a quick summary for you:

   * PNG handles bitdepths greater than 8 bits -- including truecolor images up to 48 bits per pixel and greyscale images up to 16 bits per pixel.

   * PNG sports a very nice 2-D interlace format, as compared to GIF's 1-D format.

   * PNG provides for incorporation of textual data. (Well, this one's really more of a tie, since GIF purportedly sports the capability to incorporate text.)

   * PNG provides a full alpha channel.

   * PNG provides the capability to correct/account for display gamma.

   * PNG incorporates a pre-compression filtering stage with the capability to add new filters in the future.

   * The PNG compression algorithm ("deflate" compression) is firmly established in the public domain.

   * Finally, as mentioned already, PNG purportedly provides (ah, alliteration) 10-30% better compression than GIF -- plus the capability to incorporate improved compression engines down the road.

In short, GIF has been outclassed. End of comparison. Well, almost the end -- it is worth mentioning that apps supporting the full PNG spec are still hard to find, especially for the Mac; there are, however, several apps providing PNG capability for Windows and other platforms. For a more-or-less complete list, check out:

Also, it is worth noting that GIF can do multi-image sequences, which is explicitly disallowed in PNG. There has been sporadic discussion of a multi-image (or multimedia) PNG variant, possibly to be named MPNG or MNG.

2.2: PNG vs. JPEG

Comparing the PNG spec to the JPEG spec is an apples-and-oranges comparison; PNG is lossless, while JPEG is lossy. However, it's worth mentioning that JPEG does not provide support for an alpha channel, gamma correction, or the modularity and long-term supportability of PNG. Moreover, JPEG does not perform well when decomposing high-bandwidth images (see the previous compression primer for a discussion of JPEG and its shortcomings in this area); a high-bandwidth image is one that has "sharp" transitions in it, such as line art. (Most "sharp edges" are high-bandwidth transitions; thus, a sharp photograph probably has a high bandwidth.) In these cases, JPEG decompositions will result in Gibbs-phenomenon effects, which many graphic artists mistake for moires. (This mistake sends many graphic artists back to their scanners to re-scan at a higher resolution, which only compounds the problem.) A simple answer to this problem is to compress the image via a lossless format, such as PNG. Thus, for high-bandwidth images ("sharp images," for all intents and purposes), PNG compression will provide a much more pleasing result than JPEG.

SECTION III: Applications of PNG
-------------------------------------------

As a lossless compression format, PNG is best suited to high-bandwidth images. (For the signal-processing layman, "high-bandwidth" is typically synonymous with high-resolution images that contain sharp transitions -- e.g., line art, rasterized text, etc.) For high-bandwidth images, JPEG compression -- based on an inherently narrowband transform -- aliases the image areas that contain sharp transitions (a sort of 2-D Gibbs phenomenon). For low-bandwidth images, JPEG still provides a very high compression rate (at a cost), and is probably better suited to applications where alpha channels and gamma correction aren't useful or required. (Medical X-rays, for example -- although the medical community is currently pressing pretty hard for implementation of wavelet-based techniques, which are very likely to make the current JPEG formats obsolete.)

SECTION IV: Review
---------------------------

I'm a signal processor, so let me get my own narrow-minded signal-processing issues out of the way first: I think that, in the mid term, wavelet-based techniques will supplant the current JPEG and, most likely (at least where imagery is concerned), the current PNG definition.
The advantage of PNG is that it is well designed to be able to *incorporate* a wavelet-based compression engine once it comes along -- moreover, most of the wavelet bases currently in wide use are well established in the public domain. Although wavelets mean compression loss, I think that the capability of PNG to provide an *option*, at compression time, of lossless vs. lossy formats will be key to its survival in the long run. Wavelet compression also provides a 2-D progressive-display capability that is much more "natural" than the Adam7 format, since a wavelet-based progressive display presents a true low-bandwidth representation of the image initially, and progressively backfills the image with higher-band information. (In a sense, the image goes from blurry to sharp instead of from "pixellated" to sharp.) Moreover, wavelet-based progression doesn't require interlacing -- so there is *no* penalty in terms of storage, transfer time, or encode/decode processing time.

Okay, now that I've gotten that out of the way ...

PNG is the first compression spec -- that I know of -- to support gamma correction. In fact, before I started reviewing the PNG spec, I didn't even know what gamma *was*. (Now I are an expert. :-) The full alpha channel is a very nice touch, and the progressive-display/interlacing scheme is also quite nice. But you know that by now. Point is, PNG incorporates some very nice features that -- to my knowledge -- are not to be found elsewhere; in particular, they do not exist in either of the current standards -- JPEG and GIF.

Another very attractive feature is PNG's adaptive scanline filtering, which allows the filtering scheme to be optimized for each scanline, if the encoder chooses to do so. This can potentially yield much greater compression than a single-filter approach -- and the cost of a filter byte at the start of each scanline is minimal. As stated in the spec, "The potential benefits of adaptive filtering are too great to ignore" -- and they are.

In summary, PNG incorporates features that range from "Gee, that's nice" to "Awesome!" (Which features fall into which categories depends on your particular application.) PNG is a long-range supportable format, and I think that's critical for two reasons: (1) it provides the capability to adapt the specification to incorporate new and useful techniques down the road, and (2) I think it's going to be around for a long time.

If you have comments on this review or information to add, please feel free to contact me.

- Vince Sabio
  vince-png@vjs.org