A Basic Introduction to PNG Features

This page explains some of the features of the PNG format for non-technical users. As such, it doesn't emphasize PNG features like freedom from patents; those are more of a concern to developers. Where programmer information is given, it is principally to explain to the user why various applications may not perform as well as expected. Where performance claims are made--especially compression comparisons with other image formats--we assume that the PNG implementation is at least as good as the best freeware encoders. Note that this is currently not necessarily a valid assumption in the case of a number of popular (and expensive) image editors, but it's not always clear where the problem lies.

Please let Greg know if parts of this page still don't make sense or if there are other PNG features and/or foibles that aren't covered here. Greg would like this to be a friendly and usable resource for non-experts.

A Russian translation of this page (with some additional information) is also available. (Thanks to Ivan Zenkov for the translation!)


Typical Usage

The Portable Network Graphics (PNG) format was designed to replace the older and simpler GIF format and, to some extent, the much more complex TIFF format. (See the main page or the history page for background information.) Here we'll concentrate on two major uses: the World Wide Web (WWW) and image-editing.

For the Web, PNG really has three main advantages over GIF: alpha channels (variable transparency), gamma correction (cross-platform control of image brightness), and two-dimensional interlacing (a method of progressive display). PNG also compresses better than GIF in almost every case, but the difference is generally only around 5% to 25%, not a large enough factor to encourage folks to switch on that basis alone. One GIF feature that PNG does not try to reproduce is multiple-image support, especially animations; PNG was and is intended to be a single-image format only. (A very PNG-like extension format called MNG was finalized in mid-1999 and is beginning to be supported by various applications, but MNGs and PNGs will have different file extensions and different purposes.)

For image editing, either professional or otherwise, PNG provides a useful format for the storage of intermediate stages of editing. Since PNG's compression is fully lossless--and since it supports up to 48-bit truecolor or 16-bit grayscale--saving, restoring and re-saving an image will not degrade its quality, unlike standard JPEG (even at its highest quality settings). And unlike TIFF, the PNG specification leaves no room for implementors to pick and choose what features they'll support; the result is that a PNG image saved in one app is readable in any other PNG-supporting application. (Note that for transmission of finished truecolor images--especially photographic ones--JPEG is almost always a better choice. Although JPEG's lossy compression can introduce visible artifacts, these can be minimized, and the savings in file size even at high quality levels are much greater than is generally possible with a lossless format like PNG. And for black-and-white images, particularly of text or drawings, TIFF's Group 4 fax compression or the JBIG format is often far better than 1-bit grayscale PNG.)

Like GIF and TIFF, PNG is a raster format, which is to say, it represents an image as a two-dimensional array of colored dots (pixels). PNG is explicitly not a vector format, i.e., one that can store shapes (lines, boxes, ellipses, etc.) and be scaled arbitrarily without any loss of quality (generally speaking). For that you probably want SVG or PostScript. (There are some private extensions to PNG that add vector information in addition to PNG's regular pixels--Macromedia's Fireworks does something along those lines--but no valid PNG may omit the pixel data.)


Compression

PNG's compression is among the best that can be had without losing image information and without paying patent fees, but not all implementations take full advantage of the available power. Even those that do can be thwarted by unwise choices on the part of the user.

PNG supports three main image types: truecolor, grayscale and palette-based ("8-bit"). JPEG only supports the first two; GIF only the third (although it can fake grayscale by using a gray palette). The choice of type affects compression because PNG allows the same image to be saved as more than one type. Specifically, forcing an application to save an 8-bit palette image as a 24-bit truecolor (or "RGB") image is not going to result in a small file. This may be unavoidable if the original has been modified to include more than 256 colors (for example, if a continuous gradient background has been added), but many images intended for the Web have 256 or fewer colors.

On the programmer's side, one common mistake is to include too many palette entries in a PNG image. This error is most noticeable when converting tiny GIF images (bullets, buttons, etc.) to PNG format; these images are typically only 1000 bytes or so in size, and storing 256 three-byte palette entries where only 50 are needed would result in over 600 bytes of wasted space.
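The arithmetic behind that figure can be checked with a short Python sketch. The function name is made up for illustration; the only facts taken from the text are the three bytes per palette entry and the 256-versus-50 example.

```python
def wasted_palette_bytes(entries_stored, entries_needed, bytes_per_entry=3):
    """Bytes spent on palette entries that no pixel in the image ever uses."""
    return (entries_stored - entries_needed) * bytes_per_entry

# 256 entries written where only 50 are needed, 3 bytes (R, G, B) each
print(wasted_palette_bytes(256, 50))  # prints 618
```

For an image of about 1000 bytes, those 618 bytes are well over half the file.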

Another common programmer mistake is to use only one type of compression filter, or to vary them incorrectly. Compression filters are described below and can make a dramatic difference in the compressibility of the image. In general this is not a feature that users should be forced to experiment with.

Finally, the low-level compression engine itself can be tweaked to compress either better or faster. Often "best compression" is the preferred setting, but an implementor may choose to use an intermediate level of compression in order to boost the interactive performance for the user. Usually the difference in file size is small, but there are cases where such a choice can make a big difference.
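As a rough illustration of that trade-off, the sketch below uses Python's standard zlib module (a binding to the zlib library) to compress one buffer at several levels. The sample data and the helper name are invented for this example; real image data will behave differently, and the size gap between levels can be anywhere from negligible to large.

```python
import zlib

def compressed_sizes(data, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} for each requested zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}

# Made-up, highly repetitive sample data standing in for image bytes
sample = bytes(range(256)) * 64 + b"PNG" * 1000

sizes = compressed_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (from {len(sample)})")
```

Level 1 favors speed and level 9 favors size; whatever the level, decompression returns exactly the original bytes.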

See the zlib home page for further details on PNG's compression engine and the CRC-32 algorithm, the 7-Zip home page for an alternative implementation of the deflate algorithm, and Vince Sabio's Compression Primer for an overview of compression in general. For tools to optimize the compression of PNG images, see the converters page (especially Glenn Randers-Pehrson's pngcrush and Ulead's SmartSaver).


Compression Filters

Compression filters are a way of transforming the image data (losslessly) so that it will compress better. Each horizontal line in the image can have one of five filter types associated with it; choosing which of the five to use for each line is almost more of a black art than a science. Nevertheless, at least one reasonably good algorithm is not only known but also described in the PNG specification and implemented in freely available software. Other algorithms are likely to perform even better, but so far this has not been an active area of research.
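To make the idea concrete, here is a minimal Python sketch of one of the five filter types, the one the PNG specification calls Sub, which stores each byte as its difference from the byte to its left. The test image is synthetic and chosen to flatter the filter, the helper names are invented, and this is not how any particular encoder picks filters; it only shows that a lossless transformation can change how well the same data compresses.

```python
import zlib

WIDTH, HEIGHT = 256, 64  # hypothetical 8-bit grayscale test image

def make_rows():
    """Synthetic rows: the same curve on every row, shifted one gray level per row."""
    return [bytes((x * x // 16 + y) % 256 for x in range(WIDTH))
            for y in range(HEIGHT)]

def filter_none(row):
    """Filter type 0: store the row unchanged after the filter-type byte."""
    return b"\x00" + row

def filter_sub(row):
    """Filter type 1: store each byte minus its left neighbor, modulo 256."""
    out = bytearray([1])
    left = 0  # the position before the start of the row counts as 0
    for value in row:
        out.append((value - left) % 256)
        left = value
    return bytes(out)

def unfilter_sub(filtered):
    """Undo filter_sub, recovering the original row exactly."""
    row = bytearray()
    left = 0
    for delta in filtered[1:]:  # skip the filter-type byte
        left = (left + delta) % 256
        row.append(left)
    return bytes(row)

rows = make_rows()
unfiltered_size = len(zlib.compress(b"".join(filter_none(r) for r in rows), 9))
filtered_size = len(zlib.compress(b"".join(filter_sub(r) for r in rows), 9))
print("unfiltered:", unfiltered_size, "bytes; Sub-filtered:", filtered_size, "bytes")
```

On this pattern the filtered rows are nearly identical to one another, so the compressor finds long repeats that are invisible in the raw pixel values.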

By way of example--admittedly an extreme and unrealistic case*--a 512 x 32,768 image containing all 16,777,216 possible 24-bit colors compressed over 300 times better with filtering than without. The uncompressed image was 48 MB in size; the compressed-but-unfiltered version was around 36 MB; but the filtered version is only 115,989 bytes (0.1 MB). Yow. (A 4096 x 4096 version, created by Paul Schmidt, is a mere 59,852 bytes--more than 600 times better than the unfiltered version, at an overall compression ratio of 841:1. Ted Samuels ran it through Ken Silverman's PNGOUT utility--see the converters page for links to it and other optimizers--and trimmed it to 57,549 bytes, for an overall 875:1 ratio. See this page for a downloadable version and further info.)
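The ratios quoted above follow directly from the file sizes. A short Python check, with an invented helper name and only the numbers given in the text:

```python
# One 3-byte RGB pixel for each of the 16,777,216 possible 24-bit colors
RAW_BYTES = 512 * 32768 * 3

def ratio(raw_bytes, compressed_bytes):
    """Overall compression ratio, uncompressed size over compressed size."""
    return raw_bytes / compressed_bytes

print(RAW_BYTES // 2**20)              # prints 48 (MB uncompressed)
print(round(ratio(RAW_BYTES, 59852)))  # prints 841 (4096 x 4096 version)
print(round(ratio(RAW_BYTES, 57549)))  # prints 875 (after PNGOUT)
```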

A more realistic example is the oceanography data at NASA's Ocean ESIP site. Digital maps displaying various physical measurements can be generated dynamically in either GIF or PNG format; the PNG versions are invariably one-fifth the size of the GIFs, thanks to PNG's compression filters. For example, a map showing the surface height of the northeastern Pacific Ocean on 1 August 1997 (during a major El Niño) is 70,090 bytes in GIF format but only 13,880 bytes in PNG format.

See the Filter Algorithms chapter of the PNG specification for details.

* As a measure of just how unrealistic, note that these seemingly hyper-compressed PNG images can themselves be compressed by an additional factor of anywhere from 21 to 97 or so (depending on which image) simply by applying gzip to them. Of course, a gzip'd PNG is not terribly useful in most contexts, and MNG is the best of all--it drops the size to 456 bytes.


Alpha Channels

Also known as a mask channel, an alpha channel is simply a way to associate variable transparency with an image. Whereas GIF supports simple binary transparency--any given pixel can be either fully transparent or fully opaque--PNG allows up to 254 levels of partial transparency in between for "normal" images (or 65,534 levels for the special "deeply insane" formats, but here we're concentrating on image depths that are useful on the Web).

All three PNG image types--truecolor, grayscale and palette--can have alpha information, but it's most commonly used with truecolor images. Instead of storing three bytes for every pixel (red, green and blue), now four are stored: red, green, blue and alpha, or RGBA. The variable transparency allows you to create "special effects" that will look good on any background, whether light, dark or patterned. For example, a photo-vignette effect can be created for a portrait by making a central oval region fully opaque (i.e., for the face and shoulders), the outer regions fully transparent, and a transition region that varies smoothly between the two extremes. When displayed in a Web browser such as Arena, the portrait would fade smoothly to white against a white background, or smoothly to black against a black one. Drop-shadows are another ideal application for alpha transparency; in the images below, the same toucan image is displayed against a colorful background and against another copy of itself:

[RGBA toucan viewed with rpng2 -bgpat 14]   [RGBA toucan overlaid on itself against a pink background]
Stefan Schneider's shadow-casting toucan displayed against different backgrounds

This transparency feature is far more important for the small web graphics that are typically used on web pages, such as colored (circular) bullets and fancy text. Alpha blending allows one to use anti-aliasing--creating the illusion of smooth curves on a grid of rectangular pixels by smoothly varying the pixels' colors--to make rounded and curved images that look good against any background, not just against a white background (for example). Thus the same image can be reused in many places without the "ghosting" effect that occurs with GIFs.
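For the curious, the blending itself is a weighted average of the image pixel and whatever lies behind it. The Python sketch below is a simplified illustration with 8-bit samples: the function name is invented, and it deliberately ignores gamma, which a careful viewer would also take into account.

```python
def composite(foreground_rgba, background_rgb):
    """Blend one RGBA pixel over an opaque RGB background (8-bit samples)."""
    red, green, blue, alpha = foreground_rgba
    return tuple(
        # alpha = 255 keeps the foreground, alpha = 0 keeps the background
        (fg * alpha + bg * (255 - alpha) + 127) // 255
        for fg, bg in zip((red, green, blue), background_rgb)
    )

edge_pixel = (0, 0, 0, 128)  # a half-transparent black pixel on a curve's edge
print(composite(edge_pixel, (255, 255, 255)))  # prints (127, 127, 127)
print(composite(edge_pixel, (0, 0, 255)))      # prints (0, 0, 127)
```

The same edge pixel turns mid-gray on white and dark blue on blue, which is why one anti-aliased image can be reused on any background.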

Of course, effective replacements for GIF buttons and icons must be comparable in size as well, and that mostly rules out truecolor RGBA images. But PNG supports alpha information with palette images as well; it's just slightly harder to implement in a smart way. A PNG alpha-palette image is just that: an image whose palette also has alpha information associated with it, not a palette image with a full alpha mask. In other words, each pixel corresponds to an entry in the palette with red, green, blue and alpha components. So if you want to have bright red pixels with four different levels of transparency, you must use four separate palette entries to accommodate them. (All four entries will have identical RGB components, but the alpha values will differ.) If you want all of your colors to have four levels of transparency, you've effectively reduced your total number of available colors from 256 to 64. In general, though, only some of the colors need more than one level of transparency, and recognizing which ones is where things get tricky for the programmer. (If you don't want to trust your local programmer, have a look at pngquant, which converts 32-bit RGBA PNGs into 8-bit RGBA-palette images. If you are a programmer, also have a look at it; full source code is included.)
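The bookkeeping described above can be sketched in Python as follows. This is only an illustration of how palette entries get used up, with invented names; it performs no color reduction and is not how pngquant or any other tool works internally.

```python
def build_rgba_palette(pixels, max_entries=256):
    """Assign each distinct (R, G, B, A) combination its own palette entry."""
    palette = []      # list of (R, G, B, A) entries, in first-seen order
    index_of = {}     # maps an entry to its position in the palette
    indices = []      # one palette index per pixel
    for pixel in pixels:
        if pixel not in index_of:
            if len(palette) == max_entries:
                raise ValueError("more distinct RGBA values than palette slots")
            index_of[pixel] = len(palette)
            palette.append(pixel)
        indices.append(index_of[pixel])
    return palette, indices

# Bright red at four levels of transparency, then fully opaque red again
pixels = [(255, 0, 0, 255), (255, 0, 0, 170), (255, 0, 0, 85),
          (255, 0, 0, 0), (255, 0, 0, 255)]
palette, indices = build_rgba_palette(pixels)
print(len(palette), indices)  # prints 4 [0, 1, 2, 3, 0]
```

One color at four transparency levels costs four of the 256 slots, so giving every color four levels leaves room for only 256 / 4 = 64 distinct colors.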

For a better explanation with some nice sample images, see the Anti-aliasing and Transparency chapter of Chris Lilley's excellent WWW4 paper, Not Just Decoration: Quality Graphics for the Web.


[Gamma correction NOW!]

Gamma Correction

Gamma correction basically refers to the ability to correct for differences in how computers (and especially computer monitors) interpret color values. Web authors in particular are probably aware that Macintosh-generated images tend to look too dark on PCs, and PC-generated images tend to look too light on Macs. An image that looks good on an SGI workstation won't look right on either a Macintosh or a PC, and even a PC-created image won't look right on all PCs.

Gamma information is a partial solution. It's a means of associating a single number with a computer display system, in an attempt to characterize the tricky physics lurking within a graphics card's digital-to-analog converter (RAMDAC) and within a monitor's high-voltage electron gun. Gamma is only an approximation; a better approximation is to use so-called chromaticity values (also supported by PNG) as well as gamma, but even this is an approximation. The absolute best solution currently available is to use a complete color management system (which, again, PNG supports via the iCCP and sRGB chunks). For most people, however, just supplying the gamma value of the image and correcting for the corresponding gamma value of the monitor system is sufficient.
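That last step, correcting for the image's gamma and the monitor's gamma, amounts to raising each sample to a power. The Python sketch below follows the approach described in the PNG specification's gamma material, but the function name is invented and the specific numbers (a display exponent of 2.2 for a typical PC, and file gammas of 1/2.2 and 1/1.8 for PC-made and Mac-made images) are commonly quoted values assumed here for illustration, not figures from this page.

```python
def gamma_correct(sample, file_gamma, display_exponent, max_value=255):
    """Adjust one sample so it displays as intended on the given system."""
    decoding_exponent = 1.0 / (file_gamma * display_exponent)
    return round(((sample / max_value) ** decoding_exponent) * max_value)

PC_DISPLAY = 2.2  # assumed display exponent of the viewing system

# Image made for the same kind of system: mid-gray is left alone
print(gamma_correct(128, 1 / 2.2, PC_DISPLAY))
# Image made on an (assumed) 1.8 system: mid-gray is brightened on the PC
print(gamma_correct(128, 1 / 1.8, PC_DISPLAY))
```

Without the correction, the second image would simply be shown with its stored values and would look too dark, which is the Macintosh-to-PC problem described earlier.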

For further information, see Chris Lilley's tutorials on gamma, chromaticity and color management, or the Gamma Tutorial appendix in the PNG specification. For more detailed technical information, see Charles Poynton's Gamma and Color FAQs, the International Color Consortium home page, the sRGB home page, John Denker's extensive color management page, or Chris's chapter on gamma correction (and subsequent chapters) in Not Just Decoration: Quality Graphics for the Web. (Gamma logo courtesy of Claus Cyrny.)


Interlacing

Interlacing--or, more generally, progressive display--has been around a long time. GIF has supported it since 1987, TIFF since around the same time (though not in any standardized way), and JPEG since the early 1990s (though it wasn't widely implemented until 1996). PNG's method is conceptually similar to GIF interlacing and visually similar to progressive JPEG (i.e., two-dimensional).

Here is a GIF animation by Willem van Schaik that shows some of the benefits of PNG's 2D interlacing scheme over GIF's one-dimensional version:

[GIF animation of PNG and GIF interlacing]
PNG's 2D interlacing (left) compared with GIF's 1D interlacing (right)

The first thing to notice is that only the top one-eighth or so of the GIF image is visible by the time the PNG image's first pass is complete. PNG's first pass is only 1/64th of the image data; GIF's is 1/8th. By the time GIF's first pass is done, four PNG passes have been displayed--and unlike the GIF pixels, which are stretched by a factor of 8:1 at this point, the PNG pixels are only stretched by 2:1. (Indeed, there is no stretching at all in PNG's odd-numbered passes, and its even passes are all stretched by 2:1 vertically. This means that embedded text in an image is typically readable about twice as fast in a PNG image.)

Also note that PNG's seventh pass and GIF's fourth pass are identical--both consist of every other scanline. They each therefore represent fully one half of the image data and one half of the decoding time. (The relative timing in the animation above has been adjusted to emphasize the earlier passes over the later ones.)
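The fractions quoted above can be reproduced from the seven-pass pattern given in the PNG specification (known as Adam7). In the Python sketch below, the table of starting offsets and step sizes comes from the specification; the function name is invented for illustration.

```python
from fractions import Fraction

# (first column, first row, column step, row step) for passes 1 through 7
ADAM7 = [(0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
         (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2)]

def pass_pixel_counts(width, height):
    """Number of pixels delivered by each of the seven interlace passes."""
    counts = []
    for x0, y0, dx, dy in ADAM7:
        columns = len(range(x0, width, dx))
        rows = len(range(y0, height, dy))
        counts.append(columns * rows)
    return counts

counts = pass_pixel_counts(8, 8)  # one 8 x 8 tile of the pattern
print(counts)                                    # prints [1, 1, 2, 4, 8, 16, 32]
print(Fraction(counts[0], sum(counts)))          # prints 1/64 (pass 1)
print(Fraction(sum(counts[:4]), sum(counts)))    # prints 1/8 (passes 1-4)
print(Fraction(counts[6], sum(counts)))          # prints 1/2 (pass 7)
```

The first four passes together carry the same one-eighth of the data as GIF's first pass, and the seventh pass carries half, matching the description above.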

Check out the PNG interlacing demo for a "zoomed" look at how PNG's interlaced pixels are displayed, or see the Data Representation chapter of the PNG specification for details of PNG's interlacing scheme.


File Integrity Checks

PNG supports three main types of integrity-checking to help avoid problems with file transfers and the like. The first and simplest is the eight-byte magic signature at the beginning of every PNG image. It will detect the most common type of file corruption: that due to the transfer of a binary file in text (or "ASCII") mode. On most systems, line-endings in text files are flagged by either a carriage-return character (CR), a line-feed character (LF), or both. Macintoshes use CRs; Unix systems use LFs; and all non-Unix PC systems (DOS, Windows 3.x/95/NT, OS/2) use CR/LF pairs. PNG's magic signature cleverly includes both a CR/LF pair and a single LF. Thus when transferring in text mode to a DOS box, for example, the bare LF will acquire a matching CR; when transferring to a Unix system, the CR/LF pair will turn into a plain LF; and when transferring to a Macintosh, both the CR/LF and the bare LF will probably turn into plain CRs. It's then a simple matter of looking at the first eight or nine bytes in the file to see whether text-corruption occurred (which is exactly the sort of thing the Unix file(1) command is designed to do). (Keep in mind that messing up the signature isn't that big a deal; the real problem is that CR and LF characters in the image data--which don't have anything to do with line endings or text but instead refer to pixel values or more abstract compressor tokens--will also be converted, thus destroying the image.)
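A signature check along those lines might look like the Python sketch below. The eight signature bytes are the ones defined in the PNG specification; the function name and the wording of the diagnoses are invented, and the three damaged patterns assume the simple line-ending conversions described above.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8 bytes: CR/LF pair, then a bare LF

def check_signature(head):
    """Classify the first few bytes of a file that is supposed to be a PNG."""
    if head.startswith(PNG_SIGNATURE):
        return "ok"
    if head.startswith(b"\x89PNG\n\x1a\n"):
        return "CR/LF turned into LF (text-mode transfer to Unix?)"
    if head.startswith(b"\x89PNG\r\n\x1a\r\n"):
        return "bare LF turned into CR/LF (text-mode transfer to DOS?)"
    if head.startswith(b"\x89PNG\r\x1a\r"):
        return "line endings turned into CR (text-mode transfer to Macintosh?)"
    return "not a PNG"

print(check_signature(b"\x89PNG\r\n\x1a\n" + b"rest of file"))
print(check_signature(b"\x89PNG\n\x1a\n" + b"rest of file"))
```

The DOS case is why nine bytes may need to be examined: the damaged signature is one byte longer than the original.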

The second type of integrity-checking is known as a 32-bit cyclic redundancy check or CRC-32. PNG images are divided up into logical data chunks, and each chunk has an associated CRC stored with it. If even one bit in the chunk changes, the CRC value one would calculate from the damaged data will no longer match the stored CRC value from the original chunk data. This sort of thing can easily be tested without decoding the image; in fact, it can be tested on the fly, as the image is downloaded, if the downloading software is smart enough.
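Here is a minimal Python sketch of that test. The chunk layout (a four-byte length, a four-byte type, the data, then a four-byte CRC computed over the type and data) is taken from the PNG specification; the helper names and the sample text chunk are invented for illustration.

```python
import struct
import zlib

def make_chunk(chunk_type, data):
    """Assemble one chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def chunk_is_intact(chunk):
    """Recompute the CRC-32 and compare it with the value stored in the chunk."""
    (length,) = struct.unpack(">I", chunk[:4])
    type_and_data = chunk[4:8 + length]
    (stored_crc,) = struct.unpack(">I", chunk[8 + length:12 + length])
    return (zlib.crc32(type_and_data) & 0xFFFFFFFF) == stored_crc

good = make_chunk(b"tEXt", b"Comment\x00hello")
damaged = bytearray(good)
damaged[10] ^= 0x01  # flip a single bit inside the chunk data

print(chunk_is_intact(good))            # prints True
print(chunk_is_intact(bytes(damaged)))  # prints False
```

Note that nothing here decodes any pixels; the check needs only the raw bytes of the chunk, which is why it can run while a download is still in progress.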

The third type of integrity check, the Adler-32 checksum, applies only to the image-data chunk(s) and is similar to the CRC values. Where an image chunk's CRC value applies to the filtered, compressed data in the chunk, the Adler-32 checksum applies to the complete stream of uncompressed data (regardless of how many image chunks that might span). It's really only used by the lowest-level compression library as a check against bad encoding and decoding software.
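This checksum can be seen directly with Python's standard zlib module, which produces the same kind of compressed stream that PNG stores in its image-data chunks: the last four bytes of the stream are the Adler-32 of the uncompressed data. The sample bytes below are made up and merely stand in for real image data.

```python
import struct
import zlib

uncompressed = b"stand-in for filtered image data " * 20
stream = zlib.compress(uncompressed)

# The zlib stream ends with the Adler-32 of the uncompressed data (big-endian)
(stored,) = struct.unpack(">I", stream[-4:])
print(stored == zlib.adler32(uncompressed))  # prints True

# Damage the stored checksum; the library notices when it decompresses
tampered = stream[:-1] + bytes([stream[-1] ^ 0x01])
try:
    zlib.decompress(tampered)
    detected = False
except zlib.error:
    detected = True
print(detected)  # prints True
```

As the text says, the application rarely looks at this value itself; the compression library checks it on the way out.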

See the File Structure chapter of the PNG specification for details.


Pronunciation

No detail was too small for consideration in the authors' quest for a near-perfect image format; yea, verily, even the acronym and pronunciation were major topics of discussion. The reason, of course, is the GIF format; some pronounce it with a soft G like giraffe, some with a hard G like gift, and no one really knows what they're talking about. (For the record, the soft G is correct; it is how the author of the format pronounces it.)

"PNG" is always spelled* "PNG" (or "Portable Network Graphics") and always pronounced "ping" in English, not "pinj" or "pee en gee" or any other multi-syllabic disaster. (For non-English speakers, the three-letter pronunciation is fine, however.) See the introduction to the PNG specification (or the Scope section of the newer ISO/IEC/W3C version) for the definitive statement on the matter.

* Greg follows American English rules, but read spelt here if you "favour" the British "flavour." ;-)



Last modified 14 March 2009.

Copyright © 1996-2009 Greg Roelofs. This page may be freely copied and modified under the terms of the GNU Free Documentation License.