DjVu: Scanned Documents on the Web.
This page describes the design of DjVu, a document compression and imaging technology that enables the efficient online distribution of high-resolution scanned documents. More information can be found on DjVu.org. You can also see the page describing DjVuLibre, a free implementation of the DjVu system. Finally, you can install the DjVu browser plugin and view the DjVu slides.
Despite the exponential growth of the internet, most human knowledge is still preserved on paper in the form of books, magazines, journals, … Scanning these documents offers a cost-effective way to put them online. Unfortunately, high-resolution color scanned images are too large to be practical.
The main DjVu innovation is a document compression technique that reduces these high-resolution images to a size comparable to that of a typical web page. For instance, DjVu needs 40 KB to 60 KB to represent a typical magazine page scanned in color at 300 dpi. Such large compression ratios are possible because DjVu takes advantage of many aspects of document images:
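To put these figures in perspective, the back-of-the-envelope arithmetic below estimates the raw size of such a page. It assumes a US Letter page (8.5 × 11 in), 24-bit RGB color, and 1 KB = 1000 bytes; the page size and color depth are assumptions for illustration, not figures given for DjVu. Under these assumptions, a 40 KB to 60 KB file corresponds to a compression ratio of roughly 400:1 to 600:1.

```python
# Hypothetical page: US Letter (8.5 x 11 in), scanned at 300 dpi, 24-bit RGB.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
DPI = 300
BYTES_PER_PIXEL = 3


def raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Size in bytes of the uncompressed bitmap of one scanned page."""
    width_px = round(width_in * dpi)    # 2550 pixels for Letter at 300 dpi
    height_px = round(height_in * dpi)  # 3300 pixels
    return width_px * height_px * bytes_per_pixel


raw = raw_page_bytes(WIDTH_IN, HEIGHT_IN, DPI, BYTES_PER_PIXEL)
print(f"uncompressed: {raw:,} bytes (~{raw / 1e6:.1f} MB)")
# uncompressed: 25,245,000 bytes (~25.2 MB)

for djvu_kb in (40, 60):
    ratio = raw / (djvu_kb * 1000)  # assumes 1 KB = 1000 bytes
    print(f"{djvu_kb} KB -> compression ratio ~{ratio:.0f}:1")
# 40 KB -> compression ratio ~631:1
# 60 KB -> compression ratio ~421:1
```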
- The segmentation step separates a foreground image from a background image.
- The foreground image contains the text and the line art. High compression ratios are achieved by collecting repeated characters into a shape dictionary and coding only the positions of their occurrences.
- The background image contains the paper texture and the photographic images. High compression ratios are achieved with simple wavelets because this part of the image can be rendered at a lower resolution.
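The segmentation step can be illustrated with a deliberately naive sketch: a grayscale page is split into a bilevel foreground mask and a background layer, then recomposited. The single global threshold, the single ink color, and the mean-value hole filling are all hypothetical simplifications chosen for brevity; DjVu's actual segmenter is considerably more sophisticated and is not shown here.

```python
# Toy layered model: grayscale page, 0 = black ink, 255 = white paper.
# Hypothetical simplification: one global threshold and a single ink color.

def segment(page, threshold=128):
    """Split a page into a bilevel foreground mask and a background layer."""
    mask = [[v < threshold for v in row] for row in page]
    # Pixels not covered by the mask belong to the background (paper, photos).
    paper = [v for row, mrow in zip(page, mask)
             for v, m in zip(row, mrow) if not m]
    # Fill the holes left by the text with the average paper value so the
    # background layer stays smooth.
    fill = sum(paper) / len(paper) if paper else 255
    background = [[fill if m else v for v, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, background


def composite(mask, ink, background):
    """Recombine the layers: ink where the mask is set, background elsewhere."""
    return [[ink if m else b for m, b in zip(mrow, brow)]
            for mrow, brow in zip(mask, background)]


page = [[250, 20, 250],
        [240, 10, 250]]
mask, background = segment(page)
print(composite(mask, 0, background))  # [[250, 0, 250], [240, 0, 250]]
```

The reconstruction is lossy by design: the exact gray levels of the ink (20 and 10 here) are replaced by a single foreground value, which is acceptable for text and line art.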
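The shape dictionary idea for the foreground can be sketched as follows: each distinct character bitmap is stored once, and every occurrence on the page is coded as a (shape index, x, y) triple. This sketch assumes the marks have already been extracted from the mask and uses exact bitmap matching only; real scans rarely produce pixel-identical characters, so a practical coder must also match similar shapes, which is not attempted here.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]  # one string per row, '#' = ink, '.' = paper


def build_shape_dictionary(marks):
    """marks: list of (bitmap, x, y), one entry per mark found on the page.
    Returns (shapes, placements): each distinct bitmap is stored once and
    each occurrence is coded as (shape_index, x, y)."""
    index: Dict[Bitmap, int] = {}
    shapes: List[Bitmap] = []
    placements: List[Tuple[int, int, int]] = []
    for bitmap, x, y in marks:
        if bitmap not in index:        # first time we see this shape
            index[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index[bitmap], x, y))
    return shapes, placements


def render(shapes, placements, width, height):
    """Rebuild the foreground mask from the dictionary and the positions."""
    canvas = [['.'] * width for _ in range(height)]
    for k, x, y in placements:
        for dy, row in enumerate(shapes[k]):
            for dx, c in enumerate(row):
                if c == '#':
                    canvas[y + dy][x + dx] = '#'
    return [''.join(r) for r in canvas]


ring = ("###",
        "#.#",
        "###")
bar = ("#",
       "#",
       "#")
marks = [(ring, 0, 0), (bar, 4, 0), (ring, 6, 0), (ring, 10, 0)]
shapes, placements = build_shape_dictionary(marks)
print(len(shapes), placements)  # 2 [(0, 0, 0), (1, 4, 0), (0, 6, 0), (0, 10, 0)]
for row in render(shapes, placements, 13, 3):
    print(row)
```

Four marks are stored as two bitmaps plus four small triples; the saving grows with the number of repeated characters on a page.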
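Why a smooth background tolerates a lower resolution can be illustrated with one level of a 2-D Haar wavelet transform. Haar is swapped in here as the simplest possible wavelet; DjVu's background coder uses a different wavelet transform, so this is only an illustration of the principle. The LL band holds 2×2 block averages (a half-resolution image), and the three detail bands hold what is lost by keeping only the averages.

```python
def _zeros(h, w):
    return [[0.0] * w for _ in range(h)]


def haar_level(img):
    """One level of a 2-D Haar transform (image sides must be even).
    Returns the LL (average) band and three detail bands, each half size."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = (_zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4  # local average
            lh[i // 2][j // 2] = (a - b + c - d) / 4  # horizontal detail
            hl[i // 2][j // 2] = (a + b - c - d) / 4  # vertical detail
            hh[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal detail
    return ll, lh, hl, hh


def inverse_haar_level(ll, lh, hl, hh):
    """Exact inverse of haar_level."""
    img = _zeros(2 * len(ll), 2 * len(ll[0]))
    for i in range(len(ll)):
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = s + x + y + z
            img[2 * i][2 * j + 1] = s - x + y - z
            img[2 * i + 1][2 * j] = s + x - y - z
            img[2 * i + 1][2 * j + 1] = s - x - y + z
    return img


background = [[10, 12, 20, 22],
              [14, 16, 24, 26],
              [30, 30, 40, 40],
              [30, 30, 40, 40]]
ll, lh, hl, hh = haar_level(background)
print(ll)  # [[13.0, 23.0], [30.0, 40.0]]
# Dropping all detail bands keeps only the half-resolution averages.
approx = inverse_haar_level(ll, _zeros(2, 2), _zeros(2, 2), _zeros(2, 2))
print(approx[0])  # [13.0, 13.0, 23.0, 23.0]
```

On a smooth region the detail coefficients are small, so coarsely quantizing or discarding them changes the image only slightly.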
DjVu documents can be viewed using a sophisticated browser plugin. The DjVu viewer efficiently represents high-resolution images with limited memory and implements very efficient zooming and panning. The viewer automatically downloads the next pages in the background to facilitate reading. DjVu documents can be enriched with annotations and hyperlinks, and they can also contain a searchable text version of the document.
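Background prefetching with bounded memory can be sketched with a least-recently-used page cache and a small thread pool. This is a generic pattern offered for illustration, not the DjVu viewer's code; the `fetch` callable and the `capacity` and `lookahead` parameters are hypothetical.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class PageCache:
    """Bounded page cache that prefetches the next pages in the background.

    `fetch` is a hypothetical callable mapping a page number to decoded
    page data (for example, download plus decode)."""

    def __init__(self, fetch, capacity=4, lookahead=2):
        self._fetch = fetch
        self._capacity = capacity    # max pages kept in memory
        self._lookahead = lookahead  # pages to prefetch after the current one
        self._pages = OrderedDict()  # page number -> Future, oldest first
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _request(self, n):
        """Return the Future for page n, starting a background fetch if needed."""
        if n in self._pages:
            self._pages.move_to_end(n)  # mark as recently used
        else:
            self._pages[n] = self._pool.submit(self._fetch, n)
        future = self._pages[n]
        while len(self._pages) > self._capacity:
            self._pages.popitem(last=False)  # evict least recently used
        return future

    def get(self, n, num_pages):
        """Return page n (blocking if necessary) and prefetch the next pages."""
        future = self._request(n)
        for k in range(n + 1, min(n + 1 + self._lookahead, num_pages)):
            self._request(k)
        return future.result()

    def close(self):
        self._pool.shutdown(wait=True)


calls = []


def fake_fetch(n):
    calls.append(n)
    return f"page {n}"


cache = PageCache(fake_fetch)
for n in range(3):
    print(cache.get(n, num_pages=10))
cache.close()
print(sorted(calls))  # [0, 1, 2, 3, 4]
```

Reading pages 0 to 2 fetches each of pages 0 to 4 exactly once, and the oldest page is evicted once more than four pages are cached.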