====== DjVu: Scanned Documents on the Web ======

[[http://djvu.org|{{:research:djvuorg.png?300 }}]]
This page describes the design of DjVu, a document compression and imaging technology that allows the efficient online distribution of high-resolution scanned documents. More information can be found on [[http://www.djvu.org|DjVu.org]]. You can also see the page describing [[:projects:djvulibre|DjVuLibre]], a free implementation of the DjVu system. Finally, you can install the [[http://djvu.org/resources|DjVu browser plugin]] and view the [[http://leon.bottou.org/slides/djvu|DjVu slides]].

===== Overview =====

Despite the exponential growth of the internet, most human knowledge is still preserved on paper, in the form of books, magazines, journals, and so on. Scanning these documents offers a cost-effective way to put them online. Unfortunately, high-resolution color scanned images are too large to be practical.

The main DjVu innovation is a document compression technique that reduces these high-resolution images to a size comparable to that of a typical web page. For instance, DjVu needs 40KB to 60KB to represent a typical magazine page scanned in color at 300 dpi. Such high compression ratios are possible because DjVu exploits several properties specific to document images:

  * The //segmentation// step separates a //foreground image// from a //background image//.
  * The //foreground image// contains the text and the line art. High compression ratios are achieved by collecting the repeated characters into a //shape dictionary// and simply coding the positions of their occurrences.
  * The //background image// contains the paper texture and the photographic images. High compression ratios are achieved with simple wavelets because this part of the image can be rendered at a lower resolution.

DjVu documents can be viewed using a sophisticated [[http://www.djvuzone.org/download|browser plugin]].
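To put the 40KB to 60KB figure from the overview in perspective, the following sketch computes the uncompressed size of a color page scanned at 300 dpi and the resulting compression ratio. The page format (US Letter, 8.5 x 11 inches), the 24-bit RGB pixel depth and the convention 1 KB = 1000 bytes are assumptions chosen for illustration; they are not taken from the DjVu papers.

```python
"""Back-of-the-envelope compression ratio for a scanned color page.

Assumptions (illustrative only): US Letter page (8.5 x 11 in),
24-bit RGB pixels (3 bytes per pixel), and 1 KB = 1000 bytes.
"""


def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of a page scanned at `dpi` dots per inch."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def compression_ratio(raw_bytes: int, compressed_kb: float) -> float:
    """Ratio between the raw size and a compressed size expressed in KB."""
    return raw_bytes / (compressed_kb * 1000)


if __name__ == "__main__":
    raw = raw_page_bytes(8.5, 11, 300)
    print(f"raw size: {raw:,} bytes")
    for kb in (40, 60):
        print(f"{kb} KB -> about {compression_ratio(raw, kb):.0f}:1")
```

Under these assumptions the raw page occupies 25,245,000 bytes (2550 x 3300 pixels x 3 bytes), so 40KB to 60KB corresponds to compression ratios of roughly 420:1 to 630:1.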
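The //segmentation// and //background image// bullets of the overview can be illustrated on a toy grayscale page. In this minimal sketch, a fixed darkness threshold decides which pixels belong to the foreground mask, and the background is reduced in resolution by averaging only the pixels that the mask leaves visible. Both the threshold rule and the block averaging are illustrative assumptions, not the DjVu method; the actual segmenter and wavelet background coder are described in the publications listed below. The function names `segment` and `subsample_background` are hypothetical.

```python
from typing import List, Optional

Gray = List[List[int]]   # 8-bit luminance: 0 = black ink, 255 = white paper
Mask = List[List[bool]]  # True where the pixel belongs to the foreground


def segment(page: Gray, threshold: int = 128) -> Mask:
    """Toy segmentation (an assumption): dark pixels go to the foreground."""
    return [[value < threshold for value in row] for row in page]


def subsample_background(page: Gray, mask: Mask,
                         factor: int) -> List[List[Optional[int]]]:
    """Average each factor x factor block over pixels NOT covered by the mask.

    Masked pixels are ignored because the foreground layer will be drawn
    over them, so ink does not darken the background. A fully masked block
    yields None; a real coder would have to fill it in somehow.
    """
    height, width = len(page), len(page[0])
    background = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            visible = [page[y][x]
                       for y in range(by, min(by + factor, height))
                       for x in range(bx, min(bx + factor, width))
                       if not mask[y][x]]
            row.append(sum(visible) // len(visible) if visible else None)
        background.append(row)
    return background


if __name__ == "__main__":
    page = [
        [200, 200, 10, 210],   # a dark stroke (10) crosses the top-right block
        [200, 200, 10, 210],
        [150, 150, 220, 220],
        [150, 150, 220, 0],    # a single dark ink pixel (0)
    ]
    mask = segment(page)
    print(subsample_background(page, mask, 2))
```

On this 4 x 4 page the half-resolution background is `[[200, 210], [150, 220]]`: the top-right block averages to 210 rather than being darkened by the stroke, because the ink pixels are hidden by the mask.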
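The //shape dictionary// idea from the overview can be sketched on a toy binary foreground image drawn with `#` (ink) and `.` (paper). This minimal sketch extracts 8-connected components, stores each distinct bitmap once, and codes every occurrence as a `(shape_id, x, y)` triple. It only groups pixel-identical shapes and does no entropy coding; the actual DjVu foreground coder is more elaborate (see the publications listed below). The names `connected_components`, `encode_with_dictionary` and `decode` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# A shape is stored as its cropped bitmap: a tuple of rows of '#' and '.'.
Bitmap = Tuple[str, ...]


def connected_components(image: List[str]) -> List[Tuple[int, int, Bitmap]]:
    """Return (left, top, bitmap) for each 8-connected blob of '#' pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != "#" or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects every pixel of this blob.
            pixels = []
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx] and image[ny][nx] == "#"):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Crop the blob to its bounding box.
            left = min(px for px, _ in pixels)
            top = min(py for _, py in pixels)
            right = max(px for px, _ in pixels)
            bottom = max(py for _, py in pixels)
            ink = set(pixels)
            bitmap = tuple(
                "".join("#" if (cx, cy) in ink else "."
                        for cx in range(left, right + 1))
                for cy in range(top, bottom + 1)
            )
            blobs.append((left, top, bitmap))
    return blobs


def encode_with_dictionary(image: List[str]):
    """Store each distinct shape once; record (shape_id, x, y) per occurrence."""
    dictionary: List[Bitmap] = []
    index: Dict[Bitmap, int] = {}
    placements: List[Tuple[int, int, int]] = []
    for x, y, bitmap in connected_components(image):
        if bitmap not in index:  # exact matching only (a simplification)
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index[bitmap], x, y))
    return dictionary, placements


def decode(dictionary: List[Bitmap], placements: List[Tuple[int, int, int]],
           width: int, height: int) -> List[str]:
    """Redraw the foreground by stamping each shape at its recorded position."""
    canvas = [["."] * width for _ in range(height)]
    for shape_id, x, y in placements:
        for dy, row in enumerate(dictionary[shape_id]):
            for dx, pixel in enumerate(row):
                if pixel == "#":
                    canvas[y + dy][x + dx] = "#"
    return ["".join(row) for row in canvas]


PAGE = [
    "###..###..###.",
    ".#....#....#..",
    ".#....#....#..",
    "..............",
    "#.....#.......",
    "#.....#.......",
    "###...###.....",
]

if __name__ == "__main__":
    shapes, where = encode_with_dictionary(PAGE)
    print(len(shapes), "shapes,", len(where), "occurrences")
    assert decode(shapes, where, 14, 7) == PAGE
```

On the toy page, the three "T" glyphs and the two "L" glyphs collapse into two dictionary entries plus five position triples, and decoding reproduces the page exactly. The more often a character repeats on a page, the more this representation saves.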
The DjVu viewer efficiently represents high-resolution images with //limited memory// and implements very efficient //zooming and panning//. The viewer automatically downloads the next pages in the background in order to facilitate reading. DjVu documents can be enriched with //annotations// and //hyperlinks//. They can also contain a //searchable text version// of the document.

===== Main Publications =====

Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, Yoshua Bengio and Yann Le Cun: **High Quality Document Image Compression with DjVu**, //Journal of Electronic Imaging//, 7(3):410-425, 1998. [[:papers:bottou-98|more...]]

Léon Bottou and Steven Pigeon: **Lossy Compression of Partially Masked Still Images**, //Proceedings of the IEEE Data Compression Conference//, Snowbird, UT, April 1998. [[:papers:bottou-pigeon-98|more...]]

Léon Bottou, Paul G. Howard and Yoshua Bengio: **The Z-Coder Adaptive Binary Coder**, //Proceedings of the IEEE Data Compression Conference//, Snowbird, UT, April 1998. [[:papers:bottou-howard-bengio-98|more...]]

Léon Bottou, Patrick Haffner and Yann Le Cun: **Conversion of Digital Documents to Multilayer Raster Formats**, //Proceedings of the Sixth International Conference on Document Analysis and Recognition//, 444-448, IEEE, Seattle, September 2001. [[:papers:bottou-2001|more...]]

Yann Le Cun, Léon Bottou, Andrei Erofeev, Patrick Haffner and Bill W. Riemers: **DjVu document browsing with on-demand loading and rendering of image components**, //Internet Imaging//, San Jose, January 2001. [[:papers:lecun-2001|more...]]

Patrick Haffner, Léon Bottou, Yann Le Cun and Luc Vincent: **A General Segmentation Scheme for DjVu Document Compression**, //Proceedings of the International Symposium on Mathematical Morphology (ISMM'02)//, CSIRO Publications, Sydney, Australia, April 2002. [[:papers:haffner-2002|more...]]