====== DjVu: Scanned Documents on the Web. ======

[[http://djvu.org|{{:research:djvuorg.png?300 }}]]

This page describes the design of DjVu,
a document compression and imaging technology 
that allows the efficient online distribution of 
high resolution scanned documents. 
More information can be found on [[http://www.djvu.org|DjVu.org]].
You can also see the page describing [[:projects:djvulibre|DjVuLibre]], 
a free implementation of the DjVu system.
Finally, you can install the [[http://djvu.org/resources|DjVu browser plugin]],
and view the DjVu slides [[http://leon.bottou.org/slides/djvu/djvuslides.djvu|(djvu)]] [[http://leon.bottou.org/slides/djvu/djvuslides.pdf|(pdf)]].

===== Overview =====

Despite the exponential growth of the internet,
most human knowledge is preserved on paper
in the form of books, magazines, and journals.
Scanning these documents offers a cost-effective 
way to put them online.
Unfortunately, high-resolution color scanned images are too large
to be practical.
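
To give a sense of scale, here is a rough back-of-the-envelope calculation of the uncompressed size of one scanned page. The page size (US Letter, 8.5 x 11 inches) and the 24-bit color depth are assumptions made for illustration; the 300 dpi resolution is the one quoted in this overview.

```python
def raw_scan_bytes(width_in, height_in, dpi=300, bytes_per_pixel=3):
    """Uncompressed size, in bytes, of a page scanned at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed page: US Letter (8.5 x 11 inches), 24-bit color, 300 dpi.
size = raw_scan_bytes(8.5, 11.0)
print(f"{size} bytes, about {size / 1e6:.1f} MB per page")
```

Under these assumptions a single page occupies roughly 25 MB before compression, which is why raw scans are impractical to serve over the web.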

The main DjVu innovation is a document compression technique
that reduces these high-resolution images to a size comparable
to that of a typical web page.  
For instance, DjVu needs 40KB to 60KB to represent a typical magazine 
page scanned in color with a resolution of 300 dpi.
Such large compression ratios are possible because DjVu 
understands many aspects of document images.

  * The //segmentation// step separates a //foreground image// from a //background image//.
  * The //foreground image// contains the text and the line art. High compression ratios are achieved by collecting the repeated characters into a //shape dictionary// and simply coding the positions of their occurrences.
  * The //background image// contains the paper texture and the photographic images. High compression ratios are achieved with simple wavelets because this part of the image can be rendered at a lower resolution.
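
The two ideas in the list above can be illustrated with a toy sketch. This is not the DjVu encoder: the function names are hypothetical, the shape dictionary uses exact bitmap matching as a simplifying assumption, and plain block averaging stands in for the lower-resolution background (it is not a wavelet coder).

```python
def encode_foreground(glyphs):
    """Toy shape dictionary.

    glyphs: list of (x, y, bitmap), where bitmap is a tuple of row strings.
    Each distinct bitmap is stored once in `shapes`; every occurrence is
    reduced to a (shape_index, x, y) placement.
    """
    shapes = []
    index_of = {}
    placements = []
    for x, y, bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return shapes, placements


def decode_foreground(shapes, placements):
    """Rebuild the original glyph list from the dictionary and placements."""
    return [(x, y, shapes[i]) for i, x, y in placements]


def downsample(image, factor):
    """Average factor x factor blocks of a 2-D list of gray levels.

    Stand-in for storing the background at a lower resolution.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


# Two hypothetical 3x4 glyph bitmaps; the first one occurs three times.
GLYPH_A = ("010", "101", "111", "101")
GLYPH_B = ("110", "101", "110", "111")
page = [(0, 0, GLYPH_A), (10, 0, GLYPH_B), (20, 0, GLYPH_A), (30, 0, GLYPH_A)]

shapes, placements = encode_foreground(page)
print(len(shapes), "shapes for", len(placements), "occurrences")
```

The more often a character shape repeats on a page, the more the dictionary pays off, since each extra occurrence costs only an index and a position.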

DjVu documents used to be viewed using a sophisticated [[http://www.djvuzone.org/download|browser plugin]]. Unfortunately, modern browsers no longer support such plugins, and you have to resort to standalone programs, such as [[https://djvu.sourceforge.net/djview4.html|DjView]], to view DjVu documents.
The DjVu viewer efficiently represents high resolution images
with //limited memory// and implements very efficient //zooming and panning//.
The viewer automatically downloads the next pages in 
the background in order to facilitate reading.
DjVu documents can be enriched with //annotations// and //hyperlinks//.
They can also contain a //searchable text version// of the document.
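
The background download of upcoming pages can be pictured with the following minimal sketch. It is only an illustration of the general prefetching idea, not the actual viewer code: the class name, the ''fetch_page'' callback, and the ''lookahead'' parameter are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingReader:
    """Fetch the requested page and start fetching the following ones."""

    def __init__(self, fetch_page, page_count, lookahead=2):
        self._fetch_page = fetch_page      # hypothetical callback: page number -> data
        self._page_count = page_count
        self._lookahead = lookahead
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending = {}                 # page number -> Future

    def _schedule(self, page):
        # Start a background fetch unless the page is invalid or already requested.
        if 0 <= page < self._page_count and page not in self._pending:
            self._pending[page] = self._pool.submit(self._fetch_page, page)

    def show(self, page):
        """Return the data for `page`, prefetching the next pages meanwhile."""
        if not 0 <= page < self._page_count:
            raise IndexError(f"no such page: {page}")
        self._schedule(page)
        for ahead in range(1, self._lookahead + 1):
            self._schedule(page + ahead)
        return self._pending[page].result()

    def close(self):
        # Wait for outstanding fetches and release the worker threads.
        self._pool.shutdown(wait=True)


fetched = []


def fake_fetch(n):
    # Stand-in for a network download.
    fetched.append(n)
    return f"page-{n}"


reader = PrefetchingReader(fake_fetch, page_count=5)
first = reader.show(0)    # also schedules pages 1 and 2
second = reader.show(1)   # page 1 is already requested; schedules page 3
reader.close()
```

By the time the reader turns the page, the next page has usually been requested already, so displaying it does not have to wait for a fresh download.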


===== Main Publications =====

<box 99% orange>
Léon Bottou, Patrick Haffner, Paul G. Howard, Patrice Simard, Yoshua Bengio and Yann Le Cun:  **High Quality Document Image Compression with DjVu**,  //Journal of Electronic Imaging//, 7(3):410-425, 1998.

[[:papers:bottou-98|more...]]
</box>

<box 99% orange>
Léon Bottou and Steven Pigeon:  **Lossy Compression of Partially Masked Still Images**,  //Proceedings of IEEE Data Compression Conference//, Snowbird, UT, April 1998.

[[:papers:bottou-pigeon-98|more...]]
</box>

<box 99% orange>
Léon Bottou, Paul G. Howard and Yoshua Bengio:  **The Z-Coder Adaptive Binary Coder**,  //Proceedings IEEE Data Compression Conference 1998//, IEEE, Snowbird, April 1998.

[[:papers:bottou-howard-bengio-98|more...]]
</box>

<box 99% orange>
Léon Bottou, Patrick Haffner and Yann Le Cun:  **Conversion of Digital Documents to Multilayer Raster Formats**,  //Proceedings of the Sixth International Conference on Document Analysis and Recognition//, 444-448, IEEE, Seattle, September 2001.

[[:papers:bottou-2001|more...]]
</box>

<box 99% orange>
Yann Le Cun, Léon Bottou, Andrei Erofeev, Patrick Haffner and Bill W. Riemers:  **DjVu document browsing with on-demand loading and rendering of image components**,  //Internet Imaging//, San Jose, January 2001.

[[:papers:lecun-2001|more...]]
</box>

<box 99% orange>
Patrick Haffner, Léon Bottou, Yann Le Cun and Luc Vincent:  **A General Segmentation Scheme for DjVu Document Compression**,  //Proceedings of the International Symposium on Mathematical Morphology (ISMM'02)//, CSIRO publications, Sydney, Australia, April 2002.

[[:papers:haffner-2002|more...]]
</box>