DjVu document browsing with on-demand loading and rendering of image components

Abstract: Image-based digital documents are composed of multiple pages, each of which may be composed of multiple components such as text, pictures, background, and annotations. We describe the image structure and software architecture that allow the DjVu system to load and render the required components on demand while minimizing both the bandwidth requirements and the memory requirements of the client. A DjVu document file is merely a list of enriched URLs that point to individual files (or file elements) containing image components. Image components include text images, background images, shape dictionaries shared by multiple pages, OCR-ed text, and several types of annotations. A multithreaded software architecture with smart caching allows individual components to be loaded, pre-decoded, and rendered on demand. Pages are pre-fetched or loaded on demand, allowing users to access pages at random without downloading the entire document and without the help of a byte server.
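
The loading scheme summarized in the abstract can be illustrated with a small Python sketch. This is not the DjVu implementation: the index layout, the component names, `ComponentCache`, `fake_fetch`, and `load_page` are all hypothetical, and the "network" is a stub that returns dummy bytes. It only shows the general idea of a document described as a list of component references, a shared component fetched once, and a cache consulted by both on-demand loads and pre-fetches.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical index: each page lists the component names it needs.
# "dict-1" stands in for a shape dictionary shared by several pages.
INDEX = {
    1: ["dict-1", "page1-text", "page1-bg"],
    2: ["dict-1", "page2-text", "page2-bg"],
    3: ["page3-text", "page3-bg"],
}


class ComponentCache:
    """Thread-safe LRU cache that fetches a component only on first use."""

    def __init__(self, fetch, capacity=4):
        self._fetch = fetch              # callable: name -> bytes
        self._capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()
        self.fetch_count = 0             # number of simulated downloads

    def get(self, name):
        # Holding the lock during the fetch keeps the sketch simple and
        # guarantees that a component is never downloaded twice.
        with self._lock:
            if name in self._items:
                self._items.move_to_end(name)    # mark as recently used
                return self._items[name]
            data = self._fetch(name)
            self.fetch_count += 1
            self._items[name] = data
            if len(self._items) > self._capacity:
                self._items.popitem(last=False)  # evict least recently used
            return data


def fake_fetch(name):
    """Stand-in for a network request; returns dummy bytes."""
    return ("<" + name + ">").encode()


def load_page(cache, page):
    """Return the components of one page, fetching only what is missing."""
    return [cache.get(name) for name in INDEX[page]]


cache = ComponentCache(fake_fetch, capacity=8)
with ThreadPoolExecutor(max_workers=2) as pool:
    current = pool.submit(load_page, cache, 2)   # page the user jumped to
    ahead = pool.submit(load_page, cache, 3)     # pre-fetch of the next page
    page2 = current.result()
    page3 = ahead.result()
page1 = load_page(cache, 1)   # reuses the shared "dict-1" already cached
```

In this sketch, jumping straight to page 2 never touches the components that belong only to page 1, and the later visit to page 1 downloads just its two page-specific components because the shared dictionary is already cached.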

Yann Le Cun, Léon Bottou, Andrei Erofeev, Patrick Haffner and Bill W. Riemers: DjVu document browsing with on-demand loading and rendering of image components, Internet Imaging, San Jose, January 2001.

spie-2001.djvu spie-2001.pdf spie-2001.ps.gz

@inproceedings{lecun-2001,
  author = {{Le Cun}, Yann and Bottou, L\'{e}on and Erofeev, Andrei and Haffner, Patrick and Riemers, Bill W.},
  title = {{DjVu} document browsing with on-demand loading and rendering of image components},
  booktitle = {Internet Imaging},
  address = {San Jose},
  month = {January},
  year = {2001},
  url = {http://leon.bottou.org/papers/lecun-2001},
}