Efficient Conversion of Digital Documents to Multilayer Raster Formats

Abstract: How can we turn the description of a digital (i.e., electronically produced) document into something efficient for multilayer raster formats? It is first shown that a foreground/background segmentation without overlapping foreground components can be more efficient for viewing or printing. Then, a new algorithm that prevents overlaps between foreground components while optimizing both the document quality and compression ratio is derived from the Minimum Description Length (MDL) criterion. This algorithm makes the DjVu compression format significantly more efficient on electronically produced documents. Comparisons with other formats are provided.

Léon Bottou, Patrick Haffner and Yann Le Cun: Efficient Conversion of Digital Documents to Multilayer Raster Formats, Proceedings of the Sixth International Conference on Document Analysis and Recognition, 444-448, IEEE, Seattle, September 2001.

icdar-2001.djvu icdar-2001.pdf icdar-2001.ps.gz

@inproceedings{bottou-2001,
  author = {Bottou, L\'{e}on and Haffner, Patrick and {Le Cun}, Yann},
  title = {Efficient Conversion of Digital Documents to Multilayer Raster Formats},
  booktitle = {Proceedings of the Sixth International Conference on Document Analysis and Recognition},
  month = {September},
  year = {2001},
  pages = {444--448},
  address = {Seattle},
  publisher = {IEEE},
  url = {http://leon.bottou.org/papers/bottou-2001},
}