Farsi and Arabic document images lossy compression based on the mixed raster content model

H Grailu, M Lotfizad, H Sadoghi-Yazdi - International Journal on Document …, 2009 - Springer
International Journal on Document Analysis and Recognition (IJDAR), 2009, Springer
Abstract
Recently, the mixed raster content (MRC) model was proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have some disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods focus on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off, but they do not consider text readability or suitability for OCR. Third, these methods usually suffer from undesired jagged-edge artifacts and misclassify important textual details. In this paper, an MRC-based document image compression method is proposed that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in segmentation, bi-level mask layer compression, suitability for OCR, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sufficiently sensitive to small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art compression method DjVu, as high as 1.75–2.3.
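To make the layer structure of the MRC model concrete, the following is a minimal sketch of a three-layer decomposition (bi-level mask, foreground, background) and its recomposition. It is not the segmentation or coding method proposed in the paper: the fixed `threshold`, the flat single-value foreground and background layers, and the function names `mrc_decompose` and `mrc_compose` are all assumptions made for illustration only.

```python
def mrc_decompose(image, threshold=128):
    """Split a grayscale image (list of rows of 0-255 ints) into MRC-style layers.

    Assumption: a fixed global threshold stands in for a real segmentation
    method, and each colour layer is reduced to a single mean value.
    """
    # Bi-level mask: 1 marks dark (text-like) pixels, 0 marks background pixels.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    fg_vals = [px for row, mrow in zip(image, mask)
               for px, m in zip(row, mrow) if m]
    bg_vals = [px for row, mrow in zip(image, mask)
               for px, m in zip(row, mrow) if not m]

    # Flat layers: one mean intensity each (a crude, lossy placeholder).
    fg = round(sum(fg_vals) / len(fg_vals)) if fg_vals else 0
    bg = round(sum(bg_vals) / len(bg_vals)) if bg_vals else 255
    return mask, fg, bg


def mrc_compose(mask, fg, bg):
    """Rebuild the image: the mask selects foreground or background per pixel."""
    return [[fg if m else bg for m in row] for row in mask]


if __name__ == "__main__":
    page = [
        [250, 10, 250],
        [250, 20, 250],
    ]
    mask, fg, bg = mrc_decompose(page)
    print(mask)
    print(mrc_compose(mask, fg, bg))
```

In this sketch the mask carries all of the text shape, which is why the abstract treats bi-level mask layer compression and segmentation quality as separate aspects of performance.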