Robust Document Image Binarization Technique for Degraded Document Images
Subba Rao Nasina, A Suman Kumar Reddy
Citation: Subba Rao Nasina, A Suman Kumar Reddy, Robust Document Image Binarization Technique for Degraded Document Images, International Journal of Innovative Research in Electronics and Communications 2015, 2(5): 35-44.
Libraries and archives around the world store an abundance of old and historically important documents and manuscripts. Over time, these collections have come to hold a significant part of human heritage. Segmenting text from badly degraded document images is a very challenging task because of the high inter- and intra-image variation between the document background and the foreground text. In this paper, we propose a novel document image binarization technique that addresses these issues by using adaptive image contrast. The adaptive image contrast is a combination of the local image contrast and the local image gradient, and it is tolerant to the text and background variation caused by different types of document degradation. In the proposed technique, an adaptive contrast map is first constructed for an input degraded document image. The contrast map is then binarized and combined with Canny's edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated from the intensities of the detected text stroke edge pixels within a local window. The proposed method is simple, robust, and requires minimal parameter tuning. It has been tested on three public datasets used in the recent Document Image Binarization Contests (DIBCO) 2009 and 2011 and the Handwritten-DIBCO 2010, and achieves accuracies of 93.5%, 87.8%, and 92.03%, respectively, which are significantly higher than or close to those of the best-performing methods reported in the three contests. Experiments on the Bickley diary dataset, which consists of several challenging poor-quality document images, also show the superior performance of the proposed method compared with other techniques.
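To make the notion of adaptive image contrast more concrete, the following is a minimal sketch of how a contrast map of this kind might be computed. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names, the fixed blending weight `alpha`, the 3x3 window, the division of the gradient by 255, and the small constant `eps` are all hypothetical choices made for the example. The image is represented as a plain list of rows of 8-bit gray levels so that no external library is needed.

```python
def local_min_max(img, r, c, radius=1):
    """Return (min, max) intensity in the square window centred on (r, c).

    The window is clipped at the image border.
    """
    rows, cols = len(img), len(img[0])
    window = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return min(window), max(window)


def adaptive_contrast_map(img, alpha=0.5, eps=1e-6, radius=1):
    """Blend local image contrast with local image gradient per pixel.

    Assumptions (not taken from the abstract): alpha is a fixed weight in
    [0, 1], and the gradient is divided by 255 so both terms lie in [0, 1].
    """
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            lo, hi = local_min_max(img, r, c, radius)
            gradient = (hi - lo) / 255.0            # local image gradient
            contrast = (hi - lo) / (hi + lo + eps)  # local image contrast
            row.append(alpha * contrast + (1 - alpha) * gradient)
        out.append(row)
    return out


# A uniform background gives zero response; a dark stroke pixel on a
# bright background gives a clearly non-zero response around the stroke.
flat = [[200] * 3 for _ in range(3)]
stroke = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
flat_map = adaptive_contrast_map(flat)
stroke_map = adaptive_contrast_map(stroke)
```

In this sketch the contrast term normalizes by local brightness, so it responds to faint strokes on dark backgrounds, while the gradient term suppresses noise in bright, nearly uniform regions. The resulting map would then be binarized and combined with an edge map, as described above, before the local threshold is estimated.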