A robust segmentation technique for line, word and character extraction from Kannada text in low resolution display board images

SA Angadi, MM Kodabagi - International Journal of Image and Graphics, 2014 - World Scientific
Reliable segmentation of text lines, words and characters is an important step in developing automated systems that understand the text in low resolution display board images. This paper presents a new approach for segmenting text lines, words and characters from Kannada text in such images. The proposed method uses projection profile features and on-pixel distribution statistics to segment text lines. It also detects text lines that contain consonant modifiers and merges them with the corresponding text lines, and it efficiently separates overlapped text lines. The character extraction process computes character boundaries using vertical profile features to extract character images from every text line. The word segmentation process then uses k-means clustering to group inter-character gaps into character-space and word-space clusters, which are used to compute thresholds for extracting words. The method also accounts for variations in character and word gaps. The proposed methodology is evaluated on a data set of 1008 low resolution images of display boards containing Kannada text, captured with 2-megapixel mobile phone cameras at sizes of 240 × 320, 480 × 640 and 960 × 1280. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. The proposed method is tolerant to font variability, spacing variations between characters and words, the absence of a free segmentation path due to consonant and vowel modifiers, noise and other degradations. Experiments on images containing overlapped text lines have given promising results.
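The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an already binarized image given as a list of rows (1 = ON pixel), treats any row or column with at least one ON pixel as text, and takes the midpoint between the two k-means centroids as the word-gap threshold. The function names (`runs_of_text`, `segment_lines`, `segment_characters`, `gap_threshold`, `group_words`) are hypothetical, and the handling of consonant modifiers and overlapped text lines described in the paper is not attempted.

```python
def runs_of_text(profile, min_count=1):
    """Return (start, end) pairs, end exclusive, where profile >= min_count."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Horizontal projection profile: count ON pixels per row, then find runs."""
    horizontal_profile = [sum(row) for row in image]
    return runs_of_text(horizontal_profile)


def segment_characters(image, top, bottom):
    """Vertical projection profile over rows top..bottom-1, then find runs."""
    width = len(image[0])
    vertical_profile = [
        sum(image[r][c] for r in range(top, bottom)) for c in range(width)
    ]
    return runs_of_text(vertical_profile)


def gap_threshold(gaps, iterations=20):
    """Two-cluster 1-D k-means over gap widths.

    Assumption: the threshold is the midpoint between the two centroids.
    """
    small, large = float(min(gaps)), float(max(gaps))
    for _ in range(iterations):
        near_small = [g for g in gaps if abs(g - small) <= abs(g - large)]
        near_large = [g for g in gaps if abs(g - small) > abs(g - large)]
        if not near_large:
            break
        small = sum(near_small) / len(near_small)
        large = sum(near_large) / len(near_large)
    return (small + large) / 2.0


def group_words(char_spans):
    """Merge character spans into word spans using the clustered gap threshold."""
    if not char_spans:
        return []
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(char_spans, char_spans[1:])]
    if not gaps or min(gaps) == max(gaps):
        # A single character or uniform spacing: treat the line as one word.
        return [(char_spans[0][0], char_spans[-1][1])]
    threshold = gap_threshold(gaps)
    words, start = [], char_spans[0][0]
    for (cur, nxt), gap in zip(zip(char_spans, char_spans[1:]), gaps):
        if gap > threshold:
            words.append((start, cur[1]))
            start = nxt[0]
    words.append((start, char_spans[-1][1]))
    return words
```

Calling `segment_lines` on a binary image yields row ranges; each range can be passed to `segment_characters`, and the resulting column spans to `group_words`. Real display board images would need binarization and noise handling first, which this sketch leaves out.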