The Region Analysis of Document Images Based on One Dimensional Median Filter

Journal of the Institute of Electronics Engineers of Korea SP (대한전자공학회논문지SP)

Volume 40 Issue 3 / Pages 194-202 / 2003 / 1229-6384 (pISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)

1차원 메디안 필터 기반 문서영상 영역해석

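박승호 (School of Electrical Engineering and Computer Science, Kyungpook National University) ;
장대근 (School of Electrical Engineering and Computer Science, Kyungpook National University) ;
황찬식 (Electronics and Telecommunications Research Institute)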

Published : 2003.05.01

Abstract

Automatically converting printed documents into electronic ones requires region analysis of document images and character recognition. Of these, region analysis segments a document image into detailed regions and classifies the regions into types such as text, picture, and table. However, it is difficult to classify text and pictures exactly, because some text and picture regions are similar in size, density, and complexity of pixel distribution. Thus, misclassification in region analysis is the main obstacle to automatic conversion. In this paper, we propose a region analysis method that segments a document image into text and picture regions. The proposed method addresses these problems by classifying text and pictures with a method based on a one-dimensional median filter. The misclassification of boldface text and of picture regions such as graphs or tables, which median filtering causes, is resolved by using a skin peeling filter and the maximal text length. As a result, the method performs better than previous methods, including commercial software.

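The abstract does not spell out how the one-dimensional median filter separates text from pictures, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a binarized region (1 = foreground), filters each row with a 1D median filter, and measures how much foreground survives: thin character strokes tend to vanish, while wide picture runs tend to remain. The function names, the window width of 5, and the 0.5 threshold are all hypothetical, and the skin peeling filter and maximal text length steps mentioned in the abstract are not modeled.

```python
def median_filter_1d(row, window):
    """1D median filter over a list of 0/1 pixels (odd window width).

    Pixels outside the row are treated as background (0).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [0] * half + list(row) + [0] * half
    out = []
    for i in range(len(row)):
        ones = sum(padded[i:i + window])
        # For binary data the median is 1 exactly when ones are the majority.
        out.append(1 if ones > half else 0)
    return out


def survival_ratio(region, window):
    """Fraction of foreground pixels left after row-wise median filtering."""
    before = sum(sum(row) for row in region)
    if before == 0:
        return 0.0
    after = sum(sum(median_filter_1d(row, window)) for row in region)
    return after / before


def classify_region(region, window=5, threshold=0.5):
    """Hypothetical rule: regions that survive filtering are pictures.

    `window` and `threshold` are illustrative values, not from the paper.
    """
    if survival_ratio(region, window) >= threshold:
        return "picture"
    return "text"


# Thin, widely spaced strokes (text-like) versus one wide run (picture-like).
text_like = [[1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]] * 3
picture_like = [[0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]] * 3

print(classify_region(text_like))
print(classify_region(picture_like))
```

Under this rule, boldface text with strokes wider than half the window would survive filtering and be mistaken for a picture, which is the kind of misclassification the abstract says the skin peeling filter is meant to resolve.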
