This is an Open Access publication published under CSC-OpenAccess Policy.
Header Based Classification of Journals Using Document Image Segmentation and Extreme Learning Machine
Kalpana S, Vijaya MS
Pages - 245 - 254 | Revised - 10-07-2014 | Published - 10-08-2014
Volume - 8 | Issue - 5 | Publication Date - September / October 2014
Classification, Document Segmentation, Feature Extraction, Extreme Learning Machine.
Document image segmentation plays an important role in the classification of journals, magazines, newspapers, and similar documents. It is the process of splitting a document into distinct regions. Document layout analysis is a key process for identifying and categorizing the regions of interest in the scanned image of a text document. A reading system requires the segmentation of text zones from non-textual ones and their arrangement in the correct reading order. It must also detect and label the text zones that play different logical roles inside the document, such as titles, captions, and footnotes. This research work proposes a new approach to segment the document and classify journals based on the header block. Documents collected from different journals are used as input images. Each image is segmented into blocks such as heading, header, author name, and footer using the Particle Swarm Optimization algorithm, and features are extracted from the header block using the Gray Level Co-occurrence Matrix. An Extreme Learning Machine is used for classification based on the header blocks and obtains 82.3% accuracy.
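The feature-extraction step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which pixel offsets, gray-level quantisation, or texture measures the authors used, so everything below is an assumption for illustration only: the function names `glcm` and `texture_features` are hypothetical, the offset is one pixel to the right, and the three measures (contrast, energy, homogeneity) are common choices from the Haralick family rather than the paper's confirmed feature set.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Return a normalised gray-level co-occurrence matrix.

    The matrix is stored sparsely as {(i, j): probability}, where i and j
    are the gray levels of a pixel and of its neighbour at offset (dx, dy).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Compute three texture measures from a normalised GLCM."""
    contrast = sum(((i - j) ** 2) * v for (i, j), v in p.items())
    energy = sum(v * v for v in p.values())  # angular second moment
    homogeneity = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Toy 4x4 "header block" with four gray levels (assumed data, not from the paper).
block = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(block))
```

In a pipeline like the one the abstract describes, a vector of such values computed per header block would be the input to the classifier. A real header image would normally be quantised to a small number of gray levels first so that the matrix stays compact.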
Miss Kalpana S
Research Scholar, PSGR Krishnammal College for Women, Coimbatore, India
Miss Vijaya MS
Associate Professor, PSGR Krishnammal College for Women, Coimbatore, India