Robust Font Adaptive Word Recognition from Printed Document Images

Please use this identifier to cite or link to this item: http://theses.iitj.ac.in:8080/jspui/handle/123456789/75

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Harit, Gaurav	-
dc.date.accessioned	2016-09-05T13:19:09Z	-
dc.date.available	2016-09-05T13:19:09Z	-
dc.date.issued	2015-05	-
dc.identifier.citation	Rupali. (2015). Robust Font Adaptive Word Recognition from Printed Document Images (Master's thesis). Indian Institute of Technology Jodhpur, Jodhpur.	en_US
dc.identifier.uri	http://theses.iitj.ac.in:8080/jspui/handle/123456789/75	-
dc.description.abstract	A large amount of data present in books and ancient manuscripts is available as scanned images. Content search in machine-readable documents such as .txt and .pdf files is performed by string matching operations, which cannot be applied to document images. The contents of a word image in a particular language can be extracted using Optical Character Recognition for that specific language. This method strictly requires clear document images so that the character segmentation process does not produce ambiguous characters. However, the document images extracted from scanned books and ancient manuscripts are not clear, as most of their characters are merged or broken, although they remain human readable. This limitation of word recognition using Optical Character Recognition gives rise to a new method of word indexing in document images called Keyword Spotting. This method avoids the ambiguous character segmentation and recognition step, as it extracts features from the complete word image and compares images by their corresponding features. The focus of our research is to find robust Keyword Spotting methods that can search for similar words in a data set containing word images in widely varying fonts. The word images, in which words appear in different fonts, are indexed such that the images with content similar to that of the query image are retrieved as top results. We have used two techniques to accomplish this task. The first technique is Self-Organizing Maps, where characters with the same content, irrespective of font, are mapped to neurons that are close to each other in the two-dimensional neuron map. The second method extracts a few interest points from each word image by applying k-means to its ink pixels; these points are represented by Scale Invariant Feature Transform descriptors. The results obtained with these techniques are found comparable to the existing approaches.	en_US
dc.description.statementofresponsibility	by Rupali	en_US
dc.format.extent	xii, 55p.	en_US
dc.language.iso	en	en_US
dc.publisher	Indian Institute of Technology Jodhpur	en_US
dc.rights	IIT Jodhpur	en_US
dc.subject.ddc	Robust Font Adaptive	en_US
dc.title	Robust Font Adaptive Word Recognition from Printed Document Images	en_US
dc.type	Thesis	en_US
dc.creator.researcher	Rupali	-
dc.date.registered	2013	-
dc.date.awarded	2015	-
dc.publisher.place	Jodhpur	en_US
dc.publisher.department	Center for Information Communication and Technology	en_US
dc.type.degree	Master of Technology (M.Tech.)	en_US
dc.format.accompanyingmaterial	CD	en_US
dc.description.note	col. ill.; including bibliography	en_US
dc.identifier.accession	TM00070	-
Appears in Collections:	M. Tech. Theses

Files in This Item:

File	Description	Size	Format
TM00070.pdf		1.62 MB	Adobe PDF

IIT Jodhpur Theses Repository

IIT Jodhpur Theses Repository preserves Ph.D., M.Tech. and M.Sc. theses and provides easy access to them for its community.