Publications

Handwritten document retrieval strategies

Abstract

With the continuous growth of the World Wide Web, there is an urgent need for an efficient information retrieval system which can search and retrieve handwritten documents when presented with user queries. However, unconstrained handwriting recognition remains to be a challenging task with inadequate performance (around 30%, accuracy) thus proving to be a major hurdle in providing a robust search experience in the domain of handwritten documents. In this paper, we describe our recent research with focus on information retrieval from noisy text output by imperfect recognizers applied to handwritten document images. We describe three techniques each exploring a different approach for solving the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and uses the cleaned text for retrieval. The second method uses the uncorrected or raw OCR'ed text but …

Date
July 23, 2009
Authors
Venu Govindaraju, Huaigu Cao, Anurag Bhardwaj
Book
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Pages
3-7