Image registration and text recognition for structured census documents
Abstract
In this paper, we present our work on developing a system for registration and recognition of structured census documents. Information extraction from these documents presents many challenges, including table registration, cell extraction, binarization, and recognition of handwritten text. This paper deals mainly with table registration. It details the approach and algorithms we developed for unsupervised registration of tables given a set of templates. The algorithm can also detect whether a template is present in a page before proceeding to register it. It places no restrictions on the position or size of the table in a page relative to those of the template, and it is robust to skew and minor amounts of non-linear distortion in the scanned page. We then outline our overall system for information extraction from tabular pages using the BBN Byblos Optical Handwriting Recognition (OHR) system. We present preliminary results for table registration using our approach.
- Date
- January 1, 1970
- Authors
- Krishna Subramanian, Huaigu Cao, Xujun Peng, Rohit Prasad, Prem Natarajan
- Journal
- 12th Annual Workshop on Family History Technology