Publications
Confusion network based recurrent neural network language modeling for Chinese OCR error detection
Abstract
This paper presents a new framework for OCR error detection, which uses a conditional random field model to combine rich features from multiple sources, including confusion networks (c-nets), lexical local context and a recurrent neural network language model (RNNLM). We propose a novel, efficient method for computing character-level c-net based RNNLM scores by using dynamic programming and c-net partial unfolding. Our experiments show that our error detection model yields consistent, observable improvements over a strong baseline used in our current OCR demo system, as measured by average precision and detection error trade-off curves on two test sets of Chinese image documents. Both linguistic and recognition features contribute to the high performance, with the former being especially informative. In addition, we show that the new feature we proposed, the c-net RNNLM feature, plays a remarkable …
- Date
- August 24, 2014
- Authors
- Jinying Chen, Yue Wu, Huaigu Cao, Prem Natarajan
- Conference
- 2014 22nd International Conference on Pattern Recognition
- Pages
- 1266-1271
- Publisher
- IEEE