International Business Machines Corporation
Optical character recognition error correction model

Abstract:

Embodiments relate to an intelligent computer platform that creates a document-specific error correction model for amending optical character recognition (OCR) values. An image of a document is received and OCR is applied to the received image. Text is extracted from at least one static content field and compared to stored text of known static content. Responsive to a deviation identified in the comparison, a document-specific error correction model is created and leveraged to correct the OCR output. The model generates one or more variants for the dynamic content field associated with the static content field having the identified deviation. The generated variants are processed, and one of them is selected as the amended document content. A new document version, including the corrected OCR output, is created from the amendment.
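The pipeline in the abstract has four steps: detect a deviation in a static field, build a per-document correction model, generate variants of the paired dynamic field, and select one. The Python sketch below illustrates those steps under assumptions of our own. It is not the patented implementation. It assumes the model is a table of substring confusions learned by aligning the OCR text of a static field with its stored known text using `difflib.SequenceMatcher`. It also assumes that "selection" means taking the variant with the fewest corrections that passes a caller-supplied validator. The names `learn_confusions`, `generate_variants` and `correct_dynamic_field` are hypothetical.

```python
import difflib
import itertools
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical document-specific error correction model (an assumption, not the
# patented design): maps a string the OCR engine produced to the string(s) it
# should have read, learned only from this document's static fields.
ConfusionModel = Dict[str, Set[str]]


def learn_confusions(ocr_static: str, known_static: str) -> ConfusionModel:
    """Align the OCR text of a static field with its stored known text and
    record every substituted span as a candidate OCR confusion."""
    model: ConfusionModel = {}
    matcher = difflib.SequenceMatcher(a=ocr_static, b=known_static, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            model.setdefault(ocr_static[i1:i2], set()).add(known_static[j1:j2])
    return model


def generate_variants(ocr_text: str, model: ConfusionModel, limit: int = 256) -> List[str]:
    """Enumerate variants of a dynamic field by optionally undoing each learned
    confusion wherever it occurs; variants with fewer corrections sort first."""

    def expand(i: int) -> Iterator[Tuple[int, str]]:
        if i == len(ocr_text):
            yield 0, ""
            return
        # Keep the OCR character unchanged.
        for n, rest in expand(i + 1):
            yield n, ocr_text[i] + rest
        # Or replace a learned confusion that starts at position i.
        for wrong, rights in model.items():
            if wrong and ocr_text.startswith(wrong, i):
                for right in sorted(rights):
                    for n, rest in expand(i + len(wrong)):
                        yield n + 1, right + rest

    # The cap bounds the exponential enumeration; the first candidate generated
    # is always the unmodified OCR text, so it is never cut off.
    candidates = sorted(itertools.islice(expand(0), limit))
    return list(dict.fromkeys(text for _, text in candidates))


def correct_dynamic_field(
    ocr_static: str,
    known_static: str,
    ocr_dynamic: str,
    is_valid: Callable[[str], bool],
) -> str:
    """Return amended text for the dynamic field paired with a static field."""
    if ocr_static == known_static:
        return ocr_dynamic  # no deviation: keep the OCR output as-is
    model = learn_confusions(ocr_static, known_static)
    for variant in generate_variants(ocr_dynamic, model):
        if is_valid(variant):
            return variant
    return ocr_dynamic  # no variant passed validation


if __name__ == "__main__":
    known_clients = {"Simmons", "Harmon"}  # hypothetical validator data
    print(correct_dynamic_field("Client Narne", "Client Name", "Sirnrnons",
                                known_clients.__contains__))  # prints Simmons
```

In this example, the static label "Client Name" is read as "Client Narne", so the model learns that "rn" may stand for "m". That single confusion yields four variants of "Sirnrnons", and the validator accepts "Simmons". Because the model is rebuilt from each document's own static fields, it captures that scan's particular font and noise rather than global OCR statistics. That is one plausible reading of "document specific".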

Status:

Grant

Type:

Utility

Filing date:

4 Dec 2019

Issue date:

17 Aug 2021