SAP SE
Font family and size aware character segmentation

Last updated:

Abstract:

A method clusters each character on a document into one of a plurality of clusters based on widths of at least a portion of the characters on the document and measures distances between characters on the document. A threshold for each of the plurality of clusters is calculated based on at least a portion of the distances between characters in each cluster. The method then segments characters into units using the thresholds for the plurality of clusters. A distance between two characters in the document is compared to a threshold for a cluster to classify the two characters as being part of a unit when the distance is less than the threshold and not being part of the unit when the distance is greater than the threshold. Then, the method performs a recognition process on the document using the units.
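The steps in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The abstract does not specify the clustering algorithm, the threshold formula, or the data layout, so everything below is an assumption chosen for illustration: the names `cluster_by_width` and `segment`, the `(glyph, x_left, width)` tuples, the greedy width clustering with a `tolerance`, the threshold of `factor` times the median gap, and the choice to assign each gap to the cluster of its left-hand character.

```python
from collections import defaultdict
from statistics import median


def cluster_by_width(widths, tolerance=2.0):
    """Greedy 1-D clustering (an assumption, not the patent's method).

    Each width joins the first cluster whose reference width is within
    `tolerance`; otherwise it starts a new cluster.
    """
    centers, labels = [], []
    for w in widths:
        for i, c in enumerate(centers):
            if abs(w - c) <= tolerance:
                labels.append(i)
                break
        else:
            centers.append(w)
            labels.append(len(centers) - 1)
    return labels


def segment(chars, factor=1.5, tolerance=2.0):
    """Group characters on one text line into units.

    chars: list of (glyph, x_left, width), sorted left to right.
    """
    if not chars:
        return []
    labels = cluster_by_width([w for _, _, w in chars], tolerance)

    # Distance between the right edge of a character and the left edge of the next.
    gaps = [chars[i + 1][1] - (chars[i][1] + chars[i][2])
            for i in range(len(chars) - 1)]

    # Assumption: a gap belongs to the cluster of its left-hand character.
    per_cluster = defaultdict(list)
    for label, gap in zip(labels, gaps):
        per_cluster[label].append(gap)

    # Assumption: threshold = factor * median gap within the cluster.
    thresholds = {label: factor * median(g) for label, g in per_cluster.items()}

    units = [chars[0][0]]
    for i, gap in enumerate(gaps):
        if gap < thresholds[labels[i]]:
            units[-1] += chars[i + 1][0]   # same unit
        else:
            units.append(chars[i + 1][0])  # gap at or above threshold: new unit
    return units


# Hypothetical line: a small font (width 5) followed by a large font (width 10).
line = [("a", 0, 5), ("b", 6, 5), ("c", 12, 5), ("d", 21, 5), ("e", 27, 5),
        ("X", 40, 10), ("Y", 52, 10), ("Z", 64, 10), ("W", 82, 10)]
print(segment(line))  # ['abc', 'de', 'XYZ', 'W']
```

In this toy input the large font has a gap of 2 between letters of the same word, which a single global threshold tuned to the small font (1.5 here) would wrongly split. Computing one threshold per width cluster keeps `XYZ` together.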

Status:

Grant

Type:

Utility

Filing date:

29 Nov 2018

Issue date:

6 Apr 2021