Open Text Corporation
MACHINE LEARNING SYSTEMS AND METHODS FOR AUTOMATICALLY TAGGING DOCUMENTS TO ENABLE ACCESSIBILITY TO IMPAIRED INDIVIDUALS

Last updated:

Abstract:

Systems, methods, and products for auto-tagging structured PDF documents that do not have accessibility tags. In one embodiment, structured PDF documents having accessibility tags are first parsed and analyzed to organize the visual components of the documents. The relationships of the identified objects to DOM elements (e.g., tags) are determined, and the objects and related DOM elements are stored in training files. The training files are used to train various classifiers. Untagged PDF documents are then parsed to identify included visual objects, and the classifiers are used to determine DOM elements that should be associated with visual objects identified in the untagged PDF documents. This information is used to construct a DOM structure corresponding to each untagged document. A new PDF is then generated corresponding to each untagged document using the generated DOM structure and visual object information.
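
The flow described in the abstract (learn from tagged documents, classify visual objects from untagged ones, then assemble a DOM structure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the application: the `VisualObject` record, its three features, the nearest-centroid classifier standing in for the "various classifiers", the in-memory list standing in for the training files, and the flat dictionary used as a DOM. PDF parsing and generation of the new PDF are omitted entirely.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass
class VisualObject:
    """Hypothetical record for one visual object parsed from a PDF page."""
    text: str
    font_size: float
    is_bold: bool
    indent: float  # left offset in points

    def features(self):
        # Illustrative feature vector; the bold flag is scaled so it carries weight.
        return (self.font_size, 10.0 * self.is_bold, self.indent)


def train(examples):
    """Compute one mean feature vector (centroid) per tag from (object, tag) pairs."""
    grouped = defaultdict(list)
    for obj, tag in examples:
        grouped[tag].append(obj.features())
    return {
        tag: tuple(sum(col) / len(col) for col in zip(*rows))
        for tag, rows in grouped.items()
    }


def predict(centroids, obj):
    """Return the tag whose centroid is closest to the object's features."""
    return min(centroids, key=lambda tag: dist(centroids[tag], obj.features()))


def build_dom(centroids, objects):
    """Assemble a flat DOM-like structure: a Document root with one child per object."""
    return {
        "tag": "Document",
        "children": [{"tag": predict(centroids, o), "text": o.text} for o in objects],
    }


# Stand-in for training files derived from already tagged PDFs (assumed data).
training = [
    (VisualObject("Annual Report", 24.0, True, 72.0), "H1"),
    (VisualObject("Overview", 22.0, True, 72.0), "H1"),
    (VisualObject("Revenue grew this year.", 11.0, False, 72.0), "P"),
    (VisualObject("Costs were flat.", 10.0, False, 72.0), "P"),
    (VisualObject("First item", 11.0, False, 108.0), "LI"),
    (VisualObject("Second item", 11.0, False, 108.0), "LI"),
]
centroids = train(training)

# Visual objects from an untagged document (assumed data).
untagged = [
    VisualObject("Quarterly Summary", 23.0, True, 72.0),
    VisualObject("Sales rose in every region.", 11.0, False, 72.0),
    VisualObject("North America", 11.0, False, 108.0),
]
dom = build_dom(centroids, untagged)
```

A nearest-centroid rule was chosen only because it fits in a few lines with no dependencies; a real system would likely use richer features and stronger models, and would nest elements (for example, list items under a list) rather than keep them flat.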

Status:

Application

Type:

Utility

Filing date:

12 Feb 2021

Issue date:

2 Sep 2021