UiPath Inc.
System and method for data augmentation for document understanding
Abstract:
A system, a method, and a computing device for data augmentation in support of document classification of a plurality of documents are disclosed. A processor is configured to convert the plurality of documents into images, and a memory is configured to store the images. The processor is further configured to obtain a vector representation for each page included in the plurality of documents; to create a plurality of clusters from the images based on similarity, where each cluster represents a distinct page format; to select one image from each cluster; and to compile the selected images to create a logically complete document. The memory is configured to store the logically complete document, and the processor is configured to train the classification based on the logically complete document.
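The cluster, select, and compile steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how page vectors are obtained or which clustering algorithm is used, so everything here is an assumption made for illustration: the page vectors are taken as given, similarity is cosine similarity, clustering is a simple greedy threshold scheme, and the names `cluster_pages`, `compile_document`, and `augment` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_pages(vectors: List[Vector], threshold: float = 0.9) -> List[List[int]]:
    """Group page indices by similarity (assumed greedy scheme, not the patent's).

    A page joins the first cluster whose first member is at least `threshold`
    similar to it; otherwise it starts a new cluster. Each cluster stands in
    for one distinct page format.
    """
    clusters: List[List[int]] = []
    for idx, vec in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def compile_document(clusters: List[List[int]], rng: random.Random) -> List[int]:
    """Select one page index from each cluster to form one synthetic document."""
    return [rng.choice(members) for members in clusters]


def augment(vectors: List[Vector], n_documents: int,
            threshold: float = 0.9, seed: int = 0) -> List[List[int]]:
    """Produce `n_documents` synthetic documents, each a list of page indices."""
    rng = random.Random(seed)
    clusters = cluster_pages(vectors, threshold)
    return [compile_document(clusters, rng) for _ in range(n_documents)]


if __name__ == "__main__":
    # Toy page vectors (made up): two pairs of near-duplicate formats plus one singleton.
    page_vectors = [
        [1.0, 0.0, 0.1],
        [0.9, 0.1, 0.1],
        [0.0, 1.0, 0.0],
        [0.1, 0.9, 0.0],
        [0.0, 0.1, 1.0],
    ]
    for doc in augment(page_vectors, n_documents=3):
        print(doc)
```

Each synthetic document is a list of page indices, one per cluster, which could then be mapped back to the stored page images and used as training input. In practice the page order within a compiled document and the choice of clustering method would depend on details the abstract does not give.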
Utility
23 Mar 2020
23 Sep 2021