Adobe Inc.
Method to identify and extract fragments among large collections of digital documents using repeatability and semantic information

Last updated:

Abstract:

A corpus of documents is processed using, for example, algorithms including deep learning and deep neural networks ("DNN"), to extract fragments across the corpus of documents. The extracted fragments can then be edited individually and referenced by a plurality of documents so that changes to the fragments are reflected universally across a corpus of documents efficiently. In one example case, a computer-implemented method is provided for extracting fragments in a digital document. The method includes indexing said document to generate a document element ID sequence; processing said document element ID sequence to generate at least one fragment candidate; processing said at least one fragment candidate to generate at least one respective fragment; and utilizing said at least one fragment to perform a reconstruction of said document.
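
The four steps named in the abstract (indexing, candidate generation, fragment selection, reconstruction) can be illustrated with a small sketch. This is not the patented method: it leaves out the deep-learning and semantic components entirely and relies on exact repetition only. It assumes each document is already split into a list of element strings, that candidates are fixed-length runs of element IDs, and that a candidate becomes a fragment when it occurs in at least `min_docs` documents. All function names and parameters here are hypothetical.

```python
from collections import defaultdict


def index_documents(docs):
    """Assign an integer ID to each distinct element; return ID sequences and the ID -> element table."""
    ids, table, seqs = {}, [], []
    for doc in docs:
        seq = []
        for element in doc:
            if element not in ids:
                ids[element] = len(table)
                table.append(element)
            seq.append(ids[element])
        seqs.append(seq)
    return seqs, table


def fragment_candidates(seqs, n):
    """Map every length-n run of IDs to the set of documents it occurs in."""
    seen = defaultdict(set)
    for doc_no, seq in enumerate(seqs):
        for i in range(len(seq) - n + 1):
            seen[tuple(seq[i:i + n])].add(doc_no)
    return seen


def select_fragments(candidates, min_docs):
    """Keep candidates that repeat across at least min_docs documents (assumed criterion)."""
    return sorted(run for run, found_in in candidates.items() if len(found_in) >= min_docs)


def encode(seq, fragments, n):
    """Greedy left-to-right pass: replace a matching run with a fragment reference."""
    index = {run: k for k, run in enumerate(fragments)}
    out, i = [], 0
    while i < len(seq):
        run = tuple(seq[i:i + n])
        if run in index:
            out.append(("frag", index[run]))
            i += n
        else:
            out.append(("el", seq[i]))
            i += 1
    return out


def reconstruct(encoded, fragments, table):
    """Expand fragment references and element IDs back into element strings."""
    doc = []
    for kind, value in encoded:
        element_ids = fragments[value] if kind == "frag" else (value,)
        doc.extend(table[j] for j in element_ids)
    return doc


# Toy corpus: two documents sharing a two-element footer.
docs = [
    ["Title A", "Body A", "Copyright notice", "Contact us"],
    ["Title B", "Copyright notice", "Contact us", "Body B"],
]
seqs, table = index_documents(docs)
fragments = select_fragments(fragment_candidates(seqs, n=2), min_docs=2)
encoded = [encode(seq, fragments, n=2) for seq in seqs]
rebuilt = [reconstruct(e, fragments, table) for e in encoded]
```

In this toy corpus the shared footer is the only run that appears in both documents, so it becomes the single fragment and each encoded document holds a reference to it instead of a copy. Because both documents resolve that reference through the same table, changing the footer text in one place changes it in every reconstructed document, which is the kind of universal update the abstract describes.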

Status:

Grant

Type:

Utility

Filing date:

11 Oct 2017

Issue date:

22 Dec 2020