Adobe Inc.
TRAINING DIGITAL CONTENT CLASSIFICATION MODELS UTILIZING BATCHWISE WEIGHTED LOSS FUNCTIONS AND SCALED PADDING BASED ON SOURCE DENSITY

Abstract:

Methods, systems, and non-transitory computer-readable storage media are disclosed for training a machine-learning model utilizing batchwise weighted loss functions and scaled padding based on source density. For example, the disclosed systems can determine the density, within digital content from a digital content source, of words or phrases that indicate an affinity towards one or more content classes. In some embodiments, the disclosed systems can use the determined source density to split digital content from the source into segments and pad the segments with padding characters based on the source density. The disclosed systems can also generate document embeddings using the padded segments and then train the machine-learning model using the document embeddings. Furthermore, the disclosed systems can use a batchwise weighted cross-entropy loss to apply different class weightings on a per-batch basis during training of the machine-learning model.

Status:

Application

Type:

Utility

Filing date:

1 Jul 2019

Issue date:

7 Jan 2021