Oracle Corporation
GLOBAL, MODEL-AGNOSTIC MACHINE LEARNING EXPLANATION TECHNIQUE FOR TEXTUAL DATA
Abstract:
A model-agnostic global explainer for natural language processing (NLP) machine learning (ML) models, "NLP-MLX", is described herein. NLP-MLX explains the global behavior of arbitrary NLP ML models by identifying globally important tokens within a textual dataset. NLP-MLX accommodates any combination of training dataset pre-processing operations used by the NLP ML model. NLP-MLX includes four main stages. A Text Analysis stage converts the text in documents of a target dataset into tokens. A Token Extraction stage uses pre-processing techniques to efficiently pre-filter the complete list of tokens into a smaller set of candidate important tokens. A Perturbation Generation stage perturbs tokens within documents of the dataset to help evaluate the effect of different tokens, and combinations of tokens, on the model's predictions. Finally, a Token Evaluation stage uses the ML model and the perturbed documents to evaluate the impact of each candidate token relative to the model's predictions for the original documents.
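The four stages can be illustrated with a minimal Python sketch. It is not the implementation described in the patent. All names (`tokenize`, `candidate_tokens`, `perturb`, `token_importance`, `toy_model`) are hypothetical, and the specific choices are assumptions made for illustration: document frequency as the pre-filter, token removal as the perturbation, and mean absolute prediction change as the impact measure. Only single tokens are perturbed here, not combinations of tokens.

```python
import re
from collections import Counter


def tokenize(doc):
    """Text Analysis: split a document into lowercase word tokens."""
    return re.findall(r"\w+", doc.lower())


def candidate_tokens(docs, top_k):
    """Token Extraction: pre-filter the vocabulary to top_k candidates.

    Assumption: document frequency is used as a cheap pre-filter.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))
    return ranked[:top_k]


def perturb(doc, token):
    """Perturbation Generation: delete every occurrence of the token.

    Assumption: removal is the only perturbation applied.
    """
    pattern = rf"\b{re.escape(token)}\b"
    return re.sub(pattern, "", doc, flags=re.IGNORECASE)


def token_importance(model, docs, top_k=10):
    """Token Evaluation: score candidates by mean change in prediction.

    `model` is any callable mapping a raw document to a numeric score,
    so the explainer never inspects the model's internals.
    """
    baseline = [model(doc) for doc in docs]
    scores = {}
    for token in candidate_tokens(docs, top_k):
        changes = [
            abs(model(perturb(doc, token)) - base)
            for doc, base in zip(docs, baseline)
        ]
        scores[token] = sum(changes) / len(docs)
    return sorted(scores.items(), key=lambda item: -item[1])


def toy_model(doc):
    """Stand-in black-box classifier: positive if 'great' appears."""
    return 1.0 if "great" in tokenize(doc) else 0.0


docs = [
    "Great movie, great cast",
    "The movie was dull",
    "A great soundtrack",
    "The plot was dull",
]
ranking = token_importance(toy_model, docs)
print(ranking[0])  # ('great', 0.5)
```

Because `model` is treated as an opaque callable on raw text, any pre-processing the model applies internally stays hidden from the explainer, which is the sense in which the sketch is model-agnostic.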
Utility
11 Jan 2021
21 Jul 2022