Microsoft Corporation
Distant supervision for entity linking with filtering of noise

Abstract:

A technique is described herein for processing documents in a time-efficient and accurate manner. In a training phase, the technique generates a set of initial training examples by associating entity mentions in a text corpus with corresponding entity identifiers. Each entity identifier uniquely identifies an entity in a particular ontology. The technique then removes noisy training examples from the set of initial training examples, to provide a set of filtered training examples. The technique then applies a machine-learning process to generate a linking component based, in part, on the set of filtered training examples. In an application phase, the technique uses the linking component to link input entity mentions with corresponding entity identifiers. Various application systems can leverage the capabilities of the linking component, including a search system, a document-creation system, etc.
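The abstract names three training-phase steps (distant labeling, noise removal, learning) and one application-phase step (linking), but it does not say how any of them is carried out. The Python sketch below is a toy illustration of that pipeline under loudly stated assumptions, not the patented method. The alias table of link counts, the prior-probability threshold used as the noise filter (`min_prior`), the bag-of-words "profile" that stands in for the machine-learning process, and every function and entity name (`generate_initial_examples`, `filter_noisy`, `train_linker`, `link`, `Jaguar_Cars`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical alias table: surface form -> {entity_id: link count}.
AliasTable = Dict[str, Dict[str, int]]
# (context sentence, mention, entity_id)
Example = Tuple[str, str, str]


def generate_initial_examples(corpus: List[str], aliases: AliasTable) -> List[Example]:
    """Distant supervision: label each alias occurrence with its most-linked entity."""
    examples: List[Example] = []
    for sentence in corpus:
        lowered = sentence.lower()
        for mention, counts in aliases.items():
            if mention in lowered:  # naive substring match, for illustration only
                entity_id = max(counts, key=counts.get)  # ties go to the first key
                examples.append((sentence, mention, entity_id))
    return examples


def filter_noisy(examples: List[Example], aliases: AliasTable,
                 min_prior: float = 0.9) -> List[Example]:
    """Assumed noise filter: drop labels whose prior P(entity | mention) is low."""
    kept: List[Example] = []
    for sentence, mention, entity_id in examples:
        counts = aliases[mention]
        if counts[entity_id] / sum(counts.values()) >= min_prior:
            kept.append((sentence, mention, entity_id))
    return kept


def train_linker(examples: List[Example]) -> Dict[str, Counter]:
    """Stand-in learner: a bag-of-words context profile per entity."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for sentence, mention, entity_id in examples:
        mention_tokens = set(mention.split())
        profiles[entity_id].update(
            w for w in sentence.lower().split() if w not in mention_tokens)
    return profiles


def link(mention: str, context: str, aliases: AliasTable,
         profiles: Dict[str, Counter]) -> str:
    """Application phase: pick the candidate whose profile best matches the context."""
    words = context.lower().split()
    candidates = aliases[mention.lower()]
    return max(candidates,
               key=lambda e: sum(profiles.get(e, Counter())[w] for w in words))


ALIASES: AliasTable = {
    "jaguar": {"Jaguar_(animal)": 5, "Jaguar_Cars": 5},  # ambiguous alias
    "jaguar cars": {"Jaguar_Cars": 10},
    "panthera onca": {"Jaguar_(animal)": 10},
}
CORPUS = [
    "jaguar cars builds luxury sedans",
    "the panthera onca hunts in the rainforest",
]

initial = generate_initial_examples(CORPUS, ALIASES)   # 3 examples
filtered = filter_noisy(initial, ALIASES)              # 2 examples
model = train_linker(filtered)
print(link("jaguar", "the jaguar hunts deer in the rainforest", ALIASES, model))  # Jaguar_(animal)
print(link("jaguar", "luxury sedans from jaguar", ALIASES, model))                # Jaguar_Cars
```

In this toy run, distant supervision labels the bare "jaguar" in the car sentence as the animal (a tie broken arbitrarily), which is exactly the kind of noisy example the filter removes. The linker is then trained only on the two unambiguous examples, yet it can still resolve the ambiguous mention "jaguar" from context at application time.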

Status:

Grant

Type:

Utility

Filing date:

31 Oct 2017

Issue date:

15 Feb 2022