International Business Machines Corporation
AUTO-GENERATING GROUND TRUTH ON CLINICAL TEXT BY LEVERAGING STRUCTURED ELECTRONIC HEALTH RECORD DATA

Abstract:

A method improves performance of natural language processing by automatically generating ground truth from electronic health records comprising unstructured clinical notes and structured data comprising entries each having respective values for fields. The method includes: linking a given one of the notes to a given one of the entries responsive to determining that a specified field within the given entry matches an item of metadata for the given note; determining an initial set of the notes which satisfy criteria selected such that the criteria are a proxy for the ground truth, wherein the given note is determined to satisfy the criteria based at least in part on the given entry linked thereto; and designating at least a portion of the initial set of notes which satisfy the criteria, and the entries linked to the portion of the initial set of notes which satisfy the criteria, as the ground truth.
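The three steps in the abstract (linking, proxy-criteria filtering, and designation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Note` and `Entry` classes, the `encounter_id` field used for matching, the `diagnosis_code` field, and the sample code values are all hypothetical names chosen for the example and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class Note:
    note_id: str
    encounter_id: str  # hypothetical item of note metadata used for linking
    text: str


@dataclass
class Entry:
    encounter_id: str    # hypothetical "specified field" matched against note metadata
    diagnosis_code: str  # hypothetical structured value used by the proxy criteria


def link_notes(notes, entries):
    """Link each note to the structured entries whose field matches its metadata."""
    by_encounter = {}
    for entry in entries:
        by_encounter.setdefault(entry.encounter_id, []).append(entry)
    return {n.note_id: by_encounter.get(n.encounter_id, []) for n in notes}


def satisfies_criteria(linked_entries, target_code):
    """Assumed proxy criterion: some linked entry carries the target code."""
    return any(e.diagnosis_code == target_code for e in linked_entries)


def generate_ground_truth(notes, entries, target_code):
    """Return (note, linked entries) pairs designated as ground truth."""
    links = link_notes(notes, entries)
    # Initial set: notes whose linked entries satisfy the proxy criterion.
    initial = [n for n in notes if satisfies_criteria(links[n.note_id], target_code)]
    # Here the whole initial set is designated; a real pipeline might keep only a portion.
    return [(n, links[n.note_id]) for n in initial]


# Illustrative data only; the identifiers and code values are made up for this sketch.
notes = [
    Note("n1", "E1", "Patient reports shortness of breath."),
    Note("n2", "E2", "Follow-up visit for wheezing."),
    Note("n3", "E3", "Routine check, no complaints."),
]
entries = [
    Entry("E1", "I50.9"),
    Entry("E2", "J45.909"),
    Entry("E1", "E11.9"),
]
ground_truth = generate_ground_truth(notes, entries, target_code="I50.9")
```

In this sketch the note `n3` has no linked entry and so can never satisfy the criterion, which reflects the abstract's requirement that a note qualifies based at least in part on an entry linked to it.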

Status:

Application

Type:

Utility

Filing date:

10 Mar 2020

Issue date:

16 Sep 2021