Rapid7, Inc.
Detection of outliers in text records

Last updated:

Abstract:

Systems and methods are disclosed to implement an outlier detection system for text records. In embodiments, the detection system generates a fingerprint for each incoming record so that similar records map to similar fingerprints. Each record is assigned to the closest cluster in a set of clusters, based on computed distances between the record's fingerprint and the respective cluster fingerprints of the clusters. Each cluster fingerprint is dynamically updated so that it remains a representative fingerprint of the cluster's member records. When a new record is received that is not sufficiently close to any cluster, a new cluster is added to the set for the new record. In embodiments, the creation of the new cluster triggers an alert that the new record is a potential outlier. Advantageously, the disclosed detection system can be used to detect outliers in records in near real time, without the need to pre-specify outlier characteristics.
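
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how fingerprints or distances are computed, so the hashed character-trigram fingerprint, the cosine distance, the running-mean cluster fingerprint, and the names `fingerprint`, `cosine_distance`, `Cluster`, `OutlierDetector`, `DIM`, and `THRESHOLD` are all assumptions made for illustration.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 64          # assumed fingerprint length (not from the source)
THRESHOLD = 0.5   # assumed maximum distance for joining a cluster


def fingerprint(text, n=3, dim=DIM):
    """Hash character n-grams into a fixed-size count vector (assumed scheme)."""
    vec = [0.0] * dim
    for i in range(max(len(text) - n + 1, 1)):
        gram = text[i:i + n]
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class Cluster:
    centroid: list   # cluster fingerprint: running mean of member fingerprints
    size: int = 1


class OutlierDetector:
    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.clusters = []

    def observe(self, text):
        """Assign a record to a cluster; return (cluster_index, is_potential_outlier)."""
        fp = fingerprint(text)
        best, best_dist = None, float("inf")
        for idx, cluster in enumerate(self.clusters):
            dist = cosine_distance(fp, cluster.centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is None or best_dist > self.threshold:
            # No cluster is close enough: open a new one and flag the record.
            self.clusters.append(Cluster(centroid=fp))
            return len(self.clusters) - 1, True
        # Close enough: join the cluster and update its running-mean fingerprint.
        cluster = self.clusters[best]
        cluster.size += 1
        cluster.centroid = [m + (x - m) / cluster.size
                            for m, x in zip(cluster.centroid, fp)]
        return best, False
```

In this sketch, a caller would feed each incoming record to `OutlierDetector.observe` and raise an alert whenever the second element of the returned tuple is true. The threshold controls how readily new clusters form, and a real deployment would need to tune it, along with the fingerprint scheme, against its own data.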

Status:
Grant
Type:

Utility

Filing date:

31 Dec 2019

Issue date:

3 Aug 2021