Wipro Limited
Apparatus and method for detecting and removing outliers using sensitivity score
Abstract:
A method for detecting outliers is provided, the method comprising: receiving a digitized text corpus comprising a plurality of data points; identifying k clusters of the plurality of data points; sampling a data point among the plurality of data points as a first cluster center of the k clusters; determining a sampling probability of each of the remaining data points of the plurality of data points; sampling the next cluster center based on the sampling probability, and iterating the determining of the sampling probability and the sampling of the next cluster center until k cluster centers are sampled; generating a weightage for each of the k cluster centers; determining sensitivity scores of the data points belonging to each of the k cluster centers; and labeling a data point having a sensitivity score greater than a threshold value as an outlier and removing the outlier from the digitized text corpus.
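The steps of the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything specific here is an assumption rather than the claimed method: data points are taken to be numeric vectors already derived from the text, the sampling probability is assumed proportional to the squared distance to the nearest sampled center (as in k-means++ seeding), the weightage of a center is assumed to be the number of points assigned to it, and the sensitivity score is assumed to be a point's share of the total clustering cost plus the inverse of its cluster's weightage. The function names, `threshold`, and `seed` are hypothetical.

```python
import numpy as np

def seed_centers(X, k, rng):
    """Sample k center indices; the first uniformly, the rest by an assumed D^2 rule."""
    n = X.shape[0]
    centers = [int(rng.integers(n))]
    # Squared distance of every point to its nearest sampled center so far.
    d2 = np.sum((X - X[centers[0]]) ** 2, axis=1)
    while len(centers) < k:
        total = d2.sum()
        # Fall back to uniform sampling if all points coincide with a center.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        nxt = int(rng.choice(n, p=probs))
        centers.append(nxt)
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))
    return np.array(centers)

def sensitivity_scores(X, center_idx):
    """Return (scores, weights) under the assumed definitions in the lead-in."""
    C = X[center_idx]
    dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    assign = dists.argmin(axis=1)
    d2 = dists[np.arange(len(X)), assign]
    # Assumed weightage: number of points assigned to each center.
    weights = np.bincount(assign, minlength=len(center_idx))
    total = d2.sum()
    cost_share = d2 / total if total > 0 else np.zeros(len(X))
    # Assumed score: share of total cost plus inverse cluster weightage.
    return cost_share + 1.0 / weights[assign], weights

def detect_outliers(X, k, threshold, seed=0):
    """Label points whose score exceeds the threshold and return the cleaned data."""
    rng = np.random.default_rng(seed)
    centers = seed_centers(X, k, rng)
    scores, _ = sensitivity_scores(X, centers)
    is_outlier = scores > threshold
    return is_outlier, X[~is_outlier]
```

Under these assumptions, a point far from every center takes a large share of the clustering cost, and a point sampled as the center of a near-empty cluster receives a large inverse-weightage term, so either way an isolated point tends to score high. The threshold would need tuning for real data.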
Utility
20 Nov 2018
12 Apr 2022