Royal Bank of Canada
SYSTEM AND METHOD FOR DETECTING DATA DRIFT
Abstract:
Data drift, or dataset shift, between a training dataset and a test dataset is detected by: training a scoring function using a pooled dataset, the pooled dataset including a union of the training dataset and the test dataset; obtaining an outlier score for each instance in the training dataset and the test dataset based at least in part on the scoring function; assigning a weight to each outlier score based at least in part on training contamination rates; determining a test statistic based at least in part on the outlier scores and the weights; determining a null distribution of no dataset shift for the test statistic; determining a threshold in the null distribution; and, when the test statistic is greater than or equal to the threshold, identifying dataset shift between the training dataset and the test dataset.
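The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the scoring function, the weighting formula, the test statistic, or how the null distribution is obtained, so every concrete choice below is an assumption made for illustration. Specifically, the scoring function is taken to be the distance from the pooled centroid, the training contamination rate at a score s is taken to be the share of training scores greater than or equal to s, the weight is (1 - contamination rate) squared, the test statistic is a difference of weighted mean scores, and the null distribution is approximated by permuting the training/test labels. The names `fit_scorer`, `weighted_statistic`, `detect_shift`, `alpha`, and `n_permutations` are hypothetical.

```python
import math
import random
from bisect import bisect_left


def fit_scorer(pooled):
    """Fit a stand-in scoring function on the pooled data (assumption:
    outlier score = Euclidean distance from the pooled centroid)."""
    dim = len(pooled[0])
    centroid = [sum(row[j] for row in pooled) / len(pooled) for j in range(dim)]
    return lambda row: math.sqrt(sum((x - c) ** 2 for x, c in zip(row, centroid)))


def weighted_statistic(train_scores, test_scores):
    """Difference of weighted mean outlier scores, test minus training.
    The weighting formula is an illustrative assumption."""
    sorted_train = sorted(train_scores)
    n = len(sorted_train)

    def weight(s):
        # Assumed training contamination rate: share of training scores >= s.
        contamination = (n - bisect_left(sorted_train, s)) / n
        return (1.0 - contamination) ** 2

    def weighted_mean(scores):
        return sum(weight(s) * s for s in scores) / len(scores)

    return weighted_mean(test_scores) - weighted_mean(train_scores)


def detect_shift(train, test, alpha=0.05, n_permutations=500, seed=0):
    """Return (statistic, threshold, shift_detected) for two lists of feature tuples."""
    pooled = list(train) + list(test)
    score = fit_scorer(pooled)                 # trained on the union of both datasets
    scores = [score(row) for row in pooled]    # one outlier score per instance
    n_train = len(train)
    observed = weighted_statistic(scores[:n_train], scores[n_train:])

    # Approximate the null distribution of no shift by shuffling which
    # scores are labeled "training" and which are labeled "test".
    rng = random.Random(seed)
    null = []
    for _ in range(n_permutations):
        shuffled = scores[:]
        rng.shuffle(shuffled)
        null.append(weighted_statistic(shuffled[:n_train], shuffled[n_train:]))

    # Threshold: empirical (1 - alpha) quantile of the null distribution.
    null.sort()
    index = min(len(null) - 1, max(0, math.ceil((1 - alpha) * len(null)) - 1))
    threshold = null[index]
    return observed, threshold, observed >= threshold
```

In this sketch the scoring function depends only on the pooled dataset, which is unchanged by relabeling, so the outlier scores are computed once and only the weights and the statistic are recomputed for each permutation. A call such as `detect_shift(train_rows, test_rows)` returns the statistic, the threshold, and a boolean that is true when the statistic is greater than or equal to the threshold.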
Utility
26 Jun 2020
31 Dec 2020