International Business Machines Corporation
Dataset origin anonymization and filtration
Abstract:
Embodiments include a method for filtering and securing the content of datasets in computer-readable form that are designated for release, in order to reduce the inferences discernible from them. The method includes receiving a first dataset having first records associated with a quasi-identifier, the first records having respective first data values associated with the quasi-identifier. The method includes receiving a second dataset having second records associated with the same quasi-identifier, the second records having respective second data values associated with the quasi-identifier. The method includes defining a first cluster having a first boundary based on a combination of the first dataset and the second dataset. The method includes replacing a first one of the first data values with the first boundary and a second one of the second data values with the first boundary.
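The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the quasi-identifier is numeric (for example, age), that a "boundary" is a (low, high) range, and that clusters are formed by greedily grouping the pooled, sorted values into runs of at least k. The abstract specifies none of these details, and the names define_clusters, generalize, and k are hypothetical.

```python
from bisect import bisect_left


def define_clusters(first_values, second_values, k=2):
    """Return (low, high) boundaries of clusters built from BOTH datasets.

    Assumption: a greedy grouping over the sorted, pooled values stands in
    for whatever clustering the patent actually uses.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pooled = sorted([*first_values, *second_values])
    if len(pooled) < k:
        raise ValueError("need at least k pooled values")

    clusters = []
    current = []
    for value in pooled:
        # Close a cluster once it holds k values, but never split equal
        # values across two clusters, so boundaries do not overlap.
        if len(current) >= k and value != current[-1]:
            clusters.append(current)
            current = []
        current.append(value)

    # Fold a short trailing group into the previous cluster.
    if len(current) < k and clusters:
        clusters[-1].extend(current)
    else:
        clusters.append(current)

    return [(cluster[0], cluster[-1]) for cluster in clusters]


def generalize(values, boundaries):
    """Replace each value with the boundary of the cluster containing it.

    Assumes every value was part of the pooled data used to build
    the boundaries.
    """
    highs = [high for _, high in boundaries]
    return [boundaries[bisect_left(highs, value)] for value in values]


# Hypothetical quasi-identifier values (ages) from two datasets.
first_ages = [23, 27, 35, 41]
second_ages = [25, 36, 44, 52]

boundaries = define_clusters(first_ages, second_ages, k=3)
print(boundaries)                           # [(23, 27), (35, 52)]
print(generalize(first_ages, boundaries))   # [(23, 27), (23, 27), (35, 52), (35, 52)]
print(generalize(second_ages, boundaries))  # [(23, 27), (35, 52), (35, 52), (35, 52)]
```

Because the clusters are computed over the pooled values, records from both datasets that fall in the same cluster receive an identical boundary. In this sketch, that means a released value no longer indicates which dataset it came from.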
Utility
11 Mar 2020
4 Jan 2022