Palantir Technologies Inc.
OUTPUT VALIDATION OF DATA PROCESSING SYSTEMS
Abstract:
A method is provided for output validation of data processing systems, performed by one or more processors. The method comprises aggregating at least a portion of a first data table, which is an output of a data pipeline of a first data processing system, into a first aggregated data table; aggregating at least a portion of a second data table, which is an output of a data pipeline of a second data processing system, into a second aggregated data table, the second data processing system being designed to perform essentially the same functionality as the first data processing system; performing a data comparison between the first aggregated data table and the second aggregated data table to obtain a data differentiating table; performing a schema comparison between the first aggregated data table and the second aggregated data table to obtain a schema differentiating table; generating a summary from the data differentiating table and the schema differentiating table; and deriving from the summary a value that indicates a similarity between the output of the data pipeline of the first data processing system and the output of the data pipeline of the second data processing system.
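The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the aggregation, the layout of the differentiating tables, or how the similarity value is computed. The sketch assumes that tables are lists of dictionaries, that aggregation is a per-key sum, and that similarity is the fraction of compared keys and columns that match. All function names (aggregate, data_diff, schema_of, schema_diff, validate) are hypothetical.

```python
def aggregate(rows, key, value):
    """Aggregate a table by summing `value` per `key` (assumed aggregation)."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return [{key: k, value: v} for k, v in totals.items()]


def data_diff(agg_a, agg_b, key, value, tol=1e-9):
    """Data differentiating table: keys whose aggregates differ or are one-sided."""
    a_by_key = {r[key]: r[value] for r in agg_a}
    b_by_key = {r[key]: r[value] for r in agg_b}
    diff = {}
    for k in a_by_key.keys() | b_by_key.keys():
        a, b = a_by_key.get(k), b_by_key.get(k)
        if a is None or b is None or abs(a - b) > tol:
            diff[k] = (a, b)
    return diff


def schema_of(rows):
    """Map column name to type name, inferred from the first row only."""
    return {c: type(v).__name__ for c, v in rows[0].items()} if rows else {}


def schema_diff(schema_a, schema_b):
    """Schema differentiating table: columns whose presence or type differs."""
    diff = {}
    for col in schema_a.keys() | schema_b.keys():
        a, b = schema_a.get(col), schema_b.get(col)
        if a != b:
            diff[col] = (a, b)
    return diff


def validate(table_a, table_b, key, value):
    """Run both comparisons, build a summary, and derive a similarity in [0, 1]."""
    agg_a = aggregate(table_a, key, value)
    agg_b = aggregate(table_b, key, value)
    d_diff = data_diff(agg_a, agg_b, key, value)
    schema_a, schema_b = schema_of(agg_a), schema_of(agg_b)
    s_diff = schema_diff(schema_a, schema_b)
    summary = {
        "keys_compared": len({r[key] for r in agg_a} | {r[key] for r in agg_b}),
        "keys_differing": len(d_diff),
        "columns_compared": len(schema_a.keys() | schema_b.keys()),
        "columns_differing": len(s_diff),
    }
    total = summary["keys_compared"] + summary["columns_compared"]
    differing = summary["keys_differing"] + summary["columns_differing"]
    # Assumed scoring rule: share of compared items that match.
    score = 1.0 if total == 0 else 1.0 - differing / total
    return summary, score


# Outputs of a first pipeline and of a second pipeline intended to replace it.
first_output = [
    {"region": "EU", "amount": 10},
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 7},
]
second_output = [
    {"region": "EU", "amount": 15.0},
    {"region": "US", "amount": 8.0},
]

summary, score = validate(first_output, second_output, "region", "amount")
print(summary, score)
```

In this example the two outputs disagree on one of two keys (US) and on the type of one of two columns (amount is an integer in the first output and a float in the second), so the assumed scoring rule yields 0.5. Comparing aggregates rather than raw rows keeps the comparison small and insensitive to row order, at the cost of hiding differences that cancel out within a group.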
Utility
7 Oct 2020
17 Feb 2022