International Business Machines Corporation
Sharing intermediate data in map-reduce
Last updated:
Abstract:
One embodiment provides a method, including: receiving a plurality of data for job processing, wherein the job processing processes the plurality of data into (i) at least one map phase and (ii) at least one reduce phase; generating a plurality of key-value groups from the plurality of data, wherein the plurality of key-value groups are grouped from data pairs including a key and a value and wherein each of the key-value groups includes a grouping of data pairs having a common key and a plurality of values associated with the common key; identifying values common to at least a subset of the key-value groups; generating, based upon the identifying, new key-value groups, wherein at least a subset of the new key-value groups includes key-value groups having common keys and the identified common values; and communicating the new key-value groups to the at least one reduce function for processing.
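The steps in the abstract can be illustrated with a small sketch. This is one possible reading, not the patented implementation: the function names (`group_by_key`, `share_common_values`, `reduce_sum`), the use of a sum as the reduce operation, and the assumption that values are hashable and distinct within a group are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value emitted under the same key into one group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def share_common_values(groups):
    """Split groups into shared and residual parts.

    Assumes (for this sketch only) that values are hashable and that a
    value appears at most once per group.

    Returns (shared, residual):
      shared   maps a frozenset of original keys to the values that
               appear in exactly those groups (two or more keys).
      residual maps each key to the values found only in its own group.
    """
    owners = defaultdict(set)  # value -> keys whose group contains it
    for key, values in groups.items():
        for value in values:
            owners[value].add(key)

    shared = defaultdict(list)
    residual = defaultdict(list)
    for value, keys in owners.items():
        if len(keys) > 1:
            shared[frozenset(keys)].append(value)
        else:
            (key,) = keys
            residual[key].append(value)

    return (
        {keys: sorted(vals) for keys, vals in shared.items()},
        {key: sorted(vals) for key, vals in residual.items()},
    )


def reduce_sum(groups):
    """Sum each group, aggregating every shared value set only once."""
    shared, residual = share_common_values(groups)
    shared_totals = {keys: sum(vals) for keys, vals in shared.items()}

    totals = {}
    for key in groups:
        total = sum(residual.get(key, []))
        for keys, subtotal in shared_totals.items():
            if key in keys:
                total += subtotal  # reuse the partial result
        totals[key] = total
    return totals


pairs = [("a", 1), ("a", 2), ("a", 3), ("b", 2), ("b", 3), ("b", 4), ("c", 5)]
print(reduce_sum(group_by_key(pairs)))  # {'a': 6, 'b': 9, 'c': 5}
```

In this example the values 2 and 3 occur in the groups for both "a" and "b", so they are placed in one new group keyed by both original keys and summed once. The per-key totals match what a plain per-group sum would produce.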
Utility
29 Mar 2017
24 Aug 2021