International Business Machines Corporation
Accelerating shared file checkpoint with local burst buffers
Last updated:
Abstract:
A data management system and method for accelerating shared file checkpointing. Written application data is aggregated in an application data file created in a local burst buffer memory at a compute node, and an associated data mapping built index to maintain information related to the offsets into a shared file at which segments of the application data is to be stored in a parallel file system, and where in the buffer those segments are located. The node asynchronously transfers a data file containing the application data and the associated data mapping index to a file server for shared file storage. The data management system and method further accelerates shared file checkpointing in which a shared file, together with a map file that specifies how the shared file is to be distributed, is asynchronously transferred to local burst buffer memories at the nodes to accelerate reading of the shared file.
Utility
26 Apr 2018
12 Apr 2022