Morgan Stanley
High-compression, high-volume deduplication cache

Last updated:

Abstract:

A method for caching and deduplicating a plurality of received segments of data is disclosed. The method comprises identifying a value of a first data field in each segment acting as a unique source identifier; and identifying a value of a second data field in each segment, the second data field being densely populated by values in the plurality of segments. The value of the second data field is partitioned into a first partition comprising more significant bits and a second partition comprising less significant bits. A key is generated based on values of the first data field and the first partition. A database entry associates the key with a bitmap, the bitmap having a length based on the number of possible values a bitmap of equal length to the second partition could validly take. Single bits of the bitmap are set corresponding to received segments, to enable deduplication.
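
The scheme described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DedupCache`, the method `seen_before`, the choice of a sequence number as the densely populated second field, the default partition width, and the use of an in-memory dictionary in place of a database are all assumptions made for illustration.

```python
from typing import Dict, Tuple


class DedupCache:
    """Hypothetical sketch: one bitmap per (source id, high bits) key."""

    def __init__(self, low_bits: int = 10) -> None:
        # low_bits is the assumed width of the second (less significant) partition.
        self.low_bits = low_bits
        self.low_mask = (1 << low_bits) - 1
        # Each bitmap holds 2**low_bits bits, one per possible low-partition value.
        self.bitmap_bytes = ((1 << low_bits) + 7) // 8
        # A dict stands in for the database that associates keys with bitmaps.
        self.entries: Dict[Tuple[str, int], bytearray] = {}

    def seen_before(self, source_id: str, seq: int) -> bool:
        """Record a segment and report whether it was already recorded."""
        high = seq >> self.low_bits   # first partition: more significant bits
        low = seq & self.low_mask     # second partition: less significant bits
        key = (source_id, high)       # key built from source id and first partition
        bitmap = self.entries.setdefault(key, bytearray(self.bitmap_bytes))
        byte_index, bit_index = divmod(low, 8)
        bit = 1 << bit_index
        duplicate = bool(bitmap[byte_index] & bit)
        bitmap[byte_index] |= bit     # set the single bit for this segment
        return duplicate
```

Under these assumptions, a segment is a duplicate exactly when its bit is already set. Because the second field is densely populated, many consecutive values share one key, so each received segment costs roughly one bit of storage rather than a full stored identifier.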

Status:

Grant

Type:

Utility

Filing date:

15 Oct 2021

Issue date:

23 Aug 2022