International Business Machines Corporation
Next generation sequencing sorting in time and space complexity using location integers

Last updated:

Abstract:

A system and machine-implemented method for sorting Next-Generation Sequencing (NGS) reads in O(n) time and space complexity that makes use of the low sparsity and nearly uniform distribution of the input array. The genome position field of each tuple in the input array is used to determine its target position in the output array. Duplicate target positions due to n-fold coverage are handled either by assigning overflow buckets to each position or by a priori assigning multiple target slots in the output array for each genome position, depending on the distribution of reads over the genome and the resulting probability of hitting an already occupied slot. Once every tuple in the input array has been written to the output array, the output array is read through in ascending order and each tuple is appended to the end of a final result array.
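
The overflow-bucket variant described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; the tuple layout (genome position, payload), the function name sort_reads_by_position, and the genome_length parameter are assumptions made for illustration. The sketch runs in O(n + G) time and space for n reads and G genome positions, which is O(n) only under the abstract's premise that the input is dense (G is proportional to n).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tuple layout: (genome position, read payload).
Read = Tuple[int, str]


def sort_reads_by_position(reads: List[Read], genome_length: int) -> List[Read]:
    """Sort reads by genome position using the position as the target index."""
    # One primary slot per genome position in the output array.
    slots: List[Optional[Read]] = [None] * genome_length
    # Overflow buckets, created only for positions hit more than once.
    overflow: Dict[int, List[Read]] = {}

    # Pass 1: write each tuple to the slot named by its genome position.
    for read in reads:
        pos = read[0]
        if slots[pos] is None:
            slots[pos] = read
        else:
            # Slot already occupied (n-fold coverage): use the overflow bucket.
            overflow.setdefault(pos, []).append(read)

    # Pass 2: read the output array in ascending order and append each
    # tuple, followed by its overflow entries, to the final result array.
    result: List[Read] = []
    for pos in range(genome_length):
        first = slots[pos]
        if first is not None:
            result.append(first)
            result.extend(overflow.get(pos, ()))
    return result


if __name__ == "__main__":
    sample = [(5, "c"), (2, "a"), (5, "d"), (0, "x"), (2, "b")]
    print(sort_reads_by_position(sample, genome_length=8))
```

The alternative named in the abstract, pre-assigning several target slots per genome position, would replace the dictionary with a fixed number of slots per position; which variant is preferable depends on the expected coverage and the probability of collisions.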

Status:

Grant

Type:

Utility

Filing date:

7 Dec 2017

Issue date:

23 Nov 2021