Pacific Biosciences of California, Inc.
SYSTEMS AND METHODS FOR GRAPH BASED MAPPING OF NUCLEIC ACID FRAGMENTS
Abstract:
Systems and methods for mapping long nucleic acid sequence reads to a target sequence are provided. A directed graph is obtained for an organism having a heterozygous genome. The graph represents all or part of the genome and comprises one or more nonlinear topological components. Each nonlinear topological component has an initiating node and a terminal node connected by at least a first branch and a second branch, one of which corresponds to the target sequence. The directed graph is constructed from a plurality of sequence reads obtained from a biological sample of the organism. The sequence reads are overlapped with no restriction on the overhang amount, provided that each pair of overlapping reads shares at least a minimum consensus region. A query sequence that encompasses at least the initiating node or the terminal node of a first nonlinear topological component is obtained, and the directed graph is used to map the query sequence onto the graph.
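The abstract does not specify a mapping algorithm. As a minimal sketch only, assuming that nodes carry sequence strings and that exact string matching stands in for whatever alignment the method actually uses, the Python below builds one nonlinear topological component (a "bubble": an initiating node and a terminal node joined by two alternative branches) and maps a query by enumerating the paths through the graph and reporting which nodes the match touches. The names `SeqGraph`, `map_query`, `init`, `alt1`, `alt2`, and `term` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SeqGraph:
    """Hypothetical directed graph whose nodes carry sequence labels."""
    labels: dict[str, str] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, name: str, seq: str) -> None:
        self.labels[name] = seq
        self.edges.setdefault(name, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def paths(self, start: str):
        """Yield every path from `start` to a node with no outgoing edges.

        Assumes the graph is acyclic; a cycle would make this loop forever.
        """
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            if not self.edges[node]:
                yield path
            for nxt in self.edges[node]:
                stack.append((nxt, path + [nxt]))


def map_query(graph: SeqGraph, start: str, query: str):
    """Return (path, touched_nodes) for every path whose spelled sequence
    contains `query` as an exact substring (first occurrence only)."""
    hits = []
    for path in graph.paths(start):
        spelled = "".join(graph.labels[n] for n in path)
        pos = spelled.find(query)
        if pos < 0:
            continue
        # Collect the nodes whose sequence overlaps the interval [pos, pos + len(query)).
        touched, offset = [], 0
        for n in path:
            end = offset + len(graph.labels[n])
            if offset < pos + len(query) and end > pos:
                touched.append(n)
            offset = end
        hits.append((path, touched))
    return hits


# One bubble: "init" and "term" flank two alleles ("A" on alt1, "G" on alt2).
g = SeqGraph()
for name, seq in [("init", "ACGT"), ("alt1", "A"), ("alt2", "G"), ("term", "TTCA")]:
    g.add_node(name, seq)
for src, dst in [("init", "alt1"), ("init", "alt2"), ("alt1", "term"), ("alt2", "term")]:
    g.add_edge(src, dst)

print(map_query(g, "init", "GTGTT"))
# → [(['init', 'alt2', 'term'], ['init', 'alt2', 'term'])]
```

Here the query spans the initiating node, the `alt2` branch, and the terminal node, so it resolves which branch of the bubble it supports. Exhaustive path enumeration grows exponentially with the number of consecutive bubbles, so a practical mapper would more likely use seeding and graph-aware alignment; this sketch only illustrates the data structure and what a mapping onto it reports.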
Utility
24 Jan 2020
30 Jul 2020