Illumina, Inc.
SEQUENCE-GRAPH BASED TOOL FOR DETERMINING VARIATION IN SHORT TANDEM REPEAT REGIONS
Abstract:
The disclosed embodiments concern methods, apparatus, systems and computer program products for genotyping repeat sequences such as medically significant short tandem repeats (STRs). The methods involve aligning reads to a repeat sequence represented by a sequence graph and using the aligned reads to genotype the repeat sequence. The sequence graph is a directed graph including at least one self-loop representing a repeat sub-sequence. In some implementations, the reads are paired-end reads, and both mates of each read pair may be used to genotype the repeat sequences. Some implementations can be used to determine degenerate codon repeats. Some implementations can be used to genotype repeat sequences each including two or more repeat sub-sequences. Some implementations can be used to genotype nucleic acid sequences each including at least one repeat sub-sequence and another genetic variant such as an insertion, deletion, or substitution.
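The abstract's core data structure is a directed sequence graph in which a self-loop on the repeat node stands for the repeat sub-sequence. The minimal Python sketch below illustrates that idea. It assumes exact matching, a single repeat unit with at least one copy, and reads that span both flanks. The names `SequenceGraph`, `build_str_graph`, `align_spanning_read`, and `repeat_count` are hypothetical. They are not the disclosed implementation, which the abstract does not describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Hypothetical directed graph whose nodes carry DNA sequences."""
    nodes: dict[str, str]
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, []).append(dst)


def build_str_graph(left: str, unit: str, right: str) -> SequenceGraph:
    """Left flank -> repeat unit (with self-loop) -> right flank.

    Assumes a non-empty repeat unit; as built, at least one copy is required.
    """
    g = SequenceGraph(nodes={"left": left, "repeat": unit, "right": right})
    g.add_edge("left", "repeat")
    g.add_edge("repeat", "repeat")  # self-loop: the unit may repeat any number of times
    g.add_edge("repeat", "right")
    return g


def align_spanning_read(graph: SequenceGraph, read: str,
                        node: str = "left", pos: int = 0) -> list[str] | None:
    """Depth-first search for a node path that spells `read` exactly.

    Simplifying assumption: the read starts at the first base of the left
    flank and ends at the last base of the right flank (no mismatches or gaps).
    """
    seq = graph.nodes[node]
    if read[pos:pos + len(seq)] != seq:
        return None
    pos += len(seq)
    if node == "right":
        return [node] if pos == len(read) else None
    for nxt in graph.edges.get(node, []):
        rest = align_spanning_read(graph, read, nxt, pos)
        if rest is not None:
            return [node] + rest
    return None  # no path through the graph spells this read


def repeat_count(graph: SequenceGraph, read: str) -> int | None:
    """Number of times the alignment path traverses the repeat node."""
    path = align_spanning_read(graph, read)
    return None if path is None else path.count("repeat")


graph = build_str_graph("ACGT", "CAG", "TTGA")
print(repeat_count(graph, "ACGT" + "CAG" * 4 + "TTGA"))  # 4
```

The self-loop lets one small graph represent every possible repeat length, so the repeat count is read directly off the alignment path rather than obtained by aligning against a separate reference for each candidate allele.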
Utility
6 Mar 2020
10 Sep 2020