Chunk #20 — Beyond SNVs and indels

Source
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Text

In total, we placed 1,017 ancestral sequences, of which we were able to fully resolve 713, ranging in length from 100 bp to 39 kb (N50 = 1,183 bp) and accounting for a total of 528,233 bp (Fig. 3a). We partially resolved the remaining 304 events, for which we assembled part of the ancestral sequence but could place only one breakpoint on the reference sequence (see Supplementary Information 1.7). Of all 1,017 events, 551 (54.18%) occur within GENCODE v.29 (ref. 15) genes, a proportion that is not significantly different from the 54.80% of the current reference genome, GRCh38, that lies within genes. The assembled sequences contain repetitive motifs at a significantly higher rate than the genome as a whole (58.2% versus 50.1%) (Supplementary Tables 8–10). Simple and low-complexity sequences are strongly overrepresented both at the reference breakpoints and within the bodies of the non-reference sequences, which could indicate instability of these motifs, errors in the reference, or both.
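The counts and percentages above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the text does not say which statistical test was used for the gene-overlap comparison, so the normal-approximation proportion test here is an assumption, and the function names `two_sided_z_test` and `n50` are hypothetical helpers rather than anything from the authors' pipeline. The `n50` helper shows the usual definition of the N50 statistic and is demonstrated on toy lengths, because the per-sequence lengths are not given in the text.

```python
import math

# Counts reported in the text.
PLACED = 1017
FULLY_RESOLVED = 713
PARTIALLY_RESOLVED = 304
IN_GENES = 551
GENOME_FRACTION_IN_GENES = 0.5480  # share of GRCh38 within genes, per the text


def two_sided_z_test(successes, n, p0):
    """Normal-approximation test of an observed proportion against p0.

    Assumption: the source does not name its test; this is one common choice.
    Returns (observed proportion, z statistic, two-sided p-value).
    """
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


def n50(lengths):
    """Length at which the running total, taken longest first, reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


if __name__ == "__main__":
    # Fully and partially resolved events should add up to all placed events.
    print("counts consistent:", FULLY_RESOLVED + PARTIALLY_RESOLVED == PLACED)

    p_hat, z, p_value = two_sided_z_test(IN_GENES, PLACED, GENOME_FRACTION_IN_GENES)
    print(f"in genes: {100 * p_hat:.2f}%  z = {z:.3f}  p = {p_value:.3f}")

    # Toy lengths (hypothetical), only to show how N50 is computed.
    print("toy N50:", n50([2, 3, 4, 5, 6]))
```

Under these assumptions the p-value comes out well above 0.05, which agrees with the statement that the gene-overlap proportion does not differ significantly from the genome-wide figure.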