A comparison of BeadChip and WGS genotyping outputs using partial validation by sanger sequencing.


Authors: Danilov, Kirill A; Nikogosov, Dimitri A; Musienko, Sergey V; Baranova, Ancha V
Year: 2020
Journal: BMC Genomics
PMID: 32912136
DOI: 10.1186/s12864-020-06919-x
PMCID: PMC7488117

Fig. 1

Whole genome depth of coverage distributions. Metrics for sample_001 (a), sample_002 (b), sample_003 (c) and breadth of coverage for the specified depth thresholds (d) averaged for all three samples are shown with 95% confidence intervals, n = 3

LLM interpretation

This figure consists of three histograms (A, B, C) showing the distribution of whole genome depth of coverage for three separate samples, with observation frequency on the y-axis and depth of coverage on the x-axis. All three samples exhibit a similar peak distribution centered around 30x coverage. Panel D is a bar chart showing the percentage of bases exceeding specific depth thresholds (5x to 30x), demonstrating a decrease in the percentage of bases as the depth threshold increases, including 95% confidence intervals for n=3.
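The breadth-of-coverage summary in panel D can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes breadth means the percentage of positions with depth at or above a threshold, and that the 95% confidence interval is a Student t interval over the three samples. The function names and the toy depths are hypothetical.

```python
from statistics import mean, stdev

# Two-sided 95% Student t critical value for df = 2 (i.e. n = 3 samples).
T_CRIT_DF2 = 4.3027

def breadth_of_coverage(depths, threshold):
    """Percentage of positions whose depth is at least `threshold` (assumed definition)."""
    return 100.0 * sum(d >= threshold for d in depths) / len(depths)

def mean_with_ci95(values):
    """Mean and t-based 95% confidence interval for exactly three values."""
    if len(values) != 3:
        raise ValueError("T_CRIT_DF2 is only valid for n = 3")
    m = mean(values)
    half_width = T_CRIT_DF2 * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width

# Hypothetical per-position depths, for illustration only (not data from the paper).
toy_depths = {
    "sample_001": [31, 28, 4, 35, 12],
    "sample_002": [29, 33, 9, 30, 27],
    "sample_003": [30, 2, 26, 34, 8],
}
for threshold in (5, 10, 20, 30):
    per_sample = [breadth_of_coverage(d, threshold) for d in toy_depths.values()]
    m, low, high = mean_with_ci95(per_sample)
    print(f">= {threshold}x: mean {m:.1f}% (95% CI {low:.1f} to {high:.1f})")
```

With only three samples the t critical value is large, so the interval is wide even when the per-sample values are close.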

Fig. 2

Fractions of discordant results for three samples. Percentage of discordant results per each chromosome is shown where applicable

LLM interpretation

This grouped bar chart displays the percentage of discordant results across chromosomes for three samples (sample_001, sample_002, and sample_003). The x-axis lists chromosomes 1–22, X, Y, and MT, while the y-axis measures "Discordance, %" from 0 to 25. Discordance remains low and relatively stable across chromosomes 1–22, with a marked increase in the mitochondrial (MT) genome, where sample_001 shows the highest discordance at approximately 25%.
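A per-chromosome discordance percentage of this kind could be computed along the following lines. This is a sketch under assumptions: each call is represented as a hypothetical `(chromosome, BeadChip genotype, WGS genotype)` tuple, genotypes are unphased strings such as `A/B`, and allele order is ignored when comparing. None of these names come from the paper.

```python
from collections import defaultdict

def normalize(genotype):
    """Treat 'A/B' and 'B/A' as the same unphased genotype."""
    return tuple(sorted(genotype.split("/")))

def discordance_by_chromosome(calls):
    """Return {chromosome: percentage of calls where the two platforms disagree}."""
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for chrom, array_gt, wgs_gt in calls:
        totals[chrom] += 1
        if normalize(array_gt) != normalize(wgs_gt):
            mismatches[chrom] += 1
    return {chrom: 100.0 * mismatches[chrom] / totals[chrom] for chrom in totals}

# Hypothetical calls, for illustration only (not data from the paper).
toy_calls = [
    ("1", "A/A", "A/A"),
    ("1", "A/B", "B/A"),   # same genotype, different allele order
    ("1", "B/B", "A/B"),   # discordant
    ("1", "A/A", "A/A"),
    ("MT", "A/A", "B/B"),  # discordant
    ("MT", "B/B", "B/B"),
]
per_chrom = discordance_by_chromosome(toy_calls)
```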

Fig. 3

Distance maps for the analyzed samples. a — sample_001, b — sample_002, c — sample_003, concordant and discordant variants are marked in green and orange, respectively

LLM interpretation

This figure consists of three scatter plots with overlaid density contours (A, B, and C) representing distance maps for three different samples. The x-axis shows "Distance before, $\log_{10}(\text{bp})$" and the y-axis shows "Distance after, $\log_{10}(\text{bp})$." In each plot, concordant variants are clustered in green/cyan at lower distance values, while discordant variants are clustered in orange at higher distance values.
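The coordinates of such a distance map can be sketched as below, assuming that "distance before" and "distance after" are the gaps in base pairs between a variant and its preceding and following genotyped neighbors on the same chromosome. That reading, the function name, and the toy positions are assumptions made for illustration.

```python
from math import log10

def distance_map(positions):
    """For each interior variant, return (position, log10 gap before, log10 gap after).

    `positions` are coordinates on a single chromosome; the first and last
    variants are skipped because they lack one of the two neighbors.
    """
    ordered = sorted(set(positions))
    points = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        points.append((cur, log10(cur - prev), log10(nxt - cur)))
    return points

# Hypothetical positions, for illustration only (not data from the paper).
points = distance_map([1_000, 1_100, 11_100, 11_110])
```

Each returned tuple gives one point of the scatter plot: the x-coordinate is the second element and the y-coordinate is the third.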

Fig. 4

Confusion matrices calculated for the call sets obtained by WGS and BeadChip. WGS was defined as “true” call set, BeadChip — “test” call set, data is shown for sample_001 (male, chromosomes MT, X, Y were excluded from analysis), sample_002 (female, chromosome MT was excluded from analysis) and sample_003 (male, chromosomes MT, X, Y were excluded from analysis)

LLM interpretation

This figure consists of three confusion matrices comparing genotype call sets from BeadChip (test) against WGS (true) for three samples (sample_001, sample_002, and sample_003). The x-axis represents the WGS call set and the y-axis represents the BeadChip call set, with both axes labeled with genotypes (A/A, A/B, B/B, A/C, B/C, C/C). The heatmaps show a strong diagonal trend, indicating high agreement between the two methods, with the highest counts concentrated in the matching genotype cells.

Fig. 5

BeadChip genotyping quality metrics with highlighted Sanger-validated variants. Theta, R, GC Score values for sample_002 are shown; histograms show the corresponding distributions of plotted metrics in a 1-dimensional space; concordant and discordant variants are marked in blue and orange, respectively; genotypes which are not consistent with Sanger sequencing in both WGS and BeadChip results are marked with a star, matches between Sanger and BeadChip are marked with triangles, matches between Sanger and WGS are marked with circles, variants which were not successfully genotyped by Sanger are marked with crosses

LLM interpretation

This figure consists of two scatter plots with marginal histograms showing BeadChip genotyping quality metrics (GC score vs. Theta value and R value vs. Theta value) for sample_002. The plots display a distribution of concordant (blue) and discordant (orange) variants, with specific variants highlighted by symbols (stars, triangles, circles, and crosses) to indicate validation status against Sanger sequencing and WGS. The marginal histograms illustrate the 1-dimensional distribution of the Theta, GC score, and R value metrics.

Fig. 6

Example calculation of confusion matrices. The shown dimensionality reduction is used for accuracy and other metrics calculation; a, b, c, d, …, ai, aj — sample counts of each class; A/A, A/B, B/B, A/C, B/C, C/C — diploid genotypes observed in data; A — reference allele, B and C — alternative alleles. TRUTH — a call set produced by an orthogonal method (comparator), TEST — a call set produced by a test method

LLM interpretation

This figure consists of two diagrams illustrating the calculation of confusion matrices for genotype calls. Each panel shows a large matrix comparing a "TEST" method against a "TRUTH" method across six diploid genotypes (A/A, A/B, B/B, A/C, B/C, C/C), with individual cells representing sample counts. Arrows indicate how specific cells from the large matrix are aggregated into smaller $2 \times 2$ binary confusion matrices to calculate accuracy and other metrics for a single genotype.
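The aggregation of a six-class matrix into a binary one for a single genotype can be sketched as a one-vs-rest reduction. The orientation used here (rows indexed by TRUTH, columns by TEST), the function name, and the toy counts are assumptions for illustration and may differ from the layout in the figure.

```python
GENOTYPES = ["A/A", "A/B", "B/B", "A/C", "B/C", "C/C"]

def reduce_to_binary(matrix, genotype, labels=GENOTYPES):
    """Collapse a multi-class confusion matrix into TP/FN/FP/TN for one genotype.

    matrix[i][j] counts variants whose TRUTH genotype is labels[i] and whose
    TEST genotype is labels[j] (assumed orientation).
    """
    k = labels.index(genotype)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # truth is the genotype, test disagrees
    fp = sum(row[k] for row in matrix) - tp   # test reports the genotype, truth differs
    tn = total - tp - fn - fp                 # neither call set reports the genotype
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical counts, for illustration only (not data from the paper).
toy = [
    [50, 2, 0, 0, 0, 0],
    [1, 30, 1, 0, 0, 0],
    [0, 2, 20, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 1],
]
binary_ab = reduce_to_binary(toy, "A/B")
```

Calling the function once per genotype yields one binary matrix per class, each of which can then feed the usual binary metrics.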

Fig. 7

Quality metrics calculation for the initial and the “reduced confusion” matrices. Each metric is calculated as a ratio of blue elements to orange-outlined elements; A/A, A/B, B/B, A/C, B/C, C/C — diploid genotypes observed in data; A — reference allele, B and C — alternative alleles; N/N — any diploid genotype category (A/A, A/B, B/B, A/C, B/C, C/C). TRUTH — a call set produced by an orthogonal method (comparator), TEST — a call set produced by a test method

LLM interpretation

This figure consists of several diagrams illustrating the calculation of quality metrics using confusion matrices. The top row shows three $6 \times 6$ matrices comparing "TRUTH" and "TEST" diploid genotypes (A/A through C/C) to define genotype concordance and non-reference genotype sensitivity/concordance. The bottom row displays four $2 \times 2$ matrices that simplify genotypes into "N/N" and "other" categories to calculate sensitivity, specificity, precision, and accuracy. Blue shading indicates the elements used in the numerator for each specific metric calculation.
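The four binary metrics named above follow the standard definitions and can be sketched as below. Overall genotype concordance is taken here to be the diagonal sum divided by the grand total, which is an assumption about the figure rather than a definition quoted from the paper. The function names and counts are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for one genotype from its 2 x 2 confusion matrix."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
    }

def genotype_concordance(matrix):
    """Share of variants with identical TRUTH and TEST genotypes (assumed definition)."""
    total = sum(sum(row) for row in matrix)
    matching = sum(matrix[i][i] for i in range(len(matrix)))
    return _ratio(matching, total)

# Hypothetical counts, for illustration only (not data from the paper).
metrics = binary_metrics(tp=30, fn=2, fp=4, tn=76)
concordance = genotype_concordance([[50, 2], [1, 30]])
```

Returning NaN for an empty denominator keeps rare genotype classes, such as C/C, from raising a division error when no variants fall into them.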

#	Section	Preview
0	Background	Both Whole Genome (WGS) and Whole Exome sequencing (WES) are now used in multiple avenues of…
1	Background	BeadChip genotyping is an efficient and scalable way of genotype resolution, with two inherent…
2	Results — Sequencing statistics	Table 1 lists the sequencing statistics for the three sequenced samples. FastQC reports are…
3	Results — Mapping statistics	All the data produced by WGS were analyzed for their depth (DOC) and breadth (BOC) of coverage using…
4	Results — Concordance metrics	Percentages of discordant calls per chromosome for each sample are shown in Fig. 2. The average…
5	Results — Mapping of analyzed variants	For each pair of BeadChip-genotyped neighboring variants, distance intervals were extracted within…
6	Results — Mapping of analyzed variants	Overall randomness of the locations of discordant genotypes across the genome, measured as cluster…
7	Results — Concordance analysis	Calculated confusion matrices for all three analyzed DNA samples are shown in Fig. 4, with metrics…
8	Results — Genotyping quality metrics distributions	For all three samples, the distributions of the WGS genotyping metrics were analyzed and compared…
9	Results — Genotyping quality metrics distributions	of R, Theta and GC scores for both concordant and discordant variants revealed a pattern of…
10	Results — Sanger sequencing	The sliding window approach performed on sample_002 resulted in the mapping of 6 regions containing…
11	Results — Sanger sequencing	and BeadChip platforms. Selected SNPs are located close to each other within 500 bp window length…
12	Results — Sanger sequencing	The list of designed primers with respective amplification parameters can be found in Table 5. The…
13	Results — Sanger sequencing	26 listed in Table 6 were used for confusion matrices calculation using Sanger-derived calls as a…
14	Discussion	Although genotype concordance analysis experiments using different sequencing and genotyping…
15	Discussion	the array pipelines is based on marker clustering in a 2-dimensional space (clusters A/A, A/B, B/B),…
16	Conclusions	Here we show the presence of some parametric differences in quality metrics of genotyping performed…
17	Methods — Materials	Three human genomic DNA samples, two males and one female, were selected for this comparison. After…
18	Methods — BeadChip genotyping	Infinium iSelect 24 × 1 HTS Custom Beadchip Kit (GSAsharedCUSTOM_20018389_A2) genotyping was…
19	Methods — Genome sequencing and variant calling	Whole genome sequencing was performed by MedGenome (CA, USA) using the HiSeq X Ten platform…

Citation	PMID	DOI	Status
Beck, TF et al., Clin Chem, 2016, Systematic evaluation of sanger validation of next-generation sequencing variants	26847218	10.1373/clinchem.2015.249623	Cited
Bolger, AM et al., Bioinformatics, 2014, Trimmomatic: a flexible trimmer for Illumina sequence data	24695404	10.1093/bioinformatics/btu170	Cited
Broad Institute. GATK Tools; (version 3.8). Available from: http://github.com/broadinstitute/gatk/. Accessed 5 Mar 2018.	—	—	—
Gargis, AS et al., Nat Biotechnol, 2012, Assuring the quality of next-generation sequencing in clinical laboratory practice	23138292	10.1038/nbt.2403	Cited
Li, H et al., Bioinformatics, 2009, The sequence alignment/map format and SAMtools	19505943	10.1093/bioinformatics/btp352	Cited
Li, H, Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, 2013	—	—	—
Li, H, Bioinformatics, 2011, A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data	21903627	10.1093/bioinformatics/btr509	Cited
Linderman, MD et al., BMC Med Genet, 2014, Analytical validation of whole exome and whole genome sequencing for clinical applications	24758382	10.1186/1755-8794-7-20	Cited
Rehm, HL et al., Genet Med, 2013, ACMG clinical laboratory standards for next-generation sequencing	23887774	10.1038/gim.2013.92	Cited
Steemers, FJ et al., Biotechnol J, 2007, Whole genome genotyping technologies on the BeadArray™ platform	17225249	10.1002/biot.200600213	Cited
Wang, Z et al., Front Genet, 2013, The role and challenges of exome sequencing in studies of human diseases	24032039	10.3389/fgene.2013.00160	Cited
Ye, J et al., BMC Bioinform, 2012, Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction	22708584	10.1186/1471-2105-13-134	Cited

Title	Authors	Journal	Year	Link
Application of multigene panel testing for bleeding, thrombotic, and platelet disorders in patients and the general population in China.	Cai Y et al.	—	2025	→
Genome-wide association analysis of fleece traits in Northwest Xizang white cashmere goat.	Lu X et al.	—	2024	→
Whole-genome sequencing analysis of suicide deaths integrating brain-regulatory eQTLs data to identify risk loci and genes.	Han S et al.	—	2023	→
Comparing BeadChip and WGS Genotyping: Non-Technical Failed Calling Is Attributable to Additional Variation within the Probe Target Sequence.	Gershoni M et al.	—	2022	→
GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing.	Mathur R et al.	—	2022	→
Expanding the pool of public controls for GWAS via a method for combining genotypes from arrays and sequencing	Mathur R et al.	—	2021	—
Frequency of allele variations in the CFTR gene in a Mexican population.	Cantú-Reyna C et al.	—	2021	→
Genomics and Systems Biology at the "Century of Human Population Genetics" conference.	Tatarinova TV et al.	—	2020	→
