Although they were processed concurrently with the same pipeline, the 484 TD trios from the TIC Genetics and TSAICG cohorts and the 602 SSC control trios were sequenced at different times, using different capture platforms, sequencing machines, and genomic core facilities (Figure 1). We therefore performed principal components analysis (PCA) to check for potential batch effects (Figure S4). We collected sequencing quality metrics using the Picard tools CollectHsMetrics, CollectAlignmentSummaryMetrics, and CollectVariantCallingMetrics. We also estimated the number of callable base pairs within each trio as the number of base pairs covered at ≥20× in all three family members (we refer to this as joint coverage at 20×). These metrics, together with paternal and maternal age where available, served as the input variables for the PCA (Table S1). The PCA revealed clear batch effects associated with sequencing facility, particularly for the TSAICG UCLA and Broad subsets and within the SSC control trios (Figure S4). We focused on the first four principal components (PCs), which together explain 61.6% of the variance in the quality metrics. We considered samples more than three standard deviations (SD) from