Chunk #42 — STAR★METHODS — METHOD DETAILS — Whole Exome Sequencing — Variant Detection

Source: De Novo Coding Variants Are Strongly Associated with Tourette Disorder.
Embedded: yes

Text

We followed GATK best practices (https://software.broadinstitute.org/gatk/best-practices/) for pre-processing and variant discovery (DePristo et al., 2011; McKenna et al., 2010; Van der Auwera et al., 2013). We processed each cohort separately, including the TSAICG – Broad and TSAICG – UCLA sub-cohorts. BWA-MEM (Li and Durbin, 2009) aligned raw reads to the 1000 Genomes GRCh37 (hg19) genome build, Picard Tools (https://broadinstitute.github.io/picard/) marked duplicates, and GATK performed base quality score recalibration. We called variants per sample with HaplotypeCaller in GVCF mode, then jointly genotyped all samples within each cohort to produce one multi-sample VCF callset per cohort. Where appropriate, we supplied a list of capture targets corresponding to each cohort’s respective library capture kit, with an interval padding of 100 bp. We applied variant quality score recalibration (VQSR) to each VCF to refine the callset and used only passing variants in downstream analyses. Example variant-calling commands are available in the project Bitbucket repository at https://bitbucket.org/willseylab/tourette_phase1. ANNOVAR (Wang et al., 2010) annotated variants according to RefSeq hg19 gene definitions.
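Two steps in this paragraph are easy to illustrate in isolation: padding the capture targets by 100 bp and keeping only passing variants. The Python sketch below is illustrative only and is not the pipeline used in the study, where GATK handled both steps; the actual commands are in the repository cited above. The function names `pad_intervals` and `passing_records` are hypothetical. The sketch assumes BED-style targets given as (chromosome, start, end) tuples and a tab-delimited VCF whose seventh column is FILTER, and it does not clip padded intervals at chromosome ends.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: BED-style interval (chrom, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def pad_intervals(intervals: Iterable[Interval], padding: int = 100) -> List[Interval]:
    """Extend each target by `padding` bases on both sides, then merge overlaps.

    Hypothetical helper for illustration; starts are clipped at 0, but ends
    are not clipped to chromosome length.
    """
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def passing_records(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus records whose FILTER column is exactly PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(fields) > 6 and fields[6] == "PASS":
            yield line
```

For example, `pad_intervals([("1", 50, 200), ("1", 350, 500)])` merges the two targets into a single interval, because their padded ranges overlap. Records whose FILTER value is anything other than `PASS`, including the unfiltered marker `.`, are dropped by `passing_records`.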