Chunk #50 — Methods — Summary Statistics — Quality control of summary statistics

Source: Resource profile and user guide of the Polygenic Index Repository.

Text

We applied a uniform set of quality-control (QC) filters to each original file with summary statistics (both those from previously unpublished and previously published GWASs). We closely followed the quality-control pipeline detailed in section 1.5.1 of Okbay et al. [37] and implemented in the software EasyQC [56]. Our QC protocol departed from Okbay et al. in the following steps:

We used data from the Haplotype Reference Consortium reference panel (r1.1) [57] to check for strand misalignment, allele mismatch, chromosome and base pair position concordance, and allele frequency discrepancies (instead of using data from the 1000 Genomes Phase 1 [58]). (Mapping file and allele frequency data were downloaded from the EasyQC website, from the following URLs, respectively: https://homepages.uni-regensburg.de/~wit59712/easyqc/HRC/HRC.r1-1.GRCh37.wgs.mac5.sites.tab.rsid_map.gz, https://homepages.uni-regensburg.de/~wit59712/easyqc/HRC/HRC.r1-1.GRCh37.wgs.mac5.sites.tab.cptid.maf001.gz.)

For simplicity and uniformity, we applied a more conservative imputation accuracy filter of 0.7 to all input files, irrespective of the software that was used for imputation.

We applied a uniform minor allele frequency filter of 0.01 to all input files. Stricter filters varying by sample size were not necessary because the studies that we analysed were much larger than some of those in Okbay et al.

We filtered out
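
As an illustration of the two uniform thresholds stated above (imputation accuracy of 0.7 and minor allele frequency of 0.01), the following is a minimal Python sketch. It is not the EasyQC pipeline the authors used. The field names `info` and `eaf`, the function names, the choice to treat both thresholds as inclusive, and the toy records are all assumptions made for illustration only.

```python
from typing import Dict, List

INFO_MIN = 0.7   # imputation accuracy threshold stated in the text
MAF_MIN = 0.01   # minor allele frequency threshold stated in the text


def minor_allele_frequency(eaf: float) -> float:
    """Convert an effect-allele frequency into a minor allele frequency."""
    return min(eaf, 1.0 - eaf)


def passes_qc(snp: Dict) -> bool:
    """Return True if a record clears both uniform filters.

    Assumes hypothetical keys 'info' (imputation accuracy) and 'eaf'
    (effect-allele frequency); treating the cut-offs as inclusive is
    an assumption, not something the text specifies.
    """
    return (
        snp["info"] >= INFO_MIN
        and minor_allele_frequency(snp["eaf"]) >= MAF_MIN
    )


def apply_filters(snps: List[Dict]) -> List[Dict]:
    """Keep only the records that pass both filters."""
    return [s for s in snps if passes_qc(s)]


# Toy records with made-up identifiers and values (not real data).
example = [
    {"id": "snp_a", "eaf": 0.25, "info": 0.95},   # passes both filters
    {"id": "snp_b", "eaf": 0.995, "info": 0.99},  # fails: MAF is about 0.005
    {"id": "snp_c", "eaf": 0.40, "info": 0.65},   # fails: info below 0.7
]
kept = apply_filters(example)
print([s["id"] for s in kept])
```

Because the same two cut-offs are applied to every input file, a single predicate such as `passes_qc` suffices here; sample-size-dependent thresholds would need an extra argument.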