Chunk #51 — Methods — Summary Statistics — Quality control of summary statistics

Source: Resource profile and user guide of the Polygenic Index Repository.

Text

a uniform minor allele frequency filter of 0.01 to all input files. Stricter filters varying by sample size were not necessary because the studies that we analysed were much larger than some of those in Okbay et al.

We filtered out standard-error outliers. To do so, we first estimated the standard deviation ($\hat{\sigma}_y$) of the phenotype in each input file by regressing the reported standard errors on the following approximation to the standard error of a coefficient estimated by OLS when the phenotype is standardized: $SE_{\mathrm{pred},j} = \frac{1}{\sqrt{N}} \times \frac{1}{\sqrt{2 \times MAF_j \times (1 - MAF_j)}}$, where $MAF_j$ is the minor allele frequency of SNP $j$ and $N$ is the GWAS sample size. We filtered out markers with $SE_{\mathrm{pred},j}/SE_j < \hat{\sigma}_y/2$ or $SE_{\mathrm{pred},j}/SE_j > 2\hat{\sigma}_y$. This filter allowed us to identify and remove markers for which the reported GWAS sample size deviated considerably from the sample size implied by the marker's standard error. This filter was particularly relevant for publicly available summary statistics, where marker-specific sample sizes were typically not reported. (Having an accurate number for the sample size is important for LDpred (ref. 30).) Before each filtered file was cleared for subsequent meta-analyses, we also prepared and visually inspected a number of diagnostic plots, as described in Okbay et al. Our final analyses are limited