Chunk #59 — STAR* METHODS — QUANTIFICATION AND STATISTICAL ANALYSIS — Examination of the Impact of Sample Size Imbalance on Genetic Correlations and Genomic SEM Results
Because SCZ and MD accounted for the majority of the total sample size in our study and were also the two most statistically powerful studies (power was estimated by multiplying each study's effective sample size by its heritability), we generated, for each of the six smaller studies (BIP, ADHD, ASD, TS, ANO, and OCD), simulated datasets similar in size, heritability, and cross-correlation with the other datasets. In brief, simulated genetic data were created from the post-QC UKBB imputed data for each of the six disorders by randomly selecting non-overlapping sets of subjects matched to the original sample sizes. In each simulation replicate, we then simulated quantitative phenotypes (Y = Xβ + ε) given the true effect sizes β, the standardized genotype matrix X, and a non-genetic error term ε. The true effect sizes of each SNP were drawn from a multivariate normal distribution, β ~ MVN(μ, Σ/M), where M is the total number of SNPs in the genome, μ is a zero vector of length 6, and Σ is the covariance matrix that accounts for the genetic correlations (rg) among the six disorders (with disease-specific SNP-heritabilities on the