Chunk #13 — Materials and methods — Samples — Data preparation

Source: Genome- and transcriptome-wide splicing associations with alcohol use disorder.
Embedded: yes

Text

DNA genotypes from human RNA-seq data were ascertained via the SAMtools mpileup function as done previously21. Human genotypes derived from RNA-seq data were phased and imputed with Beagle version 5.1, which uses a probabilistic Hidden Markov Chain model that performs well for sequencing data with sparse genomic coverage22. We would like to caution the reader that Beagle was originally developed for genome-wide DNA variant data and not RNA-sequencing data. Our analyses used a few methods and criteria for quality control (QC) including: genotyping rate > 95%, minor allele frequency > 0.10, Hardy–Weinberg equilibrium > 1e-6, > 5 reads per sample, Phred Score > 20 and an imputation score > 0.3. The input for imputation was 40,878 called genotypes that were common among all samples and passed initial QC. These variants were imputed to 1000 Genomes Phase III all data, which resulted in 570,755 SNPs, 178,598 of which passed QC. These ~ 170 k variants were used for polygenic score and sQTL analyses. Note, that the 91.9% of these SNPs were present in the AUD GWAS, but that GWAS has 77.9