
Chunk #10 — Materials and Methods — Analyses

Source
Identification of Functional Genetic Variants Associated With Alcohol Dependence and Related Phenotypes Using a High-Throughput Assay.

Text

The basic idea of the analysis is to compare, for each SNP, the relative counts (unique UMIs) of the RNA expressed from each allele with those of the input plasmid DNA extracted from the same pool of cells (to control for potential bias in the plasmid pool). A generalized linear mixed-effects model (GLMM) was used to model the sequencing reads as a function of allele type (reference or alternative), nucleic acid source (plasmid DNA or RNA-derived cDNA), batch number, and the interaction between allele type and nucleic acid source. A random effect was included to account for data derived from the same biological sample, and a negative binomial distribution was applied. Thus, we modeled this as:

$$\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \beta_B X_B + r X_S$$

where $\mu$ is the expected number of sequencing reads, $X_1$ is the allele type (reference or alternative), $X_2$ is the nucleic acid source (DNA or cDNA), $X_B$ is the batch number (first or second sequencing), and $X_S$ is the sample replicate number. In this model, $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_{12}$ are the coefficients of the fixed effects, $\beta_B$ is the coefficient for the batch effect, and $r$ is the coefficient for the random effect. For each SNP, we estimated the values of these coefficients.
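The role of the interaction term in the model above can be illustrated with a short sketch. It is not the authors' fitting code: it only evaluates the log-link linear predictor for made-up coefficient values, and it assumes a 0/1 dummy coding (reference = 0, alternative = 1; DNA = 0, cDNA = 1; first batch = 0, second batch = 1) that the text does not specify. Under that assumed coding, the RNA-to-DNA ratio of the alternative allele divided by the same ratio for the reference allele reduces to exp(β12), independent of the batch and sample terms. The function names and the numeric values are hypothetical.

```python
import math


def expected_reads(beta0, beta1, beta2, beta12, beta_b,
                   x1, x2, xb, sample_effect=0.0):
    """Expected read count mu from the log-link linear predictor.

    Assumed coding (not stated in the source):
      x1: 0 = reference allele, 1 = alternative allele
      x2: 0 = plasmid DNA,      1 = cDNA
      xb: 0 = first batch,      1 = second batch
    sample_effect stands in for the random-effect term r * X_S.
    """
    log_mu = (beta0 + beta1 * x1 + beta2 * x2
              + beta12 * x1 * x2 + beta_b * xb + sample_effect)
    return math.exp(log_mu)


def allelic_ratio_of_ratios(coef, xb=0, sample_effect=0.0):
    """(cDNA/DNA for alternative) divided by (cDNA/DNA for reference)."""
    mu = {
        (x1, x2): expected_reads(**coef, x1=x1, x2=x2, xb=xb,
                                 sample_effect=sample_effect)
        for x1 in (0, 1)
        for x2 in (0, 1)
    }
    alt_ratio = mu[(1, 1)] / mu[(1, 0)]
    ref_ratio = mu[(0, 1)] / mu[(0, 0)]
    return alt_ratio / ref_ratio


# Hypothetical coefficient values, chosen only for illustration.
coef = dict(beta0=5.0, beta1=0.1, beta2=-0.3, beta12=0.4, beta_b=0.2)

ratio = allelic_ratio_of_ratios(coef)
print(ratio, math.exp(coef["beta12"]))
```

In this sketch the batch term and the sample term cancel out of the ratio of ratios, which is why normalizing RNA counts to plasmid DNA counts from the same pool isolates the allele-specific effect in the interaction coefficient.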