Chunk #0 — Methods — Identifying the sample identity of each single cell

Source: Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

Text

We first describe the method to infer the sample identity of each cell in the absence of doublets. Consider RNA-sequence reads from $C$ barcoded droplets multiplexed across $S$ different samples, whose genotypes are available across $V$ exonic variants. Let $d_{cv}$ be the number of unique reads overlapping the $v$-th variant in the $c$-th droplet. Let $b_{cvi} \in \{R, A, O\}$, $i \in \{1, \cdots, d_{cv}\}$, be the variant-overlapping base call from the $i$-th read, representing the reference (R), alternate (A), and other (O) alleles, respectively. Let $e_{cvi} \in \{0, 1\}$ be a latent variable indicating whether the base call is correct (0) or not (1). Given $e_{cvi} = 0$, $b_{cvi} \in \{R = 0, A = 1\}$ and $b_{cvi} \sim \mathrm{Binomial}(2, g/2)$, where $g \in \{0, 1, 2\}$ is the true genotype of the sample corresponding to the $c$-th droplet at the $v$-th variant. When $e_{cvi} = 1$, we assume that $\Pr(b_{cvi} \mid g, e_{cvi})$ follows Supplementary Table 4. $e_{cvi}$ is assumed to follow $\mathrm{Bernoulli}(10^{-q_{cvi}/10})$, where $q_{cvi}$ is the Phred-scaled quality score of the observed base call. We use the standard 10X pipeline to process
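
The per-read error model defined above can be illustrated with a minimal Python sketch. It converts a Phred-scaled quality score into an error probability and marginalizes over the latent error indicator to obtain the probability of a base call given a genotype. Two points are assumptions of this sketch and not statements from the text: the correct-call distribution is read as a per-read alternate-allele probability of g/2, and the error-conditional distribution, which the text defers to Supplementary Table 4, is replaced by a uniform placeholder. The function and variable names are hypothetical.

```python
# Hypothetical placeholder for Pr(b | g, e = 1). The actual values are given in
# Supplementary Table 4, which is not reproduced here; a uniform distribution
# over the three allele classes stands in for illustration only.
PLACEHOLDER_ERROR_TABLE = {
    g: {"R": 1 / 3, "A": 1 / 3, "O": 1 / 3} for g in (0, 1, 2)
}


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score q to the error probability 10^(-q/10)."""
    return 10.0 ** (-q / 10.0)


def correct_call_prob(b, g):
    """Pr(b | g, e = 0), assuming an alternate-allele probability of g/2 per read."""
    p_alt = g / 2.0
    return {"R": 1.0 - p_alt, "A": p_alt, "O": 0.0}[b]


def read_likelihood(b, g, q, error_table=PLACEHOLDER_ERROR_TABLE):
    """Pr(b | g, q), summing over the latent error indicator e in {0, 1}."""
    eps = phred_to_error_prob(q)  # Pr(e = 1)
    return (1.0 - eps) * correct_call_prob(b, g) + eps * error_table[g][b]


if __name__ == "__main__":
    # A heterozygous genotype (g = 1) with a base quality of 20.
    for allele in ("R", "A", "O"):
        print(allele, read_likelihood(allele, g=1, q=20))
```

To use the model as the text specifies it, `error_table` should be replaced with the values from Supplementary Table 4. With the placeholder, the three probabilities for any fixed genotype and quality score still sum to one, which is a useful sanity check on any substituted table.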