Chunk #2 — Methods — Identifying the sample identity of each single cell

Source: Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.
Embedded: yes

Text

We allow for uncertainty in the observed genotype at the $v$-th variant of the $s$-th sample through $P_{sv}(g) = \Pr(g \mid \mathrm{Data}_{sv})$, the posterior probability of a possible genotype $g$ given external DNA data $\mathrm{Data}_{sv}$ (e.g. sequence reads, imputed genotypes, or array-based genotypes). If a genotype likelihood $\Pr(\mathrm{Data}_{sv} \mid g)$ is provided instead (e.g. from unphased sequence reads), it can be converted to the posterior probability scale using $P_{sv}(g) \propto \Pr(\mathrm{Data}_{sv} \mid g)\,\Pr(g)$, normalized over $g \in \{0, 1, 2\}$, where the prior $\Pr(g)$ is $\mathrm{Binomial}(2, p_v)$ and $p_v$ is the population frequency of the alternate allele. To allow for an error rate $\varepsilon$ in the posterior probability, we replace it with $(1-\varepsilon)\,P_{sv}(g) + \varepsilon\,\Pr(g)$. The overall likelihood that the $c$-th droplet originated from the $s$-th sample is

$$L_c(s) = \prod_{v=1}^{V} \left[ \sum_{g=0}^{2} \left\{ \prod_{i=1}^{d_{cv}} \left( \sum_{e=0}^{1} \Pr(b_{cvi} \mid g, e) \right) P_{sv}(g) \right\} \right] \tag{1}$$
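
The computation in equation (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function and argument names are hypothetical. It also assumes that the per-read term, the sum over e of Pr(b_cvi | g, e), has already been computed for each genotype g = 0, 1, 2, because the base-error model is not specified in this passage. The product over variants is accumulated in log space, a common choice for numerical stability rather than something the text prescribes.

```python
import math

def binomial_prior(p_alt):
    """Pr(g) for g = 0, 1, 2 alternate alleles under Binomial(2, p_alt)."""
    q = 1.0 - p_alt
    return [q * q, 2.0 * p_alt * q, p_alt * p_alt]

def posterior_from_likelihood(genotype_lik, p_alt):
    """Convert Pr(Data_sv | g) to P_sv(g) by multiplying by Pr(g) and normalizing."""
    prior = binomial_prior(p_alt)
    unnorm = [lik * pr for lik, pr in zip(genotype_lik, prior)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def soften_posterior(posterior, p_alt, eps):
    """Apply the error allowance: (1 - eps) * P_sv(g) + eps * Pr(g)."""
    prior = binomial_prior(p_alt)
    return [(1.0 - eps) * p + eps * pr for p, pr in zip(posterior, prior)]

def droplet_log_likelihood(read_probs, posteriors):
    """Return log L_c(s) for one droplet c and one sample s.

    read_probs[v][i][g]: assumed precomputed sum over e of Pr(b_cvi | g, e)
                         for read i at variant v (hypothetical input layout).
    posteriors[v][g]:    P_sv(g) for sample s at variant v.
    """
    log_lik = 0.0
    for reads_v, post_v in zip(read_probs, posteriors):
        variant_term = 0.0
        for g in range(3):
            read_product = 1.0
            for read in reads_v:  # product over the d_cv reads at this variant
                read_product *= read[g]
            variant_term += read_product * post_v[g]
        log_lik += math.log(variant_term)  # product over variants, in log space
    return log_lik

# Illustrative values only (not data from the paper).
posterior = soften_posterior(
    posterior_from_likelihood([0.8, 0.15, 0.05], p_alt=0.3), p_alt=0.3, eps=0.01
)
reads = [[[0.99, 0.5, 0.01], [0.99, 0.5, 0.01]]]  # one variant, two reads
score = droplet_log_likelihood(reads, [posterior])
```

Evaluating `droplet_log_likelihood` once per candidate sample and comparing the resulting scores is one way to use this quantity. A variant with no overlapping reads contributes a factor of 1 (log of 0), since the posterior sums to one.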