To assess the false-positive rate of DA testing, we randomly sampled, without replacement, two groups of 100 cells from the CD4+ TCM population and performed DA testing between the two groups. To assess how the composition of the comparison group affected DA results, we also sampled n cells from the NK population, for n ranging from 10 to 100, and mixed them with 100 − n cells from the CD4+ TCM population. We defined a set of ground-truth DA peaks by performing DA testing between the complete CD4+ and NK populations, classifying peaks with an adjusted P value < 0.01 and an absolute log2 fold change > 0.4 as DA. For each sampled mixture, we computed the receiver operating characteristic (ROC) curve and the area under the curve (AUC) using the ROCR R package, by progressively lowering the fold-change cutoff for significance64.
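The evaluation scheme can be illustrated with a minimal Python sketch. The analysis itself used the ROCR R package, so this is not that code. It assumes that per-peak DA results are already available as dictionaries keyed by peak name. The helper names (`sample_groups`, `ground_truth_da`, `roc_by_fold_change`, `auc`) are hypothetical. The sketch also assumes that the 100-cell reference group and the TCM cells in the mixed group are drawn disjointly. Setting `n_nk = 0` reproduces the null comparison of two TCM groups.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def sample_groups(tcm_cells: Sequence[str], nk_cells: Sequence[str],
                  n_nk: int, size: int = 100, seed=None) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (reference, query) cell groups.

    reference: `size` TCM cells; query: `n_nk` NK cells plus `size - n_nk`
    TCM cells. All TCM cells are drawn in one call without replacement,
    so the two groups are disjoint (an assumption of this sketch).
    """
    if not 0 <= n_nk <= size:
        raise ValueError("n_nk must lie in [0, size]")
    rng = random.Random(seed)
    tcm = rng.sample(list(tcm_cells), 2 * size - n_nk)
    nk = rng.sample(list(nk_cells), n_nk)
    return tcm[:size], tcm[size:] + nk


def ground_truth_da(results: Dict[str, Tuple[float, float]],
                    padj_max: float = 0.01, lfc_min: float = 0.4) -> Set[str]:
    """Peaks with adjusted P < padj_max and |log2 fold change| > lfc_min.

    `results` maps peak -> (adjusted P value, log2 fold change) from the
    full-population comparison.
    """
    return {peak for peak, (padj, lfc) in results.items()
            if padj < padj_max and abs(lfc) > lfc_min}


def roc_by_fold_change(abs_lfc: Dict[str, float],
                       truth: Set[str]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR) obtained by lowering the |log2FC| cutoff.

    `abs_lfc` maps each peak tested on the subsample to its absolute log2
    fold change; its keys define the peak universe.
    """
    pos = sum(1 for p in abs_lfc if p in truth)
    neg = len(abs_lfc) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both DA and non-DA peaks")
    ordered = sorted(abs_lfc.items(), key=lambda kv: kv[1], reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every fold change: nothing called
    tp = fp = 0
    i = 0
    while i < len(ordered):
        cutoff = ordered[i][1]
        # Peaks tied at this cutoff are called significant together.
        while i < len(ordered) and ordered[i][1] == cutoff:
            if ordered[i][0] in truth:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    truth = ground_truth_da({"p1": (0.001, 0.5), "p2": (0.001, -0.6),
                             "p3": (0.02, 1.0), "p4": (0.001, 0.3)})
    sub = {"p1": 2.0, "p2": 1.5, "p3": 0.3, "p4": 0.1}
    print(sorted(truth), auc(roc_by_fold_change(sub, truth)))
```

In this toy example, both ground-truth peaks have the largest fold changes on the subsample, so the AUC is 1.0. ROCR computes the same kind of curve from `prediction()` and `performance(pred, "tpr", "fpr")`, and the AUC from `performance(pred, "auc")`.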