As a second benefit, we found that community-scale integration enabled consistent identification of ultra-rare populations, in particular a population of Foxi1-expressing ‘pulmonary ionocytes’ that were recently discovered in both human and mouse lungs61 (Fig. 4d). While these cells were independently annotated in only 6 out of 19 studies, our integrated analysis identified at least one pulmonary ionocyte in 17 out of 19 studies. The identified ionocytes were extremely rare (0.047%) but exhibited clear expression of canonical markers (Fig. 4b), highlighting the potential value of pooling multiple datasets to characterize these cells. We note that selection of dictionary atoms by sketching, or leverage-score sampling, is essential for optimal performance (Supplementary Fig. 5h,i): repeating the analysis using a set of atoms determined by random downsampling successfully integrated abundant cell types, but failed to integrate ionocytes because they were not sufficiently represented in the dictionary.
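To illustrate why leverage-score sampling tends to retain rare populations where uniform downsampling does not, the following is a minimal sketch on synthetic data. It is not the pipeline used in this analysis: it computes exact rank-k leverage scores from an SVD of a centered matrix (an assumption made for simplicity; large datasets would typically need an approximation), and the function names `leverage_scores` and `sketch`, the toy matrix, and all parameter values are hypothetical.

```python
import numpy as np

def leverage_scores(X, rank):
    """Exact rank-k leverage score of each row of X (cells x features).

    Assumption: scores are taken from the top-`rank` left singular
    vectors of the column-centered matrix.
    """
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, :rank] ** 2).sum(axis=1)

def sketch(X, n_atoms, rank, rng):
    """Sample row indices without replacement, weighted by leverage."""
    scores = leverage_scores(X, rank)
    p = scores / scores.sum()
    return rng.choice(X.shape[0], size=n_atoms, replace=False, p=p)

# Toy data (hypothetical): 5000 abundant cells and 5 rare cells that
# differ strongly along a single marker-like feature.
rng = np.random.default_rng(0)
n_common, n_rare, n_feat = 5000, 5, 20
X = rng.normal(size=(n_common + n_rare, n_feat))
X[n_common:, 0] += 25.0
is_rare = np.arange(X.shape[0]) >= n_common

rank, n_atoms = 5, 200
scores = leverage_scores(X, rank)
lev_idx = sketch(X, n_atoms, rank, rng)
uni_idx = rng.choice(X.shape[0], size=n_atoms, replace=False)

print("mean leverage, rare cells:    ", scores[is_rare].mean())
print("mean leverage, abundant cells:", scores[~is_rare].mean())
print("rare cells kept (leverage):   ", int(is_rare[lev_idx].sum()))
print("rare cells kept (uniform):    ", int(is_rare[uni_idx].sum()))
```

In this toy setting the rare cells carry much higher leverage than the abundant cells, so they are far more likely to be selected as atoms. Under uniform sampling, the expected number of rare cells retained is only 200 × 5 / 5005, or about 0.2.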