The $173 million NIH-initiated Human Microbiome Project (HMP) set out to characterize the human microbiome at population scale and to define standard reference datasets for human microbiome research [35]. The resulting 16S rRNA datasets comprise samples from 242 individuals, all of whom were medical students in the USA and were certified healthy by medical professionals. Thousands of samples were collected from these individuals at one to three time points, covering 15 or 18 sampling sites depending on the sex of the individual. These samples were evaluated using two different regions of the 16S gene, yielding two distinct datasets (V1-3 and V3-5) [31], and were processed at four different sequencing centers. Phenotypic information about the individuals was also collected. However, while the sequence data associated with the samples are publicly available, access to the phenotypic information, even in de-identified form, requires passing a rigorous approval process.