We demonstrate the utility of a multivariate approach for assigning samples to more genetically homogeneous groups, using the minimum Mahalanobis distance to 1000 Genomes Project (1KGP) reference populations. This approach allowed us to balance maximal genetic similarity within each assigned group against minimal sample loss due to unknown and self-reported mixed ancestry. We were able to include 9% of the sample that might otherwise have been excluded. As more cohorts are combined through large-scale collaborations, maximizing sample retention, particularly for understudied ancestry groups, while reducing population stratification remains an important endeavour.
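To make the assignment rule concrete, the following is a minimal sketch of minimum-Mahalanobis-distance classification, not the pipeline used in this study. It assumes that each sample is represented by a vector of genetic principal components and that each reference population is summarized by its mean vector and covariance matrix. The function names (`fit_reference`, `assign_min_mahalanobis`), the population labels, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_reference(ref_pcs, ref_labels):
    """Estimate the mean and inverse covariance of each reference population."""
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_labels = np.asarray(ref_labels)
    params = {}
    for label in np.unique(ref_labels):
        group = ref_pcs[ref_labels == label]
        mean = group.mean(axis=0)
        cov = np.cov(group, rowvar=False)
        # The pseudo-inverse guards against a singular covariance matrix.
        params[str(label)] = (mean, np.linalg.pinv(cov))
    return params


def assign_min_mahalanobis(sample_pcs, params):
    """Assign each sample to the reference population at minimum distance."""
    sample_pcs = np.asarray(sample_pcs, dtype=float)
    labels = list(params)
    dists = np.empty((sample_pcs.shape[0], len(labels)))
    for j, label in enumerate(labels):
        mean, inv_cov = params[label]
        diff = sample_pcs - mean
        # Squared Mahalanobis distance for every row: diff @ inv_cov @ diff.
        d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        dists[:, j] = np.sqrt(np.maximum(d2, 0.0))
    best = dists.argmin(axis=1)
    return [labels[i] for i in best], dists


# Synthetic illustration: two well-separated hypothetical reference groups.
rng = np.random.default_rng(0)
ref_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
ref_b = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(200, 2))
ref_pcs = np.vstack([ref_a, ref_b])
ref_labels = np.array(["POP_A"] * 200 + ["POP_B"] * 200)

params = fit_reference(ref_pcs, ref_labels)
samples = np.array([[0.5, -0.2], [9.5, 10.3]])
assigned, distances = assign_min_mahalanobis(samples, params)
```

Because the distance is scaled by each population's own covariance, a sample is compared with every reference group relative to that group's spread, so a sample with unknown or mixed self-reported ancestry can still be placed with the genetically closest group. In practice one might also apply a maximum-distance cut-off so that samples far from every reference population remain unassigned. That cut-off is an assumption of this sketch and not a step described above.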