Here we observed that as the degree of heterogeneity increased, so did the minimum sample size required to achieve adequate statistical power. This effect was particularly evident for genotypic relative risk (GRR) values of 1.2 and 1.5 and for common variants (MAF > 0.05). Notably, for a complex trait such as BD, most of the reported genetic risk variants have effect sizes of approximately 1.5 or below [24], [36]–[42]. At least in clinically heterogeneous disorders, it is therefore conceivable that even very large samples could only partially compensate for the loss of statistical power in GWAS. In this context, it appears crucial to focus on the steps preceding the GWAS. Obtaining a careful clinical history from all available sources, establishing consensus diagnoses, ensuring the validity of the phenotypic measures used, evaluating inter-rater agreement and reliability, and adopting prospective designs could all help to overcome the problem of phenotypic heterogeneity.
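To illustrate why heterogeneity inflates the required sample size, the following is a minimal sketch. It is not the simulation used in this study. It assumes a multiplicative model with a rare-disease approximation (so the allelic odds ratio is close to the GRR and the control allele frequency is close to the population MAF). It also treats a fraction `heterogeneity` of cases as phenocopies whose allele frequency equals that of controls, and it applies the standard normal-approximation sample size formula for comparing two allele frequencies at a genome-wide significance threshold of 5 × 10^-8 with 80% power. The function names and default parameters are hypothetical choices for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(maf: float, grr: float) -> float:
    """Risk-allele frequency in truly affected cases.

    Assumes a multiplicative model with a rare-disease approximation,
    so the allelic odds ratio is approximately equal to the GRR.
    """
    return grr * maf / (1 + (grr - 1) * maf)


def required_cases(maf: float, grr: float, heterogeneity: float,
                   alpha: float = 5e-8, power: float = 0.8) -> int:
    """Cases (with an equal number of controls) needed for an allelic test.

    `heterogeneity` is the hypothetical fraction of cases that carry no
    genetic effect (phenocopies) and so have the control allele frequency.
    """
    if not 0 <= heterogeneity < 1:
        raise ValueError("heterogeneity must be in [0, 1)")
    p0 = maf  # control allele frequency, approximated by the population MAF
    p_affected = case_allele_freq(maf, grr)
    # Phenocopies dilute the case allele frequency toward the control value.
    p1 = (1 - heterogeneity) * p_affected + heterogeneity * p0
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p0) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2)  # two alleles per diploid individual


if __name__ == "__main__":
    for h in (0.0, 0.25, 0.5):
        print(f"heterogeneity={h:.2f}: {required_cases(0.2, 1.2, h)} cases")
```

Because phenocopies shrink the case-control allele frequency difference by a factor of (1 − h), the required sample size grows roughly as 1/(1 − h)². Under these assumptions, for a variant with GRR 1.2 and MAF 0.2, making half of the cases phenocopies increases the required number of cases nearly fourfold, which is consistent with the observation that larger samples only partially offset heterogeneity.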