The current findings motivate investigations into the genetic heterogeneity of items in other instruments used to gauge complex psychological traits. Neuroticism is only one of many psychological traits for which composite scores are calculated by aggregating item or symptom scores. As a clinical example, consider the DSM-5 diagnosis of major depressive disorder (MDD). This diagnosis is based on a list of 9 diverse symptoms, at least four of which are “aggregated” symptoms that reflect problems at either end of a spectrum (e.g., ‘insomnia or hypersomnia’, ‘increase or decrease in appetite’; see Supplementary Note 2 for the full list of symptoms). To qualify for a diagnosis of depression, at least 5 of these symptoms must be endorsed over a period of at least 2 weeks, a procedure that can result in people with very different symptom profiles receiving the same diagnosis8,27. When diagnostic status is subsequently used as the dependent variable in a GWAS, the implicit assumption is that these symptoms are genetically similar. As with neuroticism, however, the phenotypic heterogeneity of the symptoms raises questions about their presumed genetic homogeneity. To date, genetic heterogeneity between depression symptoms has been addressed only in twin studies28.
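How much heterogeneity the “at least 5 of 9” counting rule permits can be made concrete with a small enumeration. The sketch below makes several simplifying assumptions. Each symptom is treated as simply present or absent, so the two directions of the aggregated symptoms are not distinguished. It also optionally applies the additional DSM-5 requirement that at least one qualifying symptom be depressed mood or loss of interest or pleasure. The mapping of these two core symptoms to indices 0 and 1 is purely illustrative, and all names (`qualifying_profiles`, `CORE`, etc.) are hypothetical.

```python
from itertools import combinations
from math import comb

N_SYMPTOMS = 9   # nine listed MDD symptoms
THRESHOLD = 5    # at least five must be endorsed
# Illustrative assumption: indices 0 and 1 stand for depressed mood and
# loss of interest or pleasure, at least one of which DSM-5 requires.
CORE = {0, 1}


def qualifying_profiles(require_core: bool = True) -> list[frozenset[int]]:
    """Enumerate binary symptom profiles that satisfy the counting rule.

    Symptoms are treated as present/absent only; the two directions of
    aggregated symptoms (e.g. insomnia vs. hypersomnia) are not split.
    """
    profiles = []
    for k in range(THRESHOLD, N_SYMPTOMS + 1):
        for combo in combinations(range(N_SYMPTOMS), k):
            profile = frozenset(combo)
            if require_core and not (profile & CORE):
                continue  # no core symptom endorsed: does not qualify
            profiles.append(profile)
    return profiles


# Closed form without the core-symptom constraint: sum of C(9, k) for k = 5..9.
unconstrained = sum(comb(N_SYMPTOMS, k) for k in range(THRESHOLD, N_SYMPTOMS + 1))

profiles = qualifying_profiles(require_core=True)
# Smallest number of symptoms shared by any two people who both qualify.
min_shared = min(len(a & b) for a in profiles for b in profiles)

print(unconstrained)                                 # 256
print(len(qualifying_profiles(require_core=False)))  # 256
print(len(profiles))                                 # 227
print(min_shared)                                    # 1
```

Under these assumptions there are 227 distinct symptom combinations that meet the criteria, and two diagnosed individuals may share as few as one symptom. Distinguishing the two directions of the aggregated symptoms would increase the number of distinct profiles further.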