The GSV regions were defined either as stated in the original report [16] or as the region common to all GSVs identified at that locus [10], and were analysed according to the GSV analysis pipeline illustrated in Figure 1. Intensity data from the French and NFBC1966 cohorts were exported from Illumina BeadStudio as log R ratio (LRR) and B allele frequency (BAF) values; samples with a low SNP call rate (<95%) or a genome-wide LRR variance >0.3 were excluded. The cnvHap algorithm with default parameter settings (false discovery rate ∼5%) was applied to each region under investigation plus additional 500 kbp flanking regions. With these parameters we expect high sensitivity for GSV detection: even at a false discovery rate as low as 1%, genome-wide sensitivity is ∼40% for detecting a GSV in an individual and >60% for identifying the presence of a GSV in a cohort [19]. The initial (unsupervised) GSV detection was further improved by a series of manual procedures applied to each GSV locus under study: only GSV calls covering at least 3 consecutive probes were considered; for