The first step in our methodology, Canary (copy number analysis routine), determines the copy number of each individual at each predefined CNP locus. A high-resolution map of common CNPs is needed to define these loci; we used the map of McCarroll et al.6, but improved maps can be substituted as they emerge. Although an individual probe inside a given CNP may not provide enough information to give an accurate integer measurement of copy number (a copy number ‘genotype’)11 (Fig. 2a), multiple probes that interrogate the same CNP segment typically show highly correlated and reproducible patterns of intensity6 (Fig. 2b,c). The measurements for the probes in the same CNP (or, in some cases, for a predefined high-performing subset of those probes; Fig. 2c) are combined into a single summarized intensity measurement per sample (Fig. 2d). The summarized measurements for a batch of samples are then clustered into discrete copy number classes using a one-dimensional Gaussian mixture model (GMM), in which the expected location of each copy number cluster is informed by the results of previous experiments (Fig. 2d-f).
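The two steps described above (per-sample summarization across probes, then one-dimensional GMM clustering) can be illustrated with a minimal sketch. This is not the Canary implementation. The function names `summarize_probes` and `fit_gmm_1d`, the use of the median as the summary statistic, the toy intensity values, and the use of prior cluster locations only to initialize the mixture means are all assumptions made for illustration; the text does not specify how probes are combined or how prior results enter the model.

```python
import math
import statistics


def summarize_probes(probe_intensities, probe_subset=None):
    """Collapse the probe intensities of one CNP into one value per sample.

    probe_intensities: list of per-sample lists, one intensity per probe.
    probe_subset: optional indices of a predefined high-performing subset.
    The median is an assumed summary statistic, not taken from the text.
    """
    summaries = []
    for sample in probe_intensities:
        values = sample if probe_subset is None else [sample[i] for i in probe_subset]
        summaries.append(statistics.median(values))
    return summaries


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, prior_means, init_sd=0.2, n_iter=50, min_sd=1e-3):
    """Fit a one-dimensional Gaussian mixture by expectation-maximization.

    prior_means: expected cluster locations, one per copy number class.
    Here they only initialize the means (an assumption of this sketch).
    Returns (means, sds, weights, labels); labels index into prior_means.
    """
    k = len(prior_means)
    means = list(prior_means)
    sds = [init_sd] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for x in values:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # empty class: keep its prior-informed parameters
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            sds[j] = max(math.sqrt(var), min_sd)  # floor avoids collapse to zero
    # Assign each sample to its most probable class.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, sds, weights, labels


# Hypothetical batch: 6 samples x 3 probes for a single CNP (toy values).
batch = [
    [0.9, 1.1, 1.0], [1.0, 0.95, 1.05],
    [2.1, 1.9, 2.0], [2.0, 2.05, 1.95],
    [3.0, 3.1, 2.9], [2.95, 3.05, 3.0],
]
summaries = summarize_probes(batch)
means, sds, weights, labels = fit_gmm_1d(summaries, prior_means=[1.0, 2.0, 3.0])
```

Each entry of `labels` is an index into `prior_means`, so mapping a label to an integer copy number requires knowing which copy number class each prior location represents. Because the mixture is initialized at the prior locations, a class with no samples in the batch simply retains its starting parameters in this sketch.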