Although the first two steps in the framework focus on accurate typing of known, common polymorphisms, the same platforms can also be used to identify rare and de novo copy number variants (CNVs) for which there is no prior knowledge. Such ab initio discovery is fundamentally more difficult because a relatively small number of real CNVs at unknown sites must be distinguished from the statistical fluctuations that arise in any genome-scale dataset. Heterogeneity in probe performance on array platforms further complicates the problem. Different probes show different intrinsic measurement variance across samples (a fact seldom modeled by CNV discovery algorithms), and different SNP probe sets show different quantitative responses to having 0, 1 or 2 copies of each allele (Supplementary Fig. 1). We therefore sought to model the empirical properties of each probe in order to maximize the power to detect rare CNVs. As in most other algorithms (ref. 10), we search for consistent evidence of copy number variation across multiple neighboring probes to reduce the effect of normal statistical fluctuations.
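The two ideas in this paragraph, per-probe modeling and pooling of evidence across neighboring probes, can be illustrated with a minimal sketch. It is not the method used here, whose details are not given in this passage. As stand-ins, it assumes that each probe is standardized by a robust center and spread (median and MAD) estimated across samples, and that evidence is pooled over a fixed-width window by summing z-scores. The function names, window width, threshold and toy intensities are all hypothetical.

```python
import math
import statistics


def probe_zscores(intensities):
    """Standardize each probe by its own center and spread across samples.

    intensities[s][p] is a (hypothetical) normalized intensity for sample s
    at probe p. Returns z[s][p].
    """
    n_probes = len(intensities[0])
    z = [[0.0] * n_probes for _ in intensities]
    for p in range(n_probes):
        column = [row[p] for row in intensities]
        center = statistics.median(column)
        mad = statistics.median([abs(x - center) for x in column])
        # 1.4826 * MAD estimates the standard deviation for normal data;
        # the floor avoids division by zero for a constant probe.
        scale = max(1.4826 * mad, 1e-9)
        for s, x in enumerate(column):
            z[s][p] = (x - center) / scale
    return z


def scan_windows(z_row, width=3, threshold=4.0):
    """Return (start, end, score) for windows with an extreme pooled z-score."""
    hits = []
    for start in range(len(z_row) - width + 1):
        # Sum of z-scores scaled by sqrt(width): unit variance under the
        # null if probes are independent (an assumption of this sketch).
        score = sum(z_row[start:start + width]) / math.sqrt(width)
        if abs(score) >= threshold:
            hits.append((start, start + width, score))
    return hits


# Toy data (assumed): 9 samples x 8 probes, each probe with its own noise level.
noise_sd = [0.05, 0.10, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05]
offsets = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
intensities = [[1.0 + k * sd for sd in noise_sd] for k in offsets]

# Sample 4 carries a simulated deletion: the same drop at probes 2, 3 and 4.
for p in (2, 3, 4):
    intensities[4][p] -= 0.6

z = probe_zscores(intensities)
carrier_hits = scan_windows(z[4])   # windows flagged in the carrier
control_hits = scan_windows(z[8])   # windows flagged in a non-carrier
best = min(carrier_hits, key=lambda h: h[2])  # most negative pooled score
```

In this toy example the same intensity drop yields a weaker z-score at the noisier probes, so those probes contribute less to the pooled score. That is the intended effect of modeling probe-specific variance. A single very quiet probe can still dominate a window, which is one reason a real implementation would need a more careful model than this one.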