Chunk #70 — Experimental Procedures — Statistical analysis of de novo recurrence

Source: Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism.
Embedded: yes

Text

To determine the probability of finding multiple rare de novo CNVs at the same location in probands, we first estimated how many positions in the genome were likely to be contributing to the de novo CNVs observed in siblings. Because mutation rates for structural variation vary widely across the genome (Fu et al., 2010), some positions are more likely than others to give rise to the de novo CNVs observed in our sample. Consequently, the number of likely positions is much smaller than the total number of possible positions. We refer to these likely CNV regions as eCNVRs (effective copy number variable regions) and estimate their number, “C”, by treating the question as an instance of the so-called “unseen species problem,” in which the number and frequency of observed CNV types (or species) are used to infer how many species are present in the population. Based on the de novo CNVs observed in the control sibling group, we apply the formula (Bunge and Fitzpatrick, 1993) C = c/u + g^2*d*(1-u)/u, in which c = the total number of distinct species observed; c1 = the number of singleton species; d = total number