paperKB
coga / coga-kb
Help
Sign in

Chunk #110 — Results — Missing Data and Other Problems

Source
Population structure and eigenanalysis.
Embedded
yes

Text

Missing genotypes by themselves are not the most serious concern. Simply setting M(i,j) = 0 in Equation 3 if marker j is missing for individual i is reasonable if we are testing the null, that there is no structure, and the missing data is “missing at random.” Unfortunately “informative missingness” [37,38] is extremely frequent in genetic data. Probably the most common and serious issue is that with current technology, heterozygotes are more difficult to call than homozygotes. Thus, true heterozygotes are more likely to be called as missing. This is discussed in detail in [38], which is recommended as a very useful discussion of the issues, especially as they apply to medical genetics. If DNA quality (or quantity) varies among our samples, then certain individuals may have an unusual amount of missing data, and then appear as outliers in our eigenanalysis—we in fact have seen this in many runs on real data.