
Chunk #61 — Methods — Singleton clustering analysis — Mixture model parameter estimation

Source
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Text

We estimate the parameters of this mixture $(\lambda_{i,1}, \ldots, \lambda_{i,K}, \theta_{i,1}, \ldots, \theta_{i,K})$ using the expectation–maximization algorithm as implemented in the mixtools R package (ref. 100). Code for this analysis is available for download from the GitHub repository (ref. 101). To identify an optimal number of mixture components, we iteratively fit mixture models for increasing values of K and calculated the log-likelihood of the observed data D given the parameter estimates $(\hat{\lambda}_{i,1}, \ldots, \hat{\lambda}_{i,K}, \hat{\theta}_{i,1}, \ldots, \hat{\theta}_{i,K})$, stopping at K components if the P value of the likelihood ratio test between K − 1 and K components was >0.01 (χ² test with two degrees of freedom). The goodness-of-fit plateaued at four components for the majority of individuals, so we used the four-component parameter estimates from each individual in all subsequent analyses.
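
The model-selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' code and does not call mixtools. It assumes exponential component densities, with the λ values acting as mixing proportions and the θ values as rates, which this passage does not state. The function names `fit_exp_mixture` and `select_components`, the starting values, and the convergence tolerance are all hypothetical. The stopping rule follows the text literally: fitting stops at K when the K − 1 versus K likelihood ratio test gives P > 0.01.

```python
import math
import random


def log_likelihood(data, weights, rates):
    """Log-likelihood of data under an exponential mixture."""
    return sum(
        math.log(sum(w * r * math.exp(-r * x) for w, r in zip(weights, rates)))
        for x in data
    )


def fit_exp_mixture(data, k, n_iter=200, tol=1e-8):
    """EM for a k-component exponential mixture (assumed component family).

    Returns (weights, rates, log-likelihood at the returned parameters).
    """
    n = len(data)
    mean = sum(data) / n
    # Hypothetical deterministic start: equal weights, rates spread around 1/mean.
    weights = [1.0 / k] * k
    rates = [(1.0 / mean) * 4.0 ** (j - (k - 1) / 2.0) for j in range(k)]
    prev = -math.inf
    for _ in range(n_iter):
        resp_sum = [0.0] * k  # summed responsibilities per component
        resp_x = [0.0] * k    # responsibility-weighted sum of observations
        loglik = 0.0
        # E-step: posterior probability of each component for each observation.
        for x in data:
            dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            total = sum(dens)
            loglik += math.log(total)
            for j in range(k):
                g = dens[j] / total
                resp_sum[j] += g
                resp_x[j] += g * x
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: closed-form updates for mixing proportions and rates.
        weights = [s / n for s in resp_sum]
        rates = [s / max(sx, 1e-300) for s, sx in zip(resp_sum, resp_x)]
    return weights, rates, log_likelihood(data, weights, rates)


def select_components(data, max_k=6, alpha=0.01):
    """Fit K = 1, 2, ... and stop at K once the (K - 1 vs K) test has P > alpha."""
    p_values = {}
    prev_ll = fit_exp_mixture(data, 1)[2]
    for k in range(2, max_k + 1):
        ll = fit_exp_mixture(data, k)[2]
        stat = max(2.0 * (ll - prev_ll), 0.0)
        # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
        p_values[k] = math.exp(-stat / 2.0)
        if p_values[k] > alpha:
            return k, p_values
        prev_ll = ll
    return max_k, p_values


if __name__ == "__main__":
    random.seed(1)
    # Simulated stand-in data: two exponential processes with different rates.
    sample = [
        random.expovariate(1.0) if random.random() < 0.5 else random.expovariate(0.02)
        for _ in range(500)
    ]
    stop_k, p_vals = select_components(sample, max_k=4)
    print("stopped at K =", stop_k, "with P values", p_vals)
```

Because each added component contributes one mixing proportion and one rate, the two degrees of freedom in the test match the two extra parameters per step under this assumed parametrization. EM can converge to local optima, so a practical analysis would typically use several random starts per K rather than the single deterministic start used here.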