Chunk #61 — Methods — Singleton clustering analysis — Mixture model parameter estimation

Source: Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.
Embedded: yes

Text

We estimated the parameters of this mixture $(\lambda_{i,1}, \ldots, \lambda_{i,K}, \theta_{i,1}, \ldots, \theta_{i,K})$ using the expectation–maximization algorithm as implemented in the mixtools R package (ref. 100). Code for this analysis is available for download from the GitHub repository (ref. 101). To identify an optimal number of mixture components, we iteratively fit mixture models for increasing values of K and calculated the log-likelihood of the observed data D given the parameter estimates $(\hat{\lambda}_{i,1}, \ldots, \hat{\lambda}_{i,K}, \hat{\theta}_{i,1}, \ldots, \hat{\theta}_{i,K})$, stopping at K components if the P value of the likelihood ratio test between K − 1 and K components was >0.01 (χ² test with two degrees of freedom). The goodness-of-fit plateaued at four components for the majority of individuals, so we used the four-component parameter estimates from each individual in all subsequent analyses.
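The model-selection loop described above can be illustrated with a minimal sketch. This is not the analysis code from the repository, which uses mixtools in R. The sketch assumes each mixture component is an exponential distribution with a single rate parameter (the component family is not restated in this passage), so that moving from K − 1 to K components adds one rate and one free mixing weight, matching the two degrees of freedom of the test. The names `fit_exp_mixture`, `select_components`, `weights` and `rates`, the initialisation scheme and the cap `max_k` are all hypothetical choices made for illustration. For a χ² distribution with two degrees of freedom the upper-tail probability is exactly exp(−x/2), which is used here in place of a statistics library call.

```python
import math
import random

_TINY = 1e-300  # floor that keeps log() and division well defined


def _log_terms(x, weights, rates):
    """Log of weight_j * rate_j * exp(-rate_j * x) for every component j."""
    return [math.log(w) + math.log(r) - r * x for w, r in zip(weights, rates)]


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(data, weights, rates):
    """Log-likelihood of the data under the exponential mixture."""
    return sum(_log_sum_exp(_log_terms(x, weights, rates)) for x in data)


def fit_exp_mixture(data, k, n_iter=200, tol=1e-6):
    """EM for a k-component exponential mixture (illustrative sketch).

    Returns (weights, rates, log_likelihood).
    """
    n = len(data)
    mean = sum(data) / n
    weights = [1.0 / k] * k
    # Assumed initialisation: rates spread geometrically around 1 / mean.
    rates = [2.0 ** (j - (k - 1) / 2.0) / mean for j in range(k)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        resp_sum = [0.0] * k  # sum of responsibilities per component
        resp_x = [0.0] * k    # responsibility-weighted sum of x per component
        ll = 0.0
        # E-step: posterior probability of each component for each point.
        for x in data:
            terms = _log_terms(x, weights, rates)
            lse = _log_sum_exp(terms)
            ll += lse
            for j in range(k):
                g = math.exp(terms[j] - lse)
                resp_sum[j] += g
                resp_x[j] += g * x
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: closed-form updates for mixing weights and rates.
        weights = [max(s / n, _TINY) for s in resp_sum]
        rates = [max(s, _TINY) / max(sx, _TINY)
                 for s, sx in zip(resp_sum, resp_x)]
    return weights, rates, log_likelihood(data, weights, rates)


def select_components(data, max_k=5, alpha=0.01):
    """Fit K = 1, 2, ... and stop at the first K whose likelihood ratio
    test against K - 1 has P > alpha (chi-squared, two degrees of freedom).

    Returns (stop_k, fits), where fits maps K to (weights, rates, loglik).
    """
    fits = {1: fit_exp_mixture(data, 1)}
    stop_k = max_k  # reached only if every added component was significant
    for k in range(2, max_k + 1):
        fits[k] = fit_exp_mixture(data, k)
        lrt = 2.0 * (fits[k][2] - fits[k - 1][2])
        # Upper tail of chi-squared with 2 d.f. is exactly exp(-x / 2).
        p_value = math.exp(-max(lrt, 0.0) / 2.0)
        if p_value > alpha:
            stop_k = k
            break
    return stop_k, fits


if __name__ == "__main__":
    # Synthetic data only: two exponential components with distinct rates.
    rng = random.Random(0)
    data = [rng.expovariate(1.0) if rng.random() < 0.5
            else rng.expovariate(0.01) for _ in range(400)]
    stop_k, fits = select_components(data, max_k=4)
    print("stopped at K =", stop_k)
    for k, (w, r, ll) in fits.items():
        print(k, [round(v, 3) for v in w], [round(v, 4) for v in r], round(ll, 2))
```

The sketch reports the K at which the loop stops, following the wording of the stopping rule above. Whether the K or the K − 1 model is carried forward is a separate decision; in the analysis described here, the four-component estimates were used for every individual regardless.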