Chunk #71 — Methods — Unsupervised learning to determine TUD clustering.

Source: Multi-ancestry meta-analysis of tobacco use disorder identifies 461 potential risk genes and reveals associations with multiple health outcomes.
Embedded: yes

Text

Previous studies have shown that consumption and misuse/dependence phenotypes have distinct genetic architectures. To explore whether the TUD meta-analysis clustered more closely with consumption or with misuse/dependence phenotypes, we used a data-driven unsupervised machine learning method, agglomerative hierarchical clustering analysis (HCA) [106]. HCA forms clusters iteratively by creating groups and successively joining or splitting those groups according to a prespecified algorithm [106]. Agglomerative nesting (AGNES) is the bottom-up variant: each individual trait starts as its own cluster, and clusters are merged step by step to build the hierarchical structure. We chose agglomerative clustering because it allowed us to compare different linkage algorithms and select the one that maximized the dissimilarity on each branch; Ward’s minimum variance method performed best. All models were fit in R using the cluster package (version 2.1.4) [106].
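The bottom-up procedure and the comparison of linkage algorithms can be illustrated with a minimal sketch. It is written in plain Python, not with the R cluster package used in the study, and it rests on several assumptions that are not from the source: the six two-dimensional points are hypothetical toy data standing in for traits, the inputs are Euclidean distances, and the linkages are ranked by the agglomerative coefficient (the mean of one minus each observation's first-merge height divided by the final merge height). The source does not state which criterion was used to compare algorithms, so that choice is illustrative only.

```python
import math
from itertools import combinations


def agglomerate(points, method="ward"):
    """Bottom-up clustering. Returns a list of (members_a, members_b, height)."""
    # Every observation starts as its own cluster.
    clusters = {i: (i,) for i in range(len(points))}
    # Euclidean distance between every pair of current clusters.
    dist = {
        frozenset((i, j)): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Pick the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        d_ab = dist[pair]
        na, nb = len(clusters[a]), len(clusters[b])
        # Lance-Williams update: distance from the merged cluster to the rest.
        updated = {}
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = dist[frozenset((a, k))]
            d_bk = dist[frozenset((b, k))]
            nk = len(clusters[k])
            if method == "single":
                d = min(d_ak, d_bk)
            elif method == "complete":
                d = max(d_ak, d_bk)
            elif method == "average":
                d = (na * d_ak + nb * d_bk) / (na + nb)
            elif method == "ward":
                # Ward's minimum variance update, applied to squared distances.
                sq = ((na + nk) * d_ak**2 + (nb + nk) * d_bk**2
                      - nk * d_ab**2) / (na + nb + nk)
                d = math.sqrt(max(sq, 0.0))  # guard against rounding below zero
            else:
                raise ValueError(f"unknown method: {method}")
            updated[k] = d
        merges.append((clusters[a], clusters[b], d_ab))
        # Drop distances involving the two old clusters, then add the new one.
        dist = {p: v for p, v in dist.items() if a not in p and b not in p}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for k, d in updated.items():
            dist[frozenset((k, next_id))] = d
        next_id += 1
    return merges


def agglomerative_coefficient(merges, n):
    """Mean of 1 - (height of first merge / height of final merge)."""
    final_height = merges[-1][2]
    first_height = {}
    for left, right, height in merges:
        for group in (left, right):
            if len(group) == 1:  # a singleton is being merged for the first time
                first_height[group[0]] = height
    return sum(1 - first_height[i] / final_height for i in range(n)) / n


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

results = {}
for method in ("single", "complete", "average", "ward"):
    merges = agglomerate(points, method)
    results[method] = agglomerative_coefficient(merges, len(points))
    print(method, round(results[method], 3))

best = max(results, key=results.get)
print("highest coefficient:", best)
```

A coefficient closer to 1 indicates a more clearly separated clustering structure under this criterion. In R, the `agnes()` function of the cluster package takes a `method` argument for the linkage and reports the same coefficient as the `ac` component of its result.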