Publications
Selecting clustering algorithms for identity-by-descent mapping
Abstract
Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights into rare traits and diseases in massive biobanks through IBD mapping. Clustering algorithms play an important role in finding these groups accurately and at scale. We set out to analyze the suitability of commonly used, fast, and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and used it to compare the statistical power of clustering algorithms by simulating 2.3 million clusters across 850 experiments. We found that the Infomap and Markov Clustering (MCL) community detection methods have high statistical power in most scenarios. They yield a 30% increase in power compared to the current state-of-the-art approach, with a runtime three orders of magnitude lower. We also found that standard clustering metrics, such as …
- Date: August 7, 2025
- Authors: Ruhollah Shemirani, Gillian M Belbin, Keith Burghardt, Kristina Lerman, Christy L Avery, Eimear E Kenny, Christopher R Gignoux, José Luis Ambite
- Book: Pacific Symposium on Biocomputing 2023, Kohala Coast, Hawaii, USA, 3–7 January 2023
- Pages: 121–132