Publications
A greedy consensus-based approach to distributed job selection: toward fully-decentralized workload management system
Abstract
Current approaches to resilience for highly distributed, heterogeneous, large-scale scientific workflows are limited. Most existing workflow and resource management systems have a single point of failure, and resilience strategies are often static, depend on centralized control, and require considerable design effort from experts. The increasing scale and complexity of workflows, coupled with limited resilience capabilities in centralized systems, necessitate a fully decentralized, adaptive resource management approach. This paper addresses an important slice of the overall problem by leveraging advances in multi-agent systems (MAS). In particular, we explore the suitability of a MAS consisting of globally distributed agents to perform distributed job selection from a dynamic job pool in a truly decentralized, performant, and resilient manner. We present a novel consensus formulation of the distributed job …
- Date: May 1, 2025
- Authors: Komal Thareja, Krishnan Raghavan, Anirban Mandal, Pawel Zuk, Imtiaz Mahmud, Mariam Kiran, Ewa Deelman
- Conference: 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
- Pages: 63-72
- Publisher: IEEE Computer Society