Publications

A resource management and fault tolerance services in grid computing

Abstract

In grid computing, resource management and fault tolerance services are important issues. The availability of the resources selected for job execution is a primary factor in determining computing performance. In this paper, we propose a resource manager for optimal resource selection. Using a genetic algorithm, our resource manager automatically selects, from the candidate resources, the set of resources that achieves optimal performance. Typically, the probability of a failure is higher in grid computing than in traditional parallel computing, and a resource failure can be fatal to job execution. A fault tolerance service is therefore essential in computational grids. Grid services are also often expected to meet minimum levels of Quality of Service (QoS) for desirable operation. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. We extend the …

Date: November 1, 2005
Authors: HwaMin Lee, KwangSik Chung, SungHo Chin, JongHyuk Lee, DaeWon Lee, Seongbin Park, HeonChang Yu
Journal: Journal of Parallel and Distributed Computing
Volume: 65
Issue: 11
Pages: 1305-1317
Publisher: Academic Press