Publications

A resiliency model for high performance infrastructure based on logical encapsulation

Abstract

An emerging trend in distributed systems is the creation of dynamically provisioned, heterogeneous high-performance platforms that include the co-allocation of both virtualized computing and network-attached storage volumes offering NAS- and SAN-level data services. These high-performance computing environments support parallel applications performing traditional file system operations. As with any parallel platform, the ability to continue computation in the face of component failures is an important characteristic. Achieving resiliency in heterogeneous environments presents unique challenges and opportunities not found in homogeneous aggregations of computing resources. We present a logical encapsulation model for heterogeneous high-performance infrastructure, which enables a reactive resiliency approach for federations of virtual machines and externally hosted physical storage volumes …

Date
June 18, 2012
Authors
James J Moore, Carl Kesselman
Book
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing
Pages
283-294