A Framework for Adaptive Fault-Tolerant Execution of Workflows in the Grid: Empirical and Theoretical Analysis

Genaina Nunes Rodrigues; GUIMARAES, FELIPE PONTES ; CÉLESTIN, PEDRO ; BATISTA, DANIEL MACEDO ; DE MELO, ALBA CRISTINA MAGALHAES ALVES
Journal of Grid Computing, v. 12, p. 127-151
View details:  

Abstract

In this paper, we propose and evaluate a framework for fault tolerant workflow execution in Grid environments. Different from previous work in the literature, our system dynamically chooses an appropriate fault tolerance technique while using a user-defined rule-based system. We also provide a generic interface that can be used to add fault tolerance techniques to the framework. The results obtained with real workflows in an experimental Grid environment show that the overhead introduced by our framework in a failure-free execution is, in the worst evaluated case, approximately 10 %. Moreover, we show that, using our framework, workflows are able to execute successfully in the presence of failures and that the framework can dynamically choose an appropriate fault tolerance technique. The main contributions of our work are twofold: the developed framework and the model-based dependability analysis we performed on it. The purpose in carrying out a model-based dependability analysis consists on evaluating the interaction between our framework and the distributed Grid environment beyond the physical limitations of an empirical evaluation. By doing this, we provide means to plan the assurance of QoS in the Grid resource allocation, while applying the fault-tolerance mechanisms we implement in our framework regardless of the underlying middleware.

BibTex references

@Article{Guimaraes2014,
author="Guimaraes, Felipe Pontes
and C{\'e}lestin, Pedro
and Batista, Daniel Macedo
and Rodrigues, Gena{\'i}na Nunes
and de Melo, Alba Cristina Magalhaes Alves",
title="A Framework for Adaptive Fault-Tolerant Execution of Workflows in the Grid: Empirical and Theoretical Analysis",
journal="Journal of Grid Computing",
year="2014",
volume="12",
number="1",
pages="127--151",
abstract="In this paper, we propose and evaluate a framework for fault tolerant workflow execution in Grid environments. Different from previous work in the literature, our system dynamically chooses an appropriate fault tolerance technique while using a user-defined rule-based system. We also provide a generic interface that can be used to add fault tolerance techniques to the framework. The results obtained with real workflows in an experimental Grid environment show that the overhead introduced by our framework in a failure-free execution is, in the worst evaluated case, approximately 10 {\%}. Moreover, we show that, using our framework, workflows are able to execute successfully in the presence of failures and that the framework can dynamically choose an appropriate fault tolerance technique. The main contributions of our work are twofold: the developed framework and the model-based dependability analysis we performed on it. The purpose in carrying out a model-based dependability analysis consists on evaluating the interaction between our framework and the distributed Grid environment beyond the physical limitations of an empirical evaluation. By doing this, we provide means to plan the assurance of QoS in the Grid resource allocation, while applying the fault-tolerance mechanisms we implement in our framework regardless of the underlying middleware.",
issn="1572-9184",
doi="10.1007/s10723-013-9281-4",
url="http://dx.doi.org/10.1007/s10723-013-9281-4"
}

Other publications in the database