Publications

Time-bound analytic tasks on large datasets through dynamic configuration of workflows

Abstract

Domain experts are often untrained in big data technologies, and this limits their ability to exploit the data they have available. Workflow systems hide the complexities of high-end computing and software engineering by offering pre-packaged analytic steps combined into multi-step methods commonly used by experts. A current limitation of workflow systems is that they do not take user deadlines into account: they run the workflows the user selects, however long that takes. This is impractical when large datasets are at stake, since users often prefer to see an answer faster even if it has lower precision or quality. In this paper, we present an extension to workflow systems that enables them to take user deadlines into account by automatically generating alternative workflow candidates and ranking them according to performance estimates. The system makes these estimates based on workflow performance models …
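
The idea of ranking alternative workflow candidates against a deadline can be illustrated with a minimal sketch. This is not the paper's implementation: the `Candidate` class, the `rank_for_deadline` function, the candidate names, and all numeric estimates below are hypothetical, and the ranking rule (drop candidates whose estimated run time exceeds the deadline, then prefer higher estimated quality) is an assumption chosen only to make the trade-off concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical workflow candidate with model-based estimates."""
    name: str
    est_runtime_s: float  # estimated run time in seconds (assumed to come from a performance model)
    est_quality: float    # illustrative quality score in [0, 1]; higher is better


def rank_for_deadline(candidates, deadline_s):
    """Keep candidates whose estimated run time fits the deadline,
    ordered by estimated quality (best first), ties broken by shorter run time."""
    feasible = [c for c in candidates if c.est_runtime_s <= deadline_s]
    return sorted(feasible, key=lambda c: (-c.est_quality, c.est_runtime_s))


# Made-up candidates: the same analysis on the full data or on samples.
candidates = [
    Candidate("full_dataset_exact", est_runtime_s=3600.0, est_quality=0.95),
    Candidate("sampled_half_exact", est_runtime_s=1500.0, est_quality=0.85),
    Candidate("sampled_tenth_approx", est_runtime_s=200.0, est_quality=0.60),
]

# With a 30-minute deadline, the full-dataset candidate is excluded.
ranked = rank_for_deadline(candidates, deadline_s=1800.0)
for c in ranked:
    print(c.name, c.est_runtime_s, c.est_quality)
```

In this sketch a tighter deadline shrinks the feasible set toward faster, lower-quality candidates, which mirrors the preference described in the abstract for a faster answer even at lower precision or quality.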

Date
November 17, 2013
Authors
Yolanda Gil, Varun Ratnakar, Rishi Verma, Andrew Hart, Paul Ramirez, Chris Mattmann, Arni Sumarlidason, Samuel L Park
Book
Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science
Pages
88-97