Publications

Automatic generation of data processing workflows for transportation modeling

Abstract

Scientists, economists, and planners in government, industry, and academia spend much of their time accessing, integrating, and analyzing data. However, many of their studies are one-of-a-kind, with little sharing and reuse in subsequent endeavors. The Argos project seeks to improve the productivity of analysts by providing a framework that encourages reuse of data sources and data processing operations, and by developing tools to generate data processing workflows. In this paper, we present an approach to automatically generate data processing workflows. First, we define a methodology for assigning formal semantics to data and operations according to a domain ontology, which allows sharing and reuse. Specifically, we define data contents using relational descriptions in an expressive logic. Second, we develop a novel planner that uses relational subsumption to connect the output of one data processing operation with the input of another. Our modeling methodology has the significant advantage that the planner can automatically insert adaptor operations wherever necessary to bridge the inputs and outputs of operations in the workflow. We have implemented the approach in a transportation modeling domain.
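The idea of connecting operations by subsumption and bridging mismatches with adaptors can be illustrated with a minimal sketch. This is not the Argos planner: it assumes a much weaker representation in which a data description is just a set of atoms, and subsumption reduces to a subset test, whereas the abstract refers to relational descriptions in an expressive logic. All names (`subsumes`, `Operation`, `plan`, `freight_tons`, `tons_to_dollars`, `trip_model`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumption: a description is a conjunction of atoms, modeled as a frozenset.
def subsumes(general, specific):
    """True if every atom required by `general` also holds in `specific`."""
    return general <= specific

@dataclass(frozen=True)
class Operation:
    name: str
    inputs: tuple        # one description per required input
    output: frozenset    # description of the data produced

def plan(goal, operations, sources, depth=5):
    """Backward-chain from `goal`; return step names in execution order, or None."""
    # A source satisfies the goal if the goal description subsumes the source's.
    for name, desc in sources.items():
        if subsumes(goal, desc):
            return [name]
    if depth == 0:
        return None
    # Otherwise pick an operation whose output fits, then plan for its inputs.
    for op in operations:
        if not subsumes(goal, op.output):
            continue
        steps = []
        for needed in op.inputs:
            sub = plan(needed, operations, sources, depth - 1)
            if sub is None:
                break
            steps.extend(sub)
        else:
            return steps + [op.name]
    return None

# Hypothetical transportation example: the source is in tons, the model wants dollars.
sources = {"freight_tons": frozenset({"FreightFlow", "unit:tons"})}
operations = [
    Operation("trip_model",
              (frozenset({"FreightFlow", "unit:dollars"}),),
              frozenset({"TruckTrips"})),
    # Adaptor: bridges the unit mismatch between the source and the model.
    Operation("tons_to_dollars",
              (frozenset({"FreightFlow", "unit:tons"}),),
              frozenset({"FreightFlow", "unit:dollars"})),
]

workflow = plan(frozenset({"TruckTrips"}), operations, sources)
print(workflow)
```

In this toy setting the adaptor is never requested explicitly. The search reaches it only because no source matches the input description of `trip_model`, which mirrors, in a very reduced form, the automatic insertion of adaptor operations described in the abstract.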

Date: May 20, 2007
Authors: José Luis Ambite, Dipsy Kapoor
Journal: ACM International Conference Proceeding Series
Volume: 228
Pages: 82-91