A metadata catalog service for data intensive applications

Abstract

Data intensive scientific applications promise dramatic progress in scientific discovery. These applications produce and analyze terabyte and petabyte data sets that may span millions of files or data objects. Driven by trends in storage technology, computational power and network bandwidth, these data intensive applications are increasing in scope and sophistication to perform scientific data collection and analysis on a scale never before achievable.
Metadata services are required to support these data intensive applications. Metadata is information that describes the contents of data items. Metadata services allow scientists to record information about the creation, transformation, meaning and quality of data items and to query for data items based on these descriptive attributes. Accurate identification of desired data items is essential for correct analysis of experimental and simulation results. In the past, scientists have largely relied on ad hoc methods (descriptive file and directory names, lab notebooks, etc.) to record information about data items. However, these methods do not scale to terabyte and petabyte data sets consisting of millions of data items. Extensible, reliable, high performance grid services are required to support registration and query of metadata information.

Date: 2002
Authors: Ann Chervenak, Ewa Deelman, Carl Kesselman, Laura Pearlman, Gurmeet Singh
Journal: GriPhyN technical report, 2002–11

Information Sciences Institute