Publications
Integrity protection for scientific workflow data: Motivation and initial experiences
Abstract
With the continued rise of scientific computing and the enormous increases in the size of data being processed, scientists must consider whether the processes for transmitting and storing data sufficiently assure the integrity of the scientific data. When integrity is not preserved, computations can fail and result in increased computational cost due to reruns, or worse, results can be corrupted in a manner not apparent to the scientist and produce invalid science results. Technologies such as TCP checksums, encrypted transfers, checksum validation, RAID and erasure coding provide integrity assurances at different levels, but they may not scale to large data sizes and may not cover a workflow from end-to-end, leaving gaps in which data corruption can occur undetected. In this paper we explore an approach of assuring data integrity - considering either malicious or accidental corruption - for workflow executions …
- Date: July 28, 2019
- Authors: Mats Rynge, Karan Vahi, Ewa Deelman, Anirban Mandal, Ilya Baldin, Omkar Bhide, Randy Heiland, Von Welch, Raquel Hill, William L Poehlman, F Alex Feltus
- Book: Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)
- Pages: 1-8