Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models

Abstract

Artificial intelligence (AI) and machine learning (ML) are increasingly used in eScience applications [9]. While these approaches have great potential, the literature shows that ML-based approaches frequently produce results that are incorrect or unreproducible because the data used to train and validate the models is mismanaged or misused [12], [15]. Recognition that correct ML results require high-quality data has led to data-centric ML approaches, which shift the central focus from model development to the creation of high-quality data sets for training and validating models [14], [20]. However, few tools and methods are available for using data-centric approaches to explore and evaluate ML solutions for eScience problems, which often require collaborative multidisciplinary teams working with models and data that evolve rapidly as an investigation unfolds [1]. In this paper, we show how …

Date: September 16, 2024
Authors: Zhiwei Li, Carl Kesselman, Mike D’Arcy, Michael Pazzani, Benjamin Yizing Xu
Conference: 2024 IEEE 20th International Conference on e-Science (e-Science)
Pages: 1-10
Publisher: IEEE
