Publications

It’s the Data Stupid

Abstract

Artificial Intelligence and Machine Learning have emerged as promising approaches to scientific investigation, but there is a persistent shortage of high-quality, properly annotated datasets suitable for training models. Here, we outline some of the widely reported characteristics of AI-ready data and compare them with those of FAIR data. We discuss the limitations of traditional data repositories and the challenges associated with establishing a data repository that can grow with scientific communities and accommodate rapid evolution in research priorities. Finally, we introduce the SCALE principles for repository design, which offer a proven framework for creating sustainable, scalable data repositories that can adapt to new data models and methodologies, ensuring that software infrastructure serves research needs rather than constraining them.

Date
September 15, 2025
Authors
Carl Kesselman, Robert Schuler
Conference
2025 IEEE International Conference on eScience (eScience)
Pages
377-378
Publisher
IEEE