Publications
It’s the Data Stupid
Abstract
Artificial Intelligence and Machine Learning have emerged as promising approaches to scientific investigation, but there is a persistent shortage of high-quality, properly annotated datasets suitable for training models. Here, we outline some of the widely reported characteristics of AI-ready data and compare them with those of FAIR data. We discuss the limitations of traditional data repositories and the challenges associated with establishing a data repository that can grow with scientific communities and accommodate rapid evolution in research priorities. Finally, we introduce the SCALE principles for repository design, which offer a proven framework for creating sustainable, scalable data repositories that can adapt to new data models and methodologies, ensuring that software infrastructure serves research needs rather than constraining them.
- Date: September 15, 2025
- Authors: Carl Kesselman, Robert Schuler
- Conference: 2025 IEEE International Conference on eScience (eScience)
- Pages: 377-378
- Publisher: IEEE