Publications
Evaluating deep generative models on cognitive tasks: A case study
Abstract
We present a detailed case study evaluating selective cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect, even though the model seems to have a clear understanding of the objects mentioned in the prompt. Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision …
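As a rough illustration of what testing elicited choices against rationality axioms can look like, the sketch below checks two of the von Neumann-Morgenstern axioms (completeness and transitivity) over a set of pairwise preferences. It is not the authors' evaluation procedure: the function name `check_axioms`, the encoding of preferences as `(a, b)` pairs, and the example answers are all assumptions made for illustration.

```python
from itertools import combinations, permutations


def check_axioms(options, prefers):
    """Return (incomplete_pairs, intransitive_triples) for a preference relation.

    `prefers` is a set of (a, b) pairs meaning "a is weakly preferred to b".
    This encoding is a hypothetical one chosen for this sketch.
    """
    # Completeness: every pair of distinct options is ranked in at least one direction.
    incomplete = [
        (a, b)
        for a, b in combinations(options, 2)
        if (a, b) not in prefers and (b, a) not in prefers
    ]
    # Transitivity: a >= b and b >= c must imply a >= c.
    intransitive = [
        (a, b, c)
        for a, b, c in permutations(options, 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]
    return incomplete, intransitive


# Hypothetical answers forming a preference cycle: A over B, B over C, C over A.
options = ["A", "B", "C"]
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
incomplete, intransitive = check_axioms(options, cyclic)
print("unranked pairs:", incomplete)
print("transitivity violations:", intransitive)
```

A cyclic set of answers like the one above ranks every pair, so it passes the completeness check, but every rotation of the cycle shows up as a transitivity violation.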
- Date: June 6, 2023
- Authors: Zhisheng Tang, Mayank Kejriwal
- Journal: Discover Artificial Intelligence
- Volume: 3
- Issue: 1
- Pages: 21
- Publisher: Springer International Publishing