Publications

Evaluating deep generative models on cognitive tasks: A case study

Abstract

We present a detailed case study evaluating selective cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect, even though the model seems to have a clear understanding of the objects mentioned in the prompt. Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision …

Date
June 6, 2023
Authors
Zhisheng Tang, Mayank Kejriwal
Journal
Discover Artificial Intelligence
Volume
3
Issue
1
Pages
21
Publisher
Springer International Publishing