Evaluating deep generative models on cognitive tasks: A case study

Abstract

We present a detailed case study evaluating selective cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than with adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect, even though the model seems to have a clear understanding of the objects mentioned in the prompt. Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences, bets, and decision …

Date: June 6, 2023
Authors: Zhisheng Tang, Mayank Kejriwal
Journal: Discover Artificial Intelligence
Volume: 3
Issue: 1
Pages: 21
Publisher: Springer International Publishing

Information Sciences Institute
