What Transformers Can and Can't Do: A Logical Approach
Event Details
Location: CR#689 Conference Room ISI-MDR
Speaker: David Chiang, University of Notre Dame
REMINDER:
Meeting hosts will only admit online guests they know to the Zoom meeting, so you are strongly encouraged to sign into Zoom with your USC account.
If you are an outside visitor, please email nlg-seminar-host(at)isi.edu at least one business day before the event so we can admit you. Specify whether you will attend remotely or in person, and include your full name, job title, and professional affiliation. Please arrive at least 10 minutes before the seminar begins.
If you do not have access to the 6th floor for in-person attendance, please check in at the 10th floor main reception desk to register as a visitor, and someone will escort you to the conference room.
Join Zoom Meeting
https://usc.zoom.us/j/92197441779?pwd=tgNbzWG3ZXQSl4AIl1AhQW5XkAz0FC.1
Meeting ID: 921 9744 1779
Passcode: 804448
Neural networks are advancing the state of the art in many areas of artificial intelligence, but in many respects they remain poorly understood. At a time when new abilities as well as new limitations of neural networks are continually coming to light, a clear understanding of what they can and cannot do is needed more than ever. The theoretical study of transformers, the dominant neural network architecture for sequences, is just beginning, and my collaborators and I have helped make this a fruitful and fast-growing area of research.
Our particular approach is to explore these questions by relating neural networks to formal logic. It is well known that finite automata are equivalent to regular expressions, and both are equivalent to a logic called monadic second-order logic; we want to establish similar connections between neural networks and logics. We have proven that one variant of transformers, unique-hard attention transformers, is exactly equivalent to the first-order logic of strings with ordering, which allows numerous expressivity results from logic to be carried over to unique-hard attention transformers.
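As an illustrative example (mine, not necessarily one used in the talk), in the first-order logic of strings with ordering, the sentence

\forall x\, \forall y\, \big( (Q_a(x) \land Q_b(y)) \rightarrow x < y \big)

over the alphabet {a, b} says that every a-labeled position precedes every b-labeled position, and therefore defines the language a*b*.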
We have also investigated transformers with softmax attention and a constant number of bits of precision (except inside attention) and proven that they are exactly equivalent to a certain temporal logic with counting operators. One consequence is that deeper transformers are strictly more expressive than shallower transformers, and this result accurately predicts how transformers behave in practice.
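As a rough illustration (the notation here is assumed and may differ from the talk's), a counting operator lets a formula refer to how many positions satisfy a property; for example, a sentence of the form

\#x\,[Q_a(x)] = \#x\,[Q_b(x)]

expresses that a string contains equally many a's and b's, something first-order logic without counting cannot express.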