Rutgers University Astrophysics Colloquium
Artificial Intelligence and Pattern Recognition: A Physics Viewpoint
The interplay between memorization and generalization in pattern recognition lies at the core of modern artificial intelligence. This tension becomes particularly pronounced as we train models with billions or even trillions of parameters; such models have sufficient capacity to memorize virtually all information in their training data. In this talk, I will outline a physics-inspired perspective on this problem, focusing on how energy landscapes can illuminate the transition, or crossover, between memorization and generalization. As a concrete example, I will examine an attention-based task for a one-layer transformer and its connection to modern Hopfield models. Notably, transformers constitute the “T” in GPT: Generative Pre-trained Transformer. I will discuss how the Hopfield model itself must undergo a crossover for generalization to become possible. If time permits, I will also sketch how this task relates to diffusion models, which exhibit their own forms of memorization–generalization behavior.
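Background for attendees: the attention/Hopfield connection referenced in the abstract can be made concrete. In the modern (continuous) Hopfield literature (Ramsauer et al., "Hopfield Networks is All You Need"), one retrieval step takes exactly the form of softmax attention, with the stored patterns acting as both keys and values. The minimal NumPy sketch below illustrates that update rule only; the variable names, the inverse temperature beta, and the toy retrieval loop are illustrative assumptions, not material from the talk itself.

    import numpy as np

    def softmax(z):
        # Numerically stable softmax over a 1-D array.
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def hopfield_update(patterns, query, beta=1.0):
        """One retrieval step of a modern (continuous) Hopfield network.

        patterns: (N, d) array of stored patterns (one per row).
        query:    (d,) current state.
        beta:     inverse temperature; larger beta sharpens retrieval.

        The update xi_new = X^T softmax(beta * X xi) has the same form as
        softmax attention with keys and values both equal to the patterns.
        """
        weights = softmax(beta * patterns @ query)  # pattern similarities -> attention weights
        return patterns.T @ weights                 # weighted recombination of stored patterns

    # Toy retrieval: a noisy copy of one stored pattern flows back to it.
    rng = np.random.default_rng(0)
    patterns = rng.standard_normal((5, 16))
    state = patterns[2] + 0.3 * rng.standard_normal(16)
    for _ in range(3):
        state = hopfield_update(patterns, state, beta=4.0)
    print(np.argmax(patterns @ state))  # expect index 2

At large beta the softmax concentrates on the single best-matching pattern and the network memorizes, i.e. retrieves stored examples verbatim; the regime in which such a model instead generalizes is the kind of crossover the talk will address.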