AI systems still struggle to understand dynamic social interactions, falling far behind human abilities due to limitations in how these models process complex, real-world scenarios.
Johns Hopkins study reveals AI models struggle to accurately predict social interactions.
A recent study led by researchers at Johns Hopkins University reveals that humans outperform current AI models in accurately describing and interpreting social interactions within dynamic scenes. This capability is critical for technologies such as autonomous vehicles and assistive robots, which rely heavily on AI to safely navigate real-world environments.
The research highlights that existing AI systems struggle to grasp the nuanced social dynamics and contextual cues essential for effectively interacting with people. Furthermore, the findings suggest that this limitation may stem fundamentally from the underlying architecture and infrastructure of current AI models.
“AI for a self-driving car, for example, would need to recognize the intentions, goals, and actions of human drivers and pedestrians. You would want it to know which way a pedestrian is about to start walking, or whether two people are in conversation versus about to cross the street,” said lead author Leyla Isik, an assistant professor of cognitive science at Johns Hopkins University. “Any time you want an AI to interact with humans, you want it to be able to recognize what people are doing. I think this sheds light on the fact that these systems can’t right now.”
Kathy Garcia, a doctoral student working in Isik’s lab at the time of the research and co–first author, recently presented the research findings at the International Conference on Learning Representations on April 24.
Comparing AI and Human Perception
To determine how AI models measure up to human perception, the researchers asked human participants to watch three-second video clips and rate features important for understanding social interactions on a scale of one to five. The clips showed people interacting with one another, performing side-by-side activities, or carrying out independent activities on their own.
The researchers then asked more than 350 AI language, video, and image models to predict how humans would judge the videos and how their brains would respond while watching them. For large language models, the researchers had the models evaluate short, human-written captions of the clips.
Participants, for the most part, agreed with each other on all the questions; the AI models, regardless of size or the data they were trained on, did not. Video models were unable to accurately describe what people were doing in the videos. Even image models that were given a series of still frames to analyze could not reliably predict whether people were communicating. Language models were better at predicting human behavior, while video models were better at predicting neural activity in the brain.
A Gap in AI Development
The results provide a sharp contrast to AI’s success in reading still images, the researchers said.
“It’s not enough to just see an image and recognize objects and faces. That was the first step, which took us a long way in AI. But real life isn’t static. We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development,” Garcia said.
The researchers believe this gap arises because AI neural networks were inspired by the architecture of the part of the brain that processes static images, which differs from the brain region that processes dynamic social scenes.
“There’s a lot of nuances, but the big takeaway is none of the AI models can match human brain and behavior responses to scenes across the board, like they do for static scenes,” Isik said. “I think there’s something fundamental about the way humans are processing scenes that these models are missing.”
Meeting: International Conference on Learning Representations
Funding: U.S. National Science Foundation, NIH/National Institute of Mental Health