Researchers at The Ohio State University have developed a new way to pinpoint the exact moments in an educational video when children are learning key concepts.
By combining artificial intelligence with eye-tracking data, the team could also predict how well children understood the material from their eye movements alone. The findings, published August 5 in the Journal of Communication, suggest that this approach could eventually allow videos to adapt in real time to a learner’s needs, potentially transforming digital education.
Tracking Attention in Real Time
The study involved 197 children aged 4 to 8 who watched a four-minute composite video drawn from the popular YouTube series SciShow Kids and Learn Bright. The lesson focused on how animals use camouflage. Before watching, each child answered questions to measure baseline knowledge. Afterward, they were tested again to see what they had learned.
Eye-tracking technology recorded where children looked as the video played. These moment-by-moment attention maps were then analyzed with machine learning algorithms. According to lead author Jason Coronel, associate professor of communication at Ohio State, this step was critical for linking visual engagement to comprehension.
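To make the idea of "moment-by-moment attention maps" concrete, here is a minimal, purely illustrative sketch of how raw gaze samples might be binned into time windows and reduced to a per-window feature (here, gaze dispersion). The function names, the synthetic data, and the choice of feature are all assumptions for illustration; the study's actual machine-learning pipeline is not described in enough detail in this article to reproduce.

```python
# Illustrative sketch only: binning (time, x, y) gaze samples into
# one-second windows and computing gaze dispersion per window.
# Synthetic data; not the study's actual analysis pipeline.
import random
import statistics

random.seed(0)

def gaze_dispersion(points):
    """Spread of gaze in a window: mean distance from the centroid."""
    cx = statistics.mean(x for x, _ in points)
    cy = statistics.mean(y for _, y in points)
    return statistics.mean(
        ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in points
    )

def windowed_features(gaze, window_s=1.0):
    """Group (t, x, y) samples into time windows; one feature per window."""
    windows = {}
    for t, x, y in gaze:
        windows.setdefault(int(t // window_s), []).append((x, y))
    return [gaze_dispersion(pts) for _, pts in sorted(windows.items())]

# Synthetic example: 10 seconds of gaze samples at roughly 30 Hz,
# with x and y as normalized screen coordinates in [0, 1].
gaze = [(i / 30, random.random(), random.random()) for i in range(300)]
features = windowed_features(gaze)
print(len(features))  # one dispersion value per one-second window
```

A feature sequence like this, one value per moment of the video, is the kind of input a classifier could use to predict a child's post-test comprehension score.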
Seven Key Learning Moments
The AI analysis revealed seven points in the video where shifts in children’s eye movements were strongly tied to how well they grasped the concept of camouflage. One example came early, when the host asked viewers to help find her animated sidekick, Squeaks. This cue appeared to spark engagement that carried through to later, more complex explanations.
- Early interactive prompt to find Squeaks
- Transition into defining camouflage with on-screen text
- Visual demonstrations of animals blending with their environment
- Change from general examples to a specific case study
- Animated sequence showing predator-prey interactions
- Review of main points with illustrated recap
- Final call-to-action encouraging viewers to apply knowledge
Co-author Alex Bonus noted that these moments often aligned with “event boundaries”—transitions when one meaningful segment ends and another begins. In educational media, such boundaries may help learners organize and retain information.
From Research to Personalized Learning
While the findings are preliminary, the potential applications are far-reaching. Coronel envisions a future where eye tracking could instantly detect when a viewer is struggling to understand a concept, and AI could alter the video content on the fly—perhaps by offering a different example or slowing the pace.
“Our ultimate goal is to build an AI system that can tell in real time whether a viewer is understanding or not understanding what they are seeing in an educational video,” Coronel said. “That would give us the opportunity to dynamically adjust the content for an individual person to help them understand what is being taught.”
As eye-tracking technology becomes more affordable and integrated into everyday devices, such adaptive systems could make personalized video instruction more effective and scalable. For teachers, that could mean closing the gap between when a misunderstanding happens and when it is addressed—shifting from weeks to seconds.
Study Details
Journal: Journal of Communication
DOI: 10.1093/joc/jqaf036
Authors: Jason C. Coronel, Matthew Sweitzer, James Alex Bonus, Rebecca Dore, Blue Lerner