About

Sonia Joseph
AI researcher building interpretable video world models
I work on multimodal interpretability and physical reasoning in video world models.
At Meta, I lead the internal interpretability community, a group of more than 150 researchers and engineers, and work on physically grounded video world models on the JEPA team. I am also completing my PhD at McGill University (Mila).
My research examines how video systems encode physical structure and causality, where those representations fail, and how to make such failures visible inside the model rather than relying on surface-level metrics. This work is motivated by the practical demands of deploying multimodal systems that must reason over physical dynamics, uncertainty, and long horizons.
Previously, I was CTO at an early-stage startup, where I built and led the engineering team, drove recruiting, and guided fundraising and technical strategy through our first institutional round. That experience informs how I think about building durable technology companies, particularly the alignment between research, infrastructure, and incentives.
I studied neuroscience and computer science at Princeton, with research at the Princeton Neuroscience Institute and Janelia Research Campus. Alongside this work, I create long-form video content that makes interpretability and world models legible to wider audiences, and my commentary on AI has appeared in The New York Times, TIME, and Bloomberg.
Publications and CV
See my academic website, soniajoseph.github.io, for a full list of papers and projects.
Elsewhere
Academic Website
Google Scholar
Twitter / X
LinkedIn
YouTube
Instagram