Latent Action Pretraining Through World Modeling
Submitted to ICRA 2026, 2025
Vision-Language-Action (VLA) models enable robots to follow language instructions but typically require large action-labeled datasets. We propose LAWM, a model-agnostic framework that learns latent actions through world modeling from unlabeled videos.
Supervised by Prof. Ian D. Reid at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).
Recommended citation: Bahey Tharwat, Yara Nasser, Ali Abouzied, and Ian Reid. (2025). "Latent Action Pretraining Through World Modeling." arXiv preprint arXiv:2509.18428.
