Latent Action Pretraining Through World Modeling

Submitted to ICRA 2026, 2025

Vision-Language-Action (VLA) models enable robots to follow language instructions but typically require large datasets of action-labeled robot demonstrations. We propose LAWM, a model-agnostic framework that learns latent actions through world modeling from unlabeled videos.

Supervised by Prof. Ian D. Reid at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI).

Download paper here

Recommended citation: Bahey Tharwat, Yara Nasser, Ali Abouzied, Ian Reid. (2025). "Latent Action Pretraining Through World Modeling." arXiv preprint arXiv:2509.18428.