Physics-based models have been crucial for manipulation, enabling sim-to-real learning, model-predictive control, manipulation planning, and model-based design and verification. However, they typically require extensive manual effort and often fail to capture real-world complexity. Advances in generative modeling—particularly video models—offer a data-driven alternative but struggle with physical plausibility, consistency, and action conditioning. A promising direction is to integrate structured priors with scalable data-driven methods to improve dynamics prediction and generalization across diverse scenarios.
This workshop will explore timely key topics, including state-action representations, sources of supervision, generalizable inductive biases, the role of (generative) simulation and video models, and trade-offs in downstream planning, control, policy learning, and evaluation.
We will bring together researchers from robotics, machine learning, and computer vision, targeting audiences working on manipulation, world modeling, reinforcement learning, and sim-to-real learning. Posters, panels, and live polls will foster debate and dialogue across career stages, allowing attendees to actively contribute to the discussions.
We cordially invite paper submissions relevant to the following (non-exhaustive) topics:
| Time | Session |
| --- | --- |
| 8:00 - 8:05 | Opening Remarks |
| 8:05 - 8:25 | Paper Oral Presentations |
| 8:25 - 8:50 | Speaker 1 |
| 8:50 - 9:15 | Speaker 2 |
| 9:15 - 9:40 | Speaker 3 |
| 9:40 - 10:05 | Speaker 4 |
| 10:05 - 10:30 | Speaker 5 |
| 10:30 - 11:30 | Coffee Break & Poster Sessions |
| 11:30 - 11:55 | Speaker 6 |
| 11:55 - 12:25 | Panel Discussion & Debate |
| 12:25 - 12:30 | Awards & Closing Remarks |