Warm-start interaction state aligned with history-conditioned policies
VectorWorld is a streaming vector-graph world model for closed-loop autonomous-driving simulation. It warm-starts history-conditioned policies, outpaints frontier tiles in real time, and maintains reactive traffic over kilometer-scale rollout.
Interface
Warm-start interaction state
Motion-aware gated VAE for history-conditioned policies at rollout start.
Generator
≈ 6 ms / 64 m × 64 m tile
One-step MeanFlow enables real-time masked completion during rollout.
Closed loop
1 km+ rollout
DeltaSim keeps NPC behavior reactive and physically aligned over long horizons.
Closed-loop constraints
History-free initialization
Static snapshots do not match the inputs that history-conditioned policies expect.
Multi-step sampling latency
Repeated rollout-time generation breaks the real-time budget when many solver steps are required.
Long-horizon feasibility drift
Small topology and kinematic errors compound over kilometer-scale rollout.
VectorWorld contributions
Motion-aware interaction-state VAE
Warm-start interface aligned with history-conditioned policies.
Edge-gated relational DiT + MeanFlow/JVP
One-step masked completion on heterogeneous vector graphs.
DeltaSim
Physics-aligned NPC policy for long-horizon rollout stability.
A concise walkthrough of the full system
Problem setting, method, and main results in one video.
A deployment-oriented three-component stack
Warm-start interface → one-step relational generation → physics-aligned closed loop.
One stack, three jobs: initialize interaction state, outpaint the frontier in one step, and keep NPC behavior feasible over long horizons.
Motion-aware interaction-state VAE
Warm-start interface for history-conditioned policies.
- Selective motion gating.
- Policy-compatible pseudo-history.
- Lower early-horizon jerk.
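As a rough illustration of what a policy-compatible pseudo-history could look like, the sketch below backfills past positions for one agent from its decoded current state. It is not the VectorWorld decoder: constant-velocity backfill is swapped in as the simplest stand-in for the learned VAE, and `motion_gate`, `steps`, and `dt` are hypothetical names and values chosen for the example. A gate near 0 stands for a parked agent whose history should stay static.

```python
import math
from typing import List, Tuple


def pseudo_history(
    x: float,
    y: float,
    heading: float,
    speed: float,
    motion_gate: float,
    steps: int = 5,
    dt: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return `steps` past positions plus the current one, oldest first.

    ASSUMPTION: constant-velocity backfill replaces the learned decoder.
    `motion_gate` in [0, 1] scales the motion; 0 keeps the agent static.
    """
    gated_speed = motion_gate * speed
    vx = gated_speed * math.cos(heading)
    vy = gated_speed * math.sin(heading)
    # k counts how many steps back in time each sample lies.
    return [(x - vx * k * dt, y - vy * k * dt) for k in range(steps, -1, -1)]


# A moving agent gets a trail behind it; a gated (parked) agent does not.
moving = pseudo_history(0.0, 0.0, 0.0, 10.0, motion_gate=1.0)
parked = pseudo_history(5.0, 2.0, 0.0, 10.0, motion_gate=0.0)
```

A history-conditioned policy can then consume `moving` or `parked` in the same format as a logged history, which is the point of a warm-start interface.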
Edge-gated relational DiT + MeanFlow/JVP
One-step masked completion on heterogeneous vector graphs.
- Edge-aware relational attention.
- MeanFlow + JVP for one-step transport.
- Streaming frontier outpainting.
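One-step transport can be sketched in a few lines. In the MeanFlow formulation, a network predicts the average velocity u(z_t, r, t) over an interval, and a sample is obtained from noise with a single evaluation: z_0 = z_1 - u(z_1, 0, 1). The sketch below applies that rule and then keeps observed entries fixed. It covers sampling only (the JVP is a training-time tool), the overwrite-style masking is an assumption about how completion might be wired, and `make_toy_mean_velocity` is a hypothetical oracle that replaces the trained relational DiT.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def one_step_masked_completion(
    noise: Sequence[float],
    known: Sequence[float],
    mask: Sequence[bool],
    mean_velocity: Callable[[Vector, float, float], Vector],
) -> Vector:
    """Map noise to data with one mean-velocity call, keeping observed entries."""
    z1 = list(noise)
    u = mean_velocity(z1, 0.0, 1.0)          # average velocity over [r=0, t=1]
    z0 = [z - du for z, du in zip(z1, u)]    # z_0 = z_1 - (1 - 0) * u
    # ASSUMPTION: observed entries (mask=True) simply overwrite the sample.
    return [k if m else g for k, m, g in zip(known, mask, z0)]


def make_toy_mean_velocity(target: Sequence[float]):
    """Hypothetical oracle: exact mean velocity when all data equals `target`."""
    def mean_velocity(z: Vector, r: float, t: float) -> Vector:
        # With z_t = (1 - t) * x + t * eps and x fixed, u(z_t, r, t) = (z_t - x) / t.
        return [(zi - xi) / t for zi, xi in zip(z, target)]
    return mean_velocity


target = [1.0, 2.0, 3.0, 4.0]
known = [1.0, 2.0, 0.0, 0.0]           # first two entries are observed
mask = [True, True, False, False]
noise = [0.3, -1.2, 0.8, 2.5]
completed = one_step_masked_completion(
    noise, known, mask, make_toy_mean_velocity(target)
)
```

With the oracle, the unobserved entries land on the target after one evaluation, which is the behavior a well-trained mean-velocity model approximates.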
DeltaSim
Physics-aligned NPC policy for kilometer-scale rollout.
- Hybrid discrete–continuous actions.
- Differentiable kinematic shaping.
- Lower compounding drift.
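To make the idea of hybrid discrete-continuous actions and kinematic feasibility concrete, here is a minimal sketch under stated assumptions. It is not DeltaSim: the action bins, the residual decoding, the wheelbase, and the kinematic bicycle model are all illustrative choices, and the sketch shows only why rolling actions through a kinematic model keeps states physically plausible, not how the policy is trained or shaped.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    x: float        # m
    y: float        # m
    heading: float  # rad
    speed: float    # m/s


# Hypothetical coarse action bins; a continuous residual refines each bin.
ACCEL_BINS = (-3.0, -1.0, 0.0, 1.0, 2.0)   # m/s^2
STEER_BINS = (-0.3, -0.1, 0.0, 0.1, 0.3)   # rad


def decode_action(accel_idx, steer_idx, accel_res=0.0, steer_res=0.0):
    """Hybrid action: discrete bin choice plus continuous residual."""
    return ACCEL_BINS[accel_idx] + accel_res, STEER_BINS[steer_idx] + steer_res


def bicycle_step(s, accel, steer, dt=0.1, wheelbase=2.8, max_speed=30.0):
    """One explicit-Euler step of a kinematic bicycle model."""
    x = s.x + s.speed * math.cos(s.heading) * dt
    y = s.y + s.speed * math.sin(s.heading) * dt
    heading = s.heading + s.speed / wheelbase * math.tan(steer) * dt
    # Clamp speed so the agent never reverses or exceeds the cap.
    speed = min(max(s.speed + accel * dt, 0.0), max_speed)
    return State(x, y, heading, speed)


# Cruise straight for one second at 10 m/s.
cruise = State(0.0, 0.0, 0.0, 10.0)
for _ in range(10):
    cruise = bicycle_step(cruise, *decode_action(2, 2))

# Brake hard from 1 m/s; speed is clamped at zero rather than going negative.
braking = State(0.0, 0.0, 0.0, 1.0)
for _ in range(10):
    braking = bicycle_step(braking, *decode_action(0, 2))
```

Because every state comes out of the kinematic update, errors in the predicted action cannot produce sideways jumps or negative speeds, which is one way drift can be kept from compounding.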
Initialization quality on Waymo and nuPlan
VectorWorld improves lane continuity, route validity, and agent-map consistency at initialization.
nuPlan endpoint gap
0.078 m
Smaller lane-transition error than ScenDream-L (0.250 m).
nuPlan init collision
3.01%
Lower initial collision rate than ScenDream-L (9.30%).
Waymo FD
0.94
Best non-privileged perceptual score reported in the paper.
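For readers who want a feel for the endpoint-gap number, the sketch below computes one plausible version of it: the mean distance between the last point of a lane and the first point of its successor. The exact metric definition is in the paper, not on this page, so the function name, the data layout, and the averaging are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def mean_endpoint_gap(
    lanes: Dict[str, List[Point]],
    successors: Sequence[Tuple[str, str]],
) -> float:
    """Mean distance (m) between a lane's end and its successor's start.

    ASSUMPTION: this is one reading of "endpoint gap"; the paper may differ.
    """
    gaps = []
    for pred_id, succ_id in successors:
        ex, ey = lanes[pred_id][-1]   # last point of the predecessor
        sx, sy = lanes[succ_id][0]    # first point of the successor
        gaps.append(math.hypot(sx - ex, sy - ey))
    return sum(gaps) / len(gaps)


# Hypothetical toy map: lane "a" feeds lanes "b" and "c" with small offsets.
toy_lanes = {
    "a": [(0.0, 0.0), (10.0, 0.0)],
    "b": [(10.0, 0.3), (20.0, 0.3)],
    "c": [(10.4, 0.0), (20.0, 0.0)],
}
toy_gap = mean_endpoint_gap(toy_lanes, [("a", "b"), ("a", "c")])
```

A perfectly continuous lane graph would score 0 m under this definition.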
One-step generation in the real-time regime
Matched clip progress across the multi-step baseline, few-step MeanFlow, and the one-step deployment point.
Representative clips
Case 1
Matched rollout progress
Frontier completion under the strict one-step deployment budget.
Generator
One-step MeanFlow + JVP
Solver-free masked completion for repeated rollout-time generation.
Deployment cost
5.6 ms / 64 m × 64 m tile
The online operating point reported in the paper.
Step budget
3–5 steps
Higher fidelity when a small offline budget is available.
What to inspect
- Lane continuity at the frontier.
- Route continuation under a tight step budget.
- Agent-map consistency during low-latency generation.
Reading guide
One-step MeanFlow is the deployment point. Few-step flow recovers fidelity when a small extra budget is allowed. The multi-step baseline remains visibly slower.
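The trade-off can be put into rough numbers. The sketch below uses the 5.6 ms one-step figure from this page and makes two assumptions that the page does not state: per-tile latency grows linearly with the number of steps, and the simulator has a hypothetical 100 ms budget per frame (10 Hz).

```python
import math

ONE_STEP_MS = 5.6  # per 64 m x 64 m tile, as reported on this page


def tile_latency_ms(steps: int, per_step_ms: float = ONE_STEP_MS) -> float:
    # ASSUMPTION: cost scales linearly with solver steps.
    return steps * per_step_ms


def tiles_per_frame(steps: int, frame_budget_ms: float = 100.0) -> int:
    # ASSUMPTION: 100 ms per frame is an illustrative budget, not a paper value.
    return math.floor(frame_budget_ms / tile_latency_ms(steps))


# Map step count -> (latency per tile in ms, tiles that fit in one frame).
budget_table = {
    steps: (tile_latency_ms(steps), tiles_per_frame(steps))
    for steps in (1, 3, 5)
}
```

Under these assumptions, one-step generation fits 17 tiles into a frame, while a 5-step sampler fits 3, which is why the few-step setting is better suited to offline use.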
Long-horizon rollout evidence and exported scene inspection
Closed-loop demos, key rollout metrics, and a lightweight viewer for exported vector scenes.
Warm start
16.6 → 9.6 jerk
Warm-started interaction states reduce early-horizon instability in closed loop.
Horizon
1 km+ rollout
Streaming outpainting extends evaluation well beyond a single initialized tile.
Training value
25.7% → 56.0%
PPO success improves after retraining inside the VectorWorld environment.
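The jerk figure above is a smoothness measure. As a hedged illustration of how such a number can be computed from a rollout, the sketch below takes positions along the path and applies three finite differences. The paper's exact definition, units, and averaging window are not given on this page, so `mean_abs_jerk` is an assumed form.

```python
from typing import List, Sequence


def finite_difference(values: Sequence[float], dt: float) -> List[float]:
    """Forward difference of a uniformly sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def mean_abs_jerk(positions: Sequence[float], dt: float) -> float:
    """Mean absolute jerk from positions along the path.

    ASSUMPTION: jerk is the third finite difference of arc-length position.
    """
    if len(positions) < 4:
        raise ValueError("need at least four samples to estimate jerk")
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    jerk = finite_difference(acceleration, dt)
    return sum(abs(j) for j in jerk) / len(jerk)


# s(t) = t^3 has constant jerk 6; constant-velocity motion has jerk 0.
cubic_jerk = mean_abs_jerk([0.0, 1.0, 8.0, 27.0, 64.0], dt=1.0)
steady_jerk = mean_abs_jerk([0.0, 1.0, 2.0, 3.0, 4.0], dt=1.0)
```

A poorly initialized agent tends to produce large early accelerations as the policy corrects its state, which shows up as a higher value of this kind of metric.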
3D Inspection
Orbit, pan, and zoom the exported vector scene in 3D
- 🖱️ Left drag → orbit
- 🖱️ Right drag → pan
- 🖱️ Scroll → zoom (no page scroll)
- 🖱️ Double-click → fit view
The viewer shows an agent_motion trail, not a full per-step replay.
Rollout case 1
Frontier outpainting
Long-horizon rollout with repeated frontier completion ahead of the ego.
Rollout case 2
Stable closed loop
Second rollout showing stable route continuation under a different map layout.
VectorWorld as a controllable layout prior for surround-view video generation
The generated vector scene provides explicit layout control for downstream surround-view video generation.
Cases
Vector-to-sensor bridge
VectorWorld supplies an explicit vector-world prior. After projection into image space, a downstream video model can generate sensor-level surround-view videos with stronger geometry, motion, and interaction consistency for autonomous driving.
Stage 1 · Vector world
World prior: explicit vector scene generated by VectorWorld.
Stage 2 · Sensor-space projection
Bridge: projection that exposes the structural prior in image space.
Stage 3 · Sensor-level video
Video model: surround-view video conditioned on the projected structural prior.
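Stage 2 can be illustrated with a standard pinhole projection of a vector polyline into one camera image. This is a generic sketch, not the bridge used in the paper: the intrinsics (`fx`, `fy`, `cx`, `cy`), the image size, and the camera-frame convention (x right, y down, z forward) are assumptions for the example.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(p: Point3, fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection of a camera-frame point; None if behind the camera."""
    x, y, z = p
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def project_polyline(
    polyline: Sequence[Point3],
    fx: float = 1000.0,   # hypothetical intrinsics
    fy: float = 1000.0,
    cx: float = 960.0,
    cy: float = 540.0,
    width: int = 1920,
    height: int = 1080,
) -> List[Pixel]:
    """Project a polyline and keep only the points that land inside the image."""
    pixels = []
    for p in polyline:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            pixels.append(uv)
    return pixels


# A lane centerline 1.5 m below the camera; the last point is behind it.
centerline = [(0.0, 1.5, 5.0), (0.0, 1.5, 10.0), (0.0, 1.5, 20.0), (0.0, 1.5, -5.0)]
lane_pixels = project_polyline(centerline)
```

Rasterizing such projected polylines and agent boxes per camera gives an image-space layout that a downstream video model can take as conditioning.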
Citation
If you find VectorWorld useful, please cite the paper.
BibTeX
Copy directly into your bibliography file.
@misc{jiang2026vectorworldefficientstreamingworld,
  title={VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs},
  author={Chaokang Jiang and Desen Zhou and Jiuming Liu and Kevin Li Sun},
  year={2026},
  eprint={2603.17652},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.17652},
}