VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs

Streaming vector-graph world model for autonomous-driving simulation: warm-started initialization, one-step frontier completion, and kilometer-scale closed-loop rollout.

01

Warm-start interaction state aligned with history-conditioned policies

02

≈ 5.6 ms one-step frontier completion per 64 m × 64 m tile

03

Kilometer-scale closed-loop rollout with reactive traffic

04

Controllable vector prior for surround-view video generation

VectorWorld is a streaming vector-graph world model for closed-loop autonomous-driving simulation. It warm-starts history-conditioned policies, outpaints frontier tiles in real time, and maintains reactive traffic over kilometer-scale rollout.

Teaser
Streaming closed-loop rollout: warm start, one-step frontier completion, and long-horizon stability.

Interface

Warm-start interaction state

Motion-aware gated VAE for history-conditioned policies at rollout start.

Generator

≈ 5.6 ms / 64 m × 64 m tile

One-step MeanFlow enables real-time masked completion during rollout.

Closed loop

1 km+ rollout

DeltaSim keeps NPC behavior reactive and physically aligned over long horizons.

Closed-loop constraints

History-free initialization

Static snapshots mismatch the inputs expected by history-conditioned policies.

Multi-step sampling latency

Repeated rollout-time generation breaks the real-time budget when many solver steps are required.

Long-horizon feasibility drift

Small topology and kinematic errors compound over kilometer-scale rollout.

VectorWorld contributions

Motion-aware interaction-state VAE

Warm-start interface aligned with history-conditioned policies.

Edge-gated relational DiT + MeanFlow/JVP

One-step masked completion on heterogeneous vector graphs.

DeltaSim

Physics-aligned NPC policy for long-horizon rollout stability.

Video overview

A concise walkthrough of the full system

Problem setting, method, and main results in one video.

Method

A deployment-oriented three-component stack

Warm-start interface → one-step relational generation → physics-aligned closed loop.

Closed loop at a glance: initialize the base tile, outpaint the frontier ahead of the ego, and keep reactive NPC behavior stable during rollout.
Deployment loop: warm start → one-step completion → physics-aligned closed loop.

One stack, three jobs: initialize interaction state, outpaint the frontier in one step, and keep NPC behavior feasible over long horizons.
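
The frontier bookkeeping behind streaming outpainting can be sketched as follows. This is a minimal illustration, not the paper's scheduler: the function names, the square neighborhood, and the `radius_tiles` parameter are assumptions, and only the 64 m tile size comes from this page. A real system would likely prioritize tiles ahead of the ego heading rather than a symmetric window.

```python
import math

TILE_SIZE_M = 64.0  # tile edge length quoted on this page


def tile_index(x, y, tile_size=TILE_SIZE_M):
    """Map a world position in metres to integer (column, row) tile indices."""
    return (math.floor(x / tile_size), math.floor(y / tile_size))


def frontier_tiles(ego_xy, generated, radius_tiles=1):
    """Return tiles within `radius_tiles` of the ego tile that are not generated yet.

    Hypothetical helper: a symmetric window is a simplification of
    "outpaint the frontier ahead of the ego".
    """
    cx, cy = tile_index(*ego_xy)
    needed = {
        (cx + dx, cy + dy)
        for dx in range(-radius_tiles, radius_tiles + 1)
        for dy in range(-radius_tiles, radius_tiles + 1)
    }
    return sorted(needed - generated)


generated = {(0, 0)}                            # base tile from initialization
todo = frontier_tiles((10.0, 20.0), generated)  # eight neighbors of the base tile
generated.update(todo)                          # pretend the generator filled them
later = frontier_tiles((70.0, 20.0), generated) # ego crossed into tile (1, 0)
```

Once the ego crosses a tile boundary, only the newly exposed column of tiles is requested, so generation cost scales with distance travelled rather than with map size.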

Component 1 · Interface 01

Motion-aware interaction-state VAE

Warm-start interface for history-conditioned policies.

  • Selective motion gating.
  • Policy-compatible pseudo-history.
  • Lower early-horizon jerk.
Component 2 · Generator 02

Edge-gated relational DiT + MeanFlow/JVP

One-step masked completion on heterogeneous vector graphs.

  • Edge-aware relational attention.
  • MeanFlow + JVP for one-step transport.
  • Streaming frontier outpainting.
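
A toy sketch of one-step masked completion, assuming the sampling rule from the original MeanFlow formulation (noise at t = 1, data at t = 0, one jump using the average velocity). The closed-form `toy_average_velocity` stands in for a trained network and is only valid for a data distribution concentrated on a single point. Pasting observed entries back after sampling is the simplest form of masking and may differ from how VectorWorld conditions on known elements.

```python
def toy_average_velocity(z, r, t, target):
    """Closed-form average velocity for a toy flow whose data is the single point `target`.

    Stand-in for a trained network (hypothetical). With z_t = (1 - t) * x + t * eps,
    trajectories are straight lines and u(z, r, t) = (z - target) / t for any r < t.
    """
    return [(zi - xi) / t for zi, xi in zip(z, target)]


def one_step_sample(noise, target):
    """MeanFlow-style one-step transport: x = eps - u(eps, r=0, t=1)."""
    u = toy_average_velocity(noise, 0.0, 1.0, target)
    return [ei - ui for ei, ui in zip(noise, u)]


def masked_completion(known, mask, generated):
    """Keep observed entries (mask == 1) and fill the rest from the generator."""
    return [k if m else g for k, m, g in zip(known, mask, generated)]


known = [1.0, 2.0, 0.0, 0.0]     # first two entries are already observed
mask = [1, 1, 0, 0]
target = [1.0, 2.0, 3.0, 4.0]    # toy "data" the flow transports noise onto
noise = [0.3, -0.7, 1.1, 0.2]

sample = one_step_sample(noise, target)
completed = masked_completion(known, mask, sample)
```

No ODE solver loop appears anywhere in the sampler: a single evaluation of the average-velocity function maps noise to a sample.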
Component 3 · Closed loop 03

DeltaSim

Physics-aligned NPC policy for kilometer-scale rollout.

  • Hybrid discrete–continuous actions.
  • Differentiable kinematic shaping.
  • Lower compounding drift.
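
To make "hybrid discrete–continuous actions" and kinematic feasibility concrete, here is a hedged sketch built on a standard kinematic bicycle model. Everything in it is hypothetical: the acceleration bins, the residual, the wheelbase, and the steering limit are illustrative values, and this page does not state DeltaSim's actual action space or shaping terms.

```python
import math

ACCEL_BINS = (-3.0, 0.0, 2.0)  # hypothetical discrete acceleration choices, m/s^2


def kinematic_step(state, accel_bin, accel_residual, steer, dt=0.1,
                   wheelbase=2.8, max_steer=0.5):
    """Advance (x, y, heading, speed) by one step of a kinematic bicycle model."""
    x, y, heading, speed = state
    accel = ACCEL_BINS[accel_bin] + accel_residual   # discrete choice + continuous offset
    steer = max(-max_steer, min(max_steer, steer))   # clip to a feasible steering angle
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading += speed / wheelbase * math.tan(steer) * dt
    speed = max(0.0, speed + accel * dt)             # no reversing in this sketch
    return (x, y, heading, speed)


# Cruise straight at 10 m/s for one second (ten 0.1 s steps).
state = (0.0, 0.0, 0.0, 10.0)
for _ in range(10):
    state = kinematic_step(state, accel_bin=1, accel_residual=0.0, steer=0.0)

# Hard braking from a crawl stops at zero instead of going negative.
stopped = kinematic_step((0.0, 0.0, 0.0, 0.2), accel_bin=0, accel_residual=0.0, steer=0.0)
```

Because every state update passes through the vehicle model, the agent cannot teleport or slide sideways, which is one way small per-step errors are kept from compounding.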
Initialization quality

Initialization quality on Waymo and nuPlan

VectorWorld improves lane continuity, route validity, and agent-map consistency at initialization.

nuPlan endpoint gap

0.078 m

Smaller lane-transition error than ScenDream-L (0.250 m).

nuPlan init collision

3.01%

Lower initial collision rate than ScenDream-L (9.30%).

Waymo FD

0.94

Best non-privileged perceptual score reported in the paper.

Representative initialization comparison on Waymo and nuPlan. VectorWorld yields more continuous lanes, smaller lane-transition gaps, and fewer agent-map inconsistencies than prior vectorized generators.
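
One plausible reading of a lane-transition "endpoint gap" is the distance between the last point of a lane and the first point of each successor lane. The sketch below computes the mean of that distance. The metric definition, the data layout, and the function name are assumptions for illustration; the paper's exact definition is not given on this page.

```python
import math


def endpoint_gap(lanes, successors):
    """Mean distance between each lane's last point and its successor's first point.

    `lanes` maps lane id -> list of (x, y) points in metres.
    `successors` maps lane id -> list of successor lane ids.
    """
    gaps = []
    for lane_id, next_ids in successors.items():
        end = lanes[lane_id][-1]
        for next_id in next_ids:
            start = lanes[next_id][0]
            gaps.append(math.dist(end, start))
    return sum(gaps) / len(gaps) if gaps else 0.0


lanes = {
    "a": [(0.0, 0.0), (10.0, 0.0)],
    "b": [(10.0, 0.3), (20.0, 0.3)],    # starts 0.3 m off the end of "a"
    "c": [(10.0, -0.4), (15.0, -5.0)],  # starts 0.4 m off the end of "a"
}
successors = {"a": ["b", "c"]}
gap = endpoint_gap(lanes, successors)   # mean of 0.3 m and 0.4 m
```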
Efficiency

One-step generation in the real-time regime

Matched clip progress across the multi-step baseline, few-step MeanFlow, and the one-step deployment point.

Representative clips

Case 1 of 3 · matched rollout progress

Frontier completion under the strict one-step deployment budget.

Multi-step baseline

ScenDream

Latent diffusion baseline.

MeanFlow 3–5 step

VectorWorld few-step

Higher fidelity when a small offline budget is available.

Deployment point

VectorWorld one-step

Solver-free + JVP for online completion.


Generator

One-step MeanFlow + JVP

Solver-free masked completion for repeated rollout-time generation.

Deployment cost

5.6 ms / 64 m × 64 m tile

The online operating point reported in the paper.
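
A back-of-the-envelope check of what 5.6 ms per tile means over a kilometer. The tile size and latency are the figures quoted on this page. The assumptions are mine: one new tile per 64 m of straight travel, no lateral tiles, and a simulation running in real time at the given ego speed.

```python
import math

TILE_SIZE_M = 64.0   # from this page
LATENCY_MS = 5.6     # from this page, per tile


def tiles_for_distance(distance_m, tile_size=TILE_SIZE_M):
    """Tiles crossed along a straight path, one new tile per tile length."""
    return math.ceil(distance_m / tile_size)


def generation_time_ms(distance_m, latency_ms=LATENCY_MS):
    """Total generator time to cover `distance_m` under the assumptions above."""
    return tiles_for_distance(distance_m) * latency_ms


def duty_cycle(speed_mps, latency_ms=LATENCY_MS, tile_size=TILE_SIZE_M):
    """Fraction of wall-clock time spent generating at a given ego speed."""
    seconds_per_tile = tile_size / speed_mps
    return (latency_ms / 1000.0) / seconds_per_tile


n_tiles = tiles_for_distance(1000.0)     # 16 tiles for 1 km
total_ms = generation_time_ms(1000.0)    # about 89.6 ms of generation in total
share = duty_cycle(30.0)                 # under 0.3% of wall-clock time at 30 m/s
```

Under these assumptions the generator is idle almost all of the time, which leaves headroom for wider frontiers or few-step refinement.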

Step budget

3–5 steps

Higher fidelity when a small offline budget is available.

What to inspect

  • Lane continuity at the frontier.
  • Route continuation under a tight step budget.
  • Agent-map consistency during low-latency generation.

Reading guide

One-step MeanFlow is the deployment point. Few-step flow recovers fidelity when a small extra budget is allowed. The multi-step baseline remains visibly slower.
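
For reference, the relations behind the one-step and few-step operating points, written in the notation of the original MeanFlow formulation (noise at t = 1, data at t = 0). Whether VectorWorld uses exactly this parameterization is an assumption; the symbols u, v, z_t, r, and t do not appear on this page.

```latex
% Average velocity over [r, t] and the identity used as a training target
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,
\qquad
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt} u(z_t, r, t)

% Total derivative, evaluated as one Jacobian-vector product with tangent (v, 0, 1)
\frac{d}{dt} u(z_t, r, t) = \partial_z u(z_t, r, t)\, v(z_t, t) + \partial_t u(z_t, r, t)

% Sampling: a single jump from noise \epsilon to data, or K jumps on a grid
z_r = z_t - (t - r)\, u(z_t, r, t)
\quad\Rightarrow\quad
x = \epsilon - u(\epsilon, 0, 1)
```

The few-step variant applies the same update across a grid 1 = t_0 > t_1 > … > t_K = 0, trading K network evaluations for fidelity.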

Figure 6 in the paper. The key deployment question is not only absolute quality, but which operating point stays within the streaming budget while keeping endpoint distance and agent JSD controlled.
Closed loop

Long-horizon rollout evidence and exported scene inspection

Closed-loop demos, key rollout metrics, and a lightweight viewer for exported vector scenes.

Warm start

16.6 → 9.6 jerk

Warm-started interaction states reduce early-horizon instability in closed loop.
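
Jerk is the third time derivative of position. A minimal finite-difference sketch shows why a history-free start scores badly: an agent that jumps from rest to cruising speed produces a large jerk spike. The aggregation (mean absolute value), the 1-D trace, and the sample trajectories are assumptions, since this page states neither the units nor the exact definition behind 16.6 → 9.6.

```python
def mean_abs_jerk(positions, dt):
    """Mean absolute third finite difference of a 1-D position trace."""
    vel = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    acc = [(b - a) / dt for a, b in zip(vel, vel[1:])]
    jerk = [(b - a) / dt for a, b in zip(acc, acc[1:])]
    return sum(abs(j) for j in jerk) / len(jerk)


dt = 0.1
# Constant 2 m/s^2 acceleration: jerk is zero up to rounding error.
smooth = [0.5 * 2.0 * (k * dt) ** 2 for k in range(20)]
# One second at rest, then an abrupt jump to 1 m/s: a large jerk spike.
cold = [0.0] * 10 + [1.0 * (k * dt) for k in range(1, 11)]

smooth_jerk = mean_abs_jerk(smooth, dt)
cold_jerk = mean_abs_jerk(cold, dt)
```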

Horizon

1 km+ rollout

Streaming outpainting extends evaluation well beyond a single initialized tile.

Training value

25.7% → 56.0%

PPO success improves after retraining inside the VectorWorld environment.


3D Inspection

Orbit, pan, and zoom the exported vector scene in 3D

Layers: lane band, route, motion trail, ego, vehicle, pedestrian, cyclist.

Note. Exported scene inspection only: one scene snapshot plus a qualitative agent_motion trail, not a full per-step replay. Drag to orbit, scroll to zoom, right-drag to pan.

Rollout case 1

Frontier outpainting

Long-horizon rollout with repeated frontier completion ahead of the ego.

Rollout case 2

Stable closed loop

Second rollout showing stable route continuation under a different map layout.

Surround-view generation

VectorWorld as a controllable layout prior for surround-view video generation

The generated vector scene provides explicit layout control for downstream surround-view video generation.

Vector-to-sensor bridge

VectorWorld supplies an explicit vector-world prior. After projection into image space, a downstream video model can generate sensor-level surround-view videos with stronger geometry, motion, and interaction consistency for autonomous driving.

Stage 1 · Vector world

World prior

Explicit vector scene generated by VectorWorld.

Stage 2 · Sensor-space projection

Bridge

Projection that exposes the structural prior in image space.

Stage 3 · Sensor-level video

Video model

Surround-view video conditioned on the projected structural prior.

Takeaway. Vector scene → projected layout → sensor-level video. The vector world model keeps structure explicit, and the video model lifts it to a more physical, controllable surround-view world.
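
The projection stage can be illustrated with a standard pinhole camera model. This is a generic sketch, not the bridge used in the paper: the intrinsics, the camera-frame convention (x right, y down, z forward), and the function name are assumptions, and a surround-view rig would repeat this per camera after applying each camera's extrinsics.

```python
def project_polyline(points_cam, fx, fy, cx, cy, near=0.1):
    """Project 3-D points in a camera frame (x right, y down, z forward) to pixels."""
    pixels = []
    for x, y, z in points_cam:
        if z <= near:   # behind or too close to the camera: skip
            continue
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# A lane boundary 1.75 m to the left and 1.5 m below the camera,
# sampled at 5 m and 10 m ahead plus one point behind the camera.
lane = [(-1.75, 1.5, 5.0), (-1.75, 1.5, 10.0), (-1.75, 1.5, -2.0)]
pixels = project_polyline(lane, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
# pixels == [(610.0, 840.0), (785.0, 690.0)]
```

Rasterizing the projected polylines gives the downstream video model an image-space layout whose geometry is fixed by the vector scene.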
BibTeX

Citation

If you find VectorWorld useful, please cite the paper.


@misc{jiang2026vectorworldefficientstreamingworld,
  title={VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs},
  author={Chaokang Jiang and Desen Zhou and Jiuming Liu and Kevin Li Sun},
  year={2026},
  eprint={2603.17652},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.17652},
}