Research Platform
Vector Traffic Generation & Sensor-Level Closed-Loop Simulation
Two halves of a controllable driving simulator: a structure-aware temporal vector world model that compresses and generates traffic as latents, and a sensor-level closed-loop pipeline that reconstructs, populates, and re-renders photorealistic surround video.
Overview
What is this project about?
The project builds a two-level controllable driving simulator from three components: a structure-aware temporal vector VAE (STAR-AE) that compresses sparse, variable-size sets of agents and lanes into fixed-size latents; a conditional latent-diffusion generator (STRIDENet) that produces future traffic consistent with the observed history; and a sensor-level closed-loop WorldSim that fuses Gaussian-Splatting reconstruction, traffic-flow generation, and a mask-guided DiT video editor (built on MagicDrive-V2) into photorealistic surround rollouts.
Temporal vector AE in motion
The VAE encodes sparse, variable-size scenes into a fixed-size latent and reconstructs them, keeping agents and lanes temporally coherent.
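The page does not describe the encoder internals, so the following is only a minimal sketch of one common way to map a variable number of scene elements to a fixed-size latent: cross-attention from a fixed set of query vectors. The names `pool_to_fixed_latent`, `queries`, `DIM`, and `NUM_QUERIES` are hypothetical, the queries are random rather than learned, and the temporal axis, the VAE sampling step, and the decoder are all left out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_to_fixed_latent(elements, queries):
    """Cross-attend from K fixed queries to N scene elements.

    elements: list of N feature vectors (agents or lanes); N varies per scene
              and must be at least 1.
    queries:  list of K query vectors; K is fixed (assumed learned in practice).
    Returns K latent vectors, so the output size does not depend on N.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    latent = []
    for query in queries:
        # Scaled dot-product score between this query and every element.
        scores = [scale * sum(q * e for q, e in zip(query, elem)) for elem in elements]
        weights = softmax(scores)
        # Attention-weighted average of the element features.
        latent.append(
            [sum(w * elem[d] for w, elem in zip(weights, elements)) for d in range(dim)]
        )
    return latent


rng = random.Random(0)
DIM, NUM_QUERIES = 8, 4
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

# Two scenes with different element counts.
small_scene = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
large_scene = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(11)]

latent_small = pool_to_fixed_latent(small_scene, queries)
latent_large = pool_to_fixed_latent(large_scene, queries)
print(len(latent_small), len(latent_small[0]))
print(len(latent_large), len(latent_large[0]))
```

Both scenes yield a latent of the same shape, and because the pooling is a weighted sum over elements, the result does not depend on the order in which agents or lanes are listed.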
Architecture, both halves
Sensor-level closed loop
Mask-guided DiT — edit, don't regenerate
Four semantic masks partition every frame so the model only computes what must change.
| Mask | Region | Action |
|---|---|---|
| M_keep | Known background | Frozen — skip all compute |
| M_ctx | Reference background | Cached — provide K/V only |
| M_edge | Fg/bg boundary | Active — repair the seam |
| M_gen | Foreground | Active — generate by condition |
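The routing in the table can be sketched as an index plan over one frame's tokens. This is an illustrative sketch, not the project's implementation: `Region` and `plan_attention` are hypothetical names, and it assumes that active tokens (M_edge, M_gen) attend both to each other and to the cached M_ctx keys and values, which the table does not state explicitly.

```python
from enum import Enum


class Region(Enum):
    KEEP = "M_keep"  # known background: frozen, no compute
    CTX = "M_ctx"    # reference background: cached, supplies K/V only
    EDGE = "M_edge"  # foreground/background boundary: active
    GEN = "M_gen"    # foreground: active


ACTIVE = {Region.EDGE, Region.GEN}


def plan_attention(labels):
    """Return (query_idx, key_value_idx, frozen_idx) for one frame's tokens.

    labels: list of Region values, one per token.
    """
    query_idx = [i for i, region in enumerate(labels) if region in ACTIVE]
    context_idx = [i for i, region in enumerate(labels) if region is Region.CTX]
    frozen_idx = [i for i, region in enumerate(labels) if region is Region.KEEP]
    # Assumption: active tokens attend to each other and to cached context.
    key_value_idx = sorted(query_idx + context_idx)
    return query_idx, key_value_idx, frozen_idx


labels = [
    Region.KEEP, Region.KEEP, Region.CTX, Region.EDGE,
    Region.GEN, Region.GEN, Region.EDGE, Region.CTX,
]
query_idx, key_value_idx, frozen_idx = plan_attention(labels)
active_fraction = len(query_idx) / len(labels)
print(query_idx, key_value_idx, frozen_idx, active_fraction)
```

Under this plan, only the tokens in `query_idx` are updated, tokens in `frozen_idx` are copied through unchanged, and M_ctx tokens appear in `key_value_idx` but never as queries.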