Research Project
Controllable Surround-View Driving Generation
A controllable multi-view world model for driving: 3D layout, map, and multi-granularity control signals are injected into a diffusion process to generate geometrically consistent 4-, 7-, and 11-view (4V / 7V / 11V) images and video for data augmentation and open-loop simulation.
Overview
What is this project about?
This project built a controllable surround-view driving generator. It compresses 3D boxes and maps into spatial conditions, encodes text, reference frames, lanes, and camera calibration into condition tokens, and injects these conditions into a UNet diffusion backbone. The result is cross-camera-consistent 4V / 7V / 11V images and video for data augmentation and open-loop simulation. The model evolved from OpenSora 1.0 + SD 3.5 to a MagicDrive-fused in-house model.
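As a rough illustration of how a 3D box can become a per-camera spatial condition, the sketch below projects the eight corners of a box into one camera and returns its 2D pixel extent. This is not the project's actual code: the pinhole camera model, the frame conventions (camera z pointing forward), and the names `box_corners`, `project_point`, and `box_to_2d_extent` are all assumptions made for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def box_corners(center: Vec3, size: Vec3, yaw: float) -> List[Vec3]:
    """Return the 8 corners of a yaw-rotated 3D box (hypothetical helper).

    size is (length, width, height) along the box's local x, y, z axes.
    """
    cx, cy, cz = center
    length, width, height = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the local offset about the z axis, then translate.
                corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz + dz))
    return corners


def project_point(
    p: Vec3,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    fx: float,
    fy: float,
    u0: float,
    v0: float,
) -> Optional[Tuple[float, float]]:
    """Project a point into pixel coordinates with an assumed pinhole model.

    R and t map the source frame into the camera frame (z forward).
    Returns None when the point lies behind the camera.
    """
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 1e-6:
        return None
    return (fx * x / z + u0, fy * y / z + v0)


def box_to_2d_extent(
    corners: Sequence[Vec3],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    intrinsics: Tuple[float, float, float, float],
    width: int,
    height: int,
) -> Optional[Tuple[float, float, float, float]]:
    """Return (u_min, v_min, u_max, v_max) clamped to the image, or None.

    Simplification: a box with any corner behind the camera is dropped.
    """
    pts = [project_point(p, R, t, *intrinsics) for p in corners]
    if any(pt is None for pt in pts):
        return None
    us = [pt[0] for pt in pts]
    vs = [pt[1] for pt in pts]
    u_min, u_max = max(0.0, min(us)), min(width - 1.0, max(us))
    v_min, v_max = max(0.0, min(vs)), min(height - 1.0, max(vs))
    if u_min >= u_max or v_min >= v_max:
        return None  # Box falls entirely outside the image.
    return (u_min, v_min, u_max, v_max)
```

Running this once per camera, each with its own calibration, would give every view a condition derived from the same 3D scene, which is one plausible way to encourage cross-camera consistency. A real pipeline would more likely rasterize the projected boxes and map elements into dense condition images rather than keep only a 2D extent.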
Conditioned diffusion pipeline
Scene replacement for augmentation