POC Project

One-Stage End-to-End Driving — 8V Pure Vision

From pixels to planning in a single network: 8-camera surround vision → unified BEV → three perception heads → self-supervised future-BEV prediction → a Diffusion-Flow AI planner, all trained end-to-end with no hand-designed 3D-label interface.

Timeline: 2025.12–2026.04
Context: Bosch (XC-CN)
Role: World Models Algorithm Engineer
Stage: POC

Overview

What is this project about?

A one-stage, pure-vision, end-to-end driving POC. It lifts 8 surround cameras into a single BEV feature and reads three structured perception heads from it (3D detection, HD map, occupancy). It then predicts the next-frame BEV, scored by a frozen generative critic, and tokenizes everything into a Diffusion-Flow planner that emits the ego trajectory and neighbouring-agent states. Perception, prediction, and planning are optimised jointly.
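The data flow described above can be traced at the level of tensor shapes. The sketch below is illustrative only: it passes shape tuples through stub stages, and every size, layout, and function name in it is a hypothetical placeholder rather than a detail of the actual POC.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]

NUM_CAMS = 8                          # from the text: 8 surround cameras
BEV_H, BEV_W, BEV_C = 200, 200, 256   # hypothetical BEV grid and channel width
POOLED_TOKENS = 64                    # hypothetical tokens per pooled dense map
TOKEN_DIM = 256                       # hypothetical planner token width
HORIZON, MAX_AGENTS = 12, 32          # hypothetical horizon and agent budget


def lift_to_bev(images: Shape) -> Shape:
    """(cams, frames, height, width, rgb) -> one shared BEV feature."""
    if images[0] != NUM_CAMS:
        raise ValueError("expected the 8-camera surround rig")
    return (BEV_H, BEV_W, BEV_C)


def perception_heads(bev: Shape) -> Dict[str, Shape]:
    """Three heads read the same BEV; the output layouts are assumptions."""
    h, w, _ = bev
    return {
        "detection_3d": (100, 10),  # object queries x box parameters
        "hd_map": (50, 20, 2),      # polylines x points x (x, y)
        "occupancy": (h, w, 16),    # BEV cells x height bins
    }


def predict_future_bev(bev: Shape) -> Shape:
    """The next-frame forecast shares the current BEV's layout."""
    return bev


def tokenize(heads: Dict[str, Shape], bev: Shape, future_bev: Shape) -> Shape:
    """Sparse outputs become one token each; dense maps are pooled."""
    sparse = heads["detection_3d"][0] + heads["hd_map"][0]
    dense = 3 * POOLED_TOKENS       # occupancy, current BEV, future BEV
    return (sparse + dense, TOKEN_DIM)


def plan(tokens: Shape) -> Dict[str, Shape]:
    """Planner outputs: ego waypoints and neighbouring-agent states."""
    return {
        "ego_trajectory": (HORIZON, 2),
        "agent_states": (MAX_AGENTS, HORIZON, 2),
    }


def run_pipeline(images: Shape) -> Dict[str, Shape]:
    bev = lift_to_bev(images)
    heads = perception_heads(bev)
    future_bev = predict_future_bev(bev)
    tokens = tokenize(heads, bev, future_bev)
    return {"bev": bev, "tokens": tokens, **heads, **plan(tokens)}
```

Calling `run_pipeline((8, 4, 480, 640, 3))` walks one multi-frame, 8-camera input through every stage. The point of the sketch is that each downstream shape derives from the single shared BEV, with no intermediate hand-designed 3D interface.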

research · e2e · world-model · perception
8V surround · pure vision → Unified BEV feature → 3 perception heads → Self-supervised future BEV → Diffusion-Flow planner
8V: Surround cameras · multi-frame
1 BEV: One shared feature drives every head
3 heads: 3D detection · HD map · occupancy
One stage: Perception + prediction + planning, joint

Logic map

Pixels to a planned trajectory — one differentiable pass

Lines show data flow; the planner reads one shared BEV.

The architecture, end to end

One-stage 8V end-to-end driving architecture
One shared BEV feeds three perception heads and a future-BEV forecast; everything is tokenized into a Diffusion-Flow planner that denoises straight into the ego trajectory — no hand-designed 3D hand-off in the loop.
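To make "denoises straight into the ego trajectory" concrete, here is a minimal sketch that assumes a flow-matching-style sampler: start from Gaussian noise and integrate a velocity field with Euler steps until a trajectory emerges. The source does not specify the planner's exact formulation, so the sampler, the names `sample_trajectory` and `make_toy_velocity`, and the toy velocity field (which stands in for a learned, token-conditioned network) are all assumptions.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]
VelocityField = Callable[[Trajectory, float], Trajectory]


def sample_trajectory(velocity: VelocityField, horizon: int,
                      steps: int = 10, seed: int = 0) -> Trajectory:
    """Euler-integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one (x, y) waypoint per horizon step.
    traj = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(horizon)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(traj, t)
        traj = [(x + dt * vx, y + dt * vy)
                for (x, y), (vx, vy) in zip(traj, v)]
    return traj


def make_toy_velocity(target: Trajectory) -> VelocityField:
    """Hypothetical stand-in for the learned network.

    For a straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that reaches `target` at t=1 is (target - x_t) / (1 - t).
    A real planner would predict this from the scene tokens instead of
    being handed the target.
    """
    def velocity(traj: Trajectory, t: float) -> Trajectory:
        scale = 1.0 / (1.0 - t)  # t < 1 for every step the sampler takes
        return [((tx - x) * scale, (ty - y) * scale)
                for (x, y), (tx, ty) in zip(traj, target)]
    return velocity


# Usage: a straight path with 0.5 m spacing as the "intended" trajectory.
target = [(0.5 * i, 0.0) for i in range(1, 7)]
plan = sample_trajectory(make_toy_velocity(target), horizon=len(target))
```

With this toy field the Euler integration lands on the target path up to floating-point error. In a trained planner the field would come from the network and the result would be a sample from its learned trajectory distribution, not a fixed answer.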

On real test drives

Surround perception (dynamic objects, online map, occupancy) and the generated ego trajectory — produced together by a single end-to-end network.

Why one stage

Lossy 3D interfaces → One shared BEV, no manual hand-off
Errors compound across modules → Joint end-to-end optimisation
Boxes miss long-tail shapes → Dense 3D occupancy head
A snapshot can't plan ahead → Predict next-frame BEV
Forecasts drift off-manifold → Frozen generative critic
Hand-tuned cost functions → Diffusion-Flow generative planner
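The pairing of next-frame BEV prediction with a frozen generative critic can be read as a two-term objective: a self-supervised term against the BEV actually observed at the next frame, plus a realism score from a critic whose parameters are never updated. The sketch below assumes that reading. The critic's form, the weighting, and the names `future_bev_loss` and `toy_critic` are hypothetical, since the source does not describe them.

```python
from typing import Callable, List

BEV = List[float]  # a flattened BEV feature, for illustration only


def mse(a: BEV, b: BEV) -> float:
    """Mean squared error between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def future_bev_loss(predicted: BEV, observed_next: BEV,
                    critic: Callable[[BEV], float],
                    critic_weight: float = 0.1) -> float:
    """Self-supervised forecast loss plus a frozen-critic penalty.

    The target is the BEV computed from the real next frame, so no manual
    labels are needed. The critic is only evaluated, never trained here.
    """
    reconstruction = mse(predicted, observed_next)
    realism_penalty = critic(predicted)
    return reconstruction + critic_weight * realism_penalty


def toy_critic(bev: BEV) -> float:
    """Hypothetical critic: penalises values outside [-1, 1].

    A stand-in for a pretrained generative model scoring how plausible
    the forecast is; the real critic is not described in the source.
    """
    return sum(max(0.0, abs(v) - 1.0) ** 2 for v in bev) / len(bev)


# Usage: a perfect forecast scores zero; an off-range forecast is penalised
# by both terms.
observed = [0.1, -0.3, 0.8, 0.0]
good = future_bev_loss(observed, observed, toy_critic)
bad = future_bev_loss([3.0, -3.0, 3.0, -3.0], observed, toy_critic)
```

The critic term matters when the reconstruction term alone would accept blurry or implausible forecasts: it pulls predictions back toward features the generative model considers realistic.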
My role. Integrated the static-perception, dynamic-perception, and AI-planner components into the single one-model POC, and ran the daily train / eval / visualization loop. Wording is high-level to protect enterprise confidentiality.
Confidentiality note. Bosch (XC-CN) POC. The architecture is presented at a conceptual, portfolio level; customer data, calibration, training corpora, and quantitative results are intentionally omitted or sanitized.