POC Project
One-Stage End-to-End Driving — 8V Pure Vision
From pixels to planning in a single network: 8-camera surround vision → unified BEV → three perception heads → self-supervised future-BEV prediction → a Diffusion-Flow AI planner, all trained end-to-end with no hand-designed 3D-label interface.
Overview
What is this project about?
A one-stage, pure-vision end-to-end driving POC. It lifts 8 surround cameras into a single BEV feature and reads three structured perception heads (3D detection, HD map, occupancy) from it. It then predicts the next-frame BEV, scored by a frozen generative critic, and tokenizes everything into a Diffusion-Flow planner that emits the ego trajectory and neighbouring-agent states. Perception, prediction, and planning are optimised jointly.
8V surround · pure vision
Unified BEV feature
3 perception heads
Self-supervised future BEV
Diffusion-Flow planner
8V: Surround cameras · multi-frame
1 BEV: One shared feature drives every head
3 heads: 3D detection · HD map · occupancy
One stage: Perception + prediction + planning, joint
The architecture, end to end
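As a reading aid, the data flow described in the overview can be sketched at the level of tensor shapes only. This is a minimal illustration, not the project's code: every name (`SketchConfig`, `lift_to_bev`, `run_pipeline`, and so on) and every size other than the 8 cameras is a hypothetical placeholder, since the real dimensions and layers are not published here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class SketchConfig:
    """All sizes are hypothetical except num_cams, which the page states."""
    num_cams: int = 8        # 8 surround cameras (from the page)
    num_frames: int = 2      # "multi-frame"; the real count is not given
    bev_channels: int = 128  # hypothetical BEV feature width
    bev_size: int = 100      # hypothetical BEV grid, cells per side
    occ_height: int = 8      # hypothetical vertical occupancy bins
    num_tokens: int = 64     # hypothetical planner token budget
    horizon: int = 6         # hypothetical number of future waypoints
    max_agents: int = 16     # hypothetical cap on neighbouring agents


def lift_to_bev(images: Shape, cfg: SketchConfig) -> Shape:
    """(frames, cams, 3, H, W) images -> one shared (C, X, Y) BEV feature."""
    frames, cams = images[0], images[1]
    if (frames, cams) != (cfg.num_frames, cfg.num_cams):
        raise ValueError(f"expected {cfg.num_frames}x{cfg.num_cams} views, "
                         f"got {frames}x{cams}")
    return (cfg.bev_channels, cfg.bev_size, cfg.bev_size)


def perception_heads(bev: Shape, cfg: SketchConfig) -> Dict[str, Shape]:
    """Three heads read the same BEV feature; output layouts are placeholders."""
    _, x, y = bev
    return {
        "detection_3d": (cfg.max_agents, 7),  # e.g. centre, size, yaw per box
        "hd_map": (x, y),                     # e.g. rasterised map elements
        "occupancy": (x, y, cfg.occ_height),  # dense 3D occupancy grid
    }


def predict_future_bev(bev: Shape) -> Shape:
    """Next-frame BEV prediction keeps the layout of the current BEV."""
    return bev


def plan(bev: Shape, future_bev: Shape, heads: Dict[str, Shape],
         cfg: SketchConfig) -> Dict[str, Shape]:
    """Tokenise BEV, future BEV and head outputs, then emit planner outputs."""
    if future_bev != bev or len(heads) != 3:
        raise ValueError("planner expects matching BEVs and three heads")
    return {
        "tokens": (cfg.num_tokens, cfg.bev_channels),
        "ego_trajectory": (cfg.horizon, 2),                 # (x, y) waypoints
        "agent_states": (cfg.max_agents, cfg.horizon, 2),   # per-agent (x, y)
    }


def run_pipeline(images: Shape,
                 cfg: SketchConfig = SketchConfig()) -> Dict[str, Shape]:
    """Chain the stages in the order the overview lists them."""
    bev = lift_to_bev(images, cfg)
    heads = perception_heads(bev, cfg)
    future_bev = predict_future_bev(bev)
    return plan(bev, future_bev, heads, cfg)
```

Calling `run_pipeline((2, 8, 3, 224, 400))` walks one multi-frame, 8-camera input through all four stages. The point of the sketch is the single shared `bev` value: every head, the future-BEV predictor, and the planner read from it, with no separate 3D-label hand-off in between.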
On real test drives
Why one stage
Lossy 3D interfaces → One shared BEV, no manual hand-off
Errors compound across modules → Joint end-to-end optimisation
Boxes miss long-tail shapes → Dense 3D occupancy head
A snapshot can't plan ahead → Predict next-frame BEV
Forecasts drift off-manifold → Frozen generative critic
Hand-tuned cost functions → Diffusion-Flow generative planner
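The "joint end-to-end optimisation" row can be read as one scalar objective that sums a term per component, so gradients from planning also reach the shared BEV. The sketch below assumes a plain weighted sum; the term names and weights are hypothetical, and the actual loss design is not disclosed on this page.

```python
from typing import Dict, Mapping

# Hypothetical weights, chosen only for illustration.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "detection": 1.0,   # 3D detection head
    "hd_map": 1.0,      # HD map head
    "occupancy": 1.0,   # dense 3D occupancy head
    "future_bev": 0.5,  # self-supervised next-frame BEV prediction
    "critic": 0.1,      # score from the frozen generative critic
    "planning": 1.0,    # planner trajectory term
}


def joint_loss(terms: Mapping[str, float],
               weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine per-component losses into one objective via a weighted sum."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)
```

In this reading the critic only contributes a score to the sum; its own parameters stay frozen and are not updated by the objective.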
My role.
Integrated the static-perception, dynamic-perception, and AI-planner components into a single-model POC, and ran the daily train / eval / visualization loop. Wording is kept high-level to protect enterprise confidentiality.
Confidentiality note.
Bosch (XC-CN) POC. The architecture is presented at a conceptual, portfolio level; customer data, calibration, training corpora, and quantitative results are intentionally omitted or sanitized.