POC Project

End-to-End Driving: 11V + LiDAR Fusion

A sparse-centric end-to-end autonomous-driving stack fusing 11 cameras (7 pinhole + 4 fisheye) with LiDAR. I owned the BEV-fusion CUDA operator and the AI-planner training.

Timeline: 2024 · Collaboration
Context: Lantu
Role: BEV Fusion & AI-Planner Engineer
Stage: Pre-research / POC

Overview

What is this project about?

An end-to-end autonomous-driving system that fuses 11 surround cameras (7 pinhole + 4 fisheye) with LiDAR under a sparse-centric (SparseDrive-style) paradigm. My two core deliverables: a fused BEV-fusion CUDA operator that aligns 11-camera and LiDAR features in a single kernel, and the training of an AI planner that outputs motion and planning in parallel from a shared query decoder.

Tags: research, e2e, perception, 3d-4d
11V surround + LiDAR | SparseDrive-style sparse stack | Fused CUDA operator | Detection · tracking · map | Parallel motion + planning
11V: 7 pinhole + 4 fisheye cameras
1 kernel: sample, weight, and reduce fused
<3 pts: 3D gap vs dense BEVDet-style baseline
2 owned: CUDA fusion + AI-planner training

Logic map

11V LiDAR to trajectory


System logic

11V plus LiDAR sparse end-to-end driving pipeline
11-camera + LiDAR input is encoded sparsely, fused in BEV, and decoded into perception, prediction, motion, and planning outputs.
AI planner shared query decoder architecture
Ego and obstacle queries aggregate temporal history, map, multi-view images, and LiDAR BEV, then branch into motion and planning heads.
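
To make the shared-decoder idea concrete, here is a minimal pure-Python sketch: every query attends over one pooled context (history, map, image, and LiDAR BEV tokens), and the updated queries then feed two independent heads. This is an illustration under stated assumptions, not the project's implementation. The function names, the single-head attention, the plain linear heads, and the choice to route obstacle queries to the motion head and the ego query to the planning head are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query: Vec, context: List[Vec]) -> Vec:
    """Single-head dot-product attention of one query over context tokens,
    followed by a residual connection (assumed, simplified decoder layer)."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, tok) * scale for tok in context]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - peak) for l in logits]
    norm = sum(exps)
    attended = [
        sum(e / norm * tok[k] for e, tok in zip(exps, context))
        for k in range(len(query))
    ]
    return [q + a for q, a in zip(query, attended)]


def linear(x: Vec, weight: List[Vec]) -> Vec:
    """y = W x, with W given as a list of rows."""
    return [dot(row, x) for row in weight]


def decode(
    queries: Dict[str, Vec],
    sources: List[List[Vec]],
    motion_w: List[Vec],
    plan_w: List[Vec],
) -> Tuple[Dict[str, Vec], Vec]:
    """Update all queries against the pooled context once, then branch.

    The two heads read the same updated queries and do not depend on each
    other, which is what allows them to run in parallel.
    """
    context = [tok for src in sources for tok in src]
    updated = {name: cross_attend(q, context) for name, q in queries.items()}
    motion = {
        name: linear(q, motion_w) for name, q in updated.items() if name != "ego"
    }
    plan = linear(updated["ego"], plan_w)
    return motion, plan
```

Because neither head consumes the other's output, the planning branch does not have to wait for motion prediction to finish, in contrast to a serial predict-then-plan layout.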

Fusion operator

LiDAR and 11-camera alignment visualization
The owned CUDA operator projects 3D keypoints into 11 cameras across 4 scales, then bilinear-samples, weights, and reduces aligned features in one pass.
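
The sample, weight, and reduce steps described above can be sketched as a CPU reference in plain Python. This is not the CUDA operator itself, only a readable model of what one output element computes. Several things here are assumptions made for illustration: the pinhole-only projection with hypothetical intrinsics (fx, fy, cx, cy), the omission of extrinsics and fisheye distortion, the power-of-two stride per scale, and all function names.

```python
from typing import List, Optional, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Bilinearly sample fmap at continuous pixel (u, v); zeros if outside."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return [0.0] * c
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    return [
        (1 - ax) * (1 - ay) * fmap[y0][x0][k]
        + ax * (1 - ay) * fmap[y0][x1][k]
        + (1 - ax) * ay * fmap[y1][x0][k]
        + ax * ay * fmap[y1][x1][k]
        for k in range(c)
    ]


def project(
    point: Sequence[float], cam: Sequence[float]
) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a point already in the camera frame (assumed).

    cam is a hypothetical (fx, fy, cx, cy) tuple. Returns None for points
    at or behind the image plane.
    """
    x, y, z = point
    if z <= 1e-6:
        return None
    fx, fy, cx, cy = cam
    return fx * x / z + cx, fy * y / z + cy


def fused_sample_weight_reduce(
    point: Sequence[float],
    cams: Sequence[Sequence[float]],
    feats: Sequence[Sequence[FeatureMap]],   # feats[cam][scale]
    weights: Sequence[Sequence[float]],      # weights[cam][scale]
) -> List[float]:
    """Accumulate weighted samples over all cameras and scales in one loop.

    No per-camera or per-scale intermediate is stored: each sample is
    weighted and added to the running output immediately, which mirrors
    what a fused kernel keeps in registers.
    """
    channels = len(feats[0][0][0][0])
    out = [0.0] * channels
    for ci, cam in enumerate(cams):
        uv = project(point, cam)
        if uv is None:
            continue
        for si, fmap in enumerate(feats[ci]):
            stride = 2 ** si  # assumed: scale si halves resolution si times
            sampled = bilinear_sample(fmap, uv[0] / stride, uv[1] / stride)
            w = weights[ci][si]
            for k in range(channels):
                out[k] += w * sampled[k]
    return out
```

In a GPU version, one thread (or one warp) would typically own a keypoint and channel slice and run the same two nested loops over cameras and scales; how the real operator assigns work is not stated in the source.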
Dense BEV cost → Sparse representation for efficiency
Naive 3 passes → One fused kernel, HBM ×1
Serial planning → Parallel motion + ego planning
One-way prediction → Bidirectional, game-aware queries
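
The "3 passes vs. one fused kernel" comparison can be made concrete with a rough count of output and intermediate traffic to HBM. The sketch below assumes the three naive passes are sample, weight, and reduce, each materializing its result in global memory, and it ignores the feature-map reads that both variants share. The default channel count and element size are hypothetical placeholders, not project numbers.

```python
from typing import Tuple


def intermediate_traffic_bytes(
    num_points: int,
    num_cams: int = 11,
    num_scales: int = 4,
    channels: int = 256,       # hypothetical feature width
    bytes_per_elem: int = 2,   # hypothetical fp16 storage
) -> Tuple[int, int]:
    """Return (naive_bytes, fused_bytes) of intermediate and output traffic."""
    per_sample = num_points * num_cams * num_scales * channels * bytes_per_elem
    reduced = num_points * channels * bytes_per_elem

    naive = per_sample             # pass 1: write sampled features
    naive += 2 * per_sample        # pass 2: read them, write weighted features
    naive += per_sample + reduced  # pass 3: read weighted, write reduced output

    fused = reduced                # fused: only the reduced output is written
    return naive, fused
```

Under these assumptions the naive path moves roughly four times the per-sample tensor plus the output, while the fused path writes the output once, which is the sense of "HBM ×1" here.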

Turn-around result

End-to-end surround replay: fused sparse perception and the AI planner produce the turn-around trajectory from 11V + LiDAR inputs.
My role. 2024 collaboration with Lantu; I owned the fused BEV-fusion CUDA operator and the AI-planner training. Details are high-level and sanitized.
Confidentiality note. Only high-level architecture and sanitized visual materials are shown. Customer-specific data, calibration, and internal performance numbers are omitted. The original source listed an inconsistent interval; a neutral '2024 · Collaboration' label is shown instead.