POC Project

End-to-End Driving: 11V + LiDAR Fusion

A sparse-centric end-to-end autonomous-driving stack fusing 11 cameras (7 pinhole + 4 fisheye) with LiDAR. I owned the BEV-fusion CUDA operator and the AI-planner training.

Timeline: 2024 · Collaboration
Context: Lantu
Role: BEV Fusion & AI-Planner Engineer
Stage: Pre-research / POC

Overview

What is this project about?

An end-to-end autonomous-driving system that fuses 11 surround cameras (7 pinhole + 4 fisheye) with LiDAR under a sparse-centric (SparseDrive-style) paradigm. My two core deliverables: a fused BEV-fusion CUDA operator that aligns 11-camera and LiDAR features in a single kernel, and the training of an AI planner that outputs motion and planning in parallel from a shared query decoder.

research · e2e · perception · 3d-4d

11V surround + LiDAR
SparseDrive-style sparse stack
Fused CUDA operator
Detection · tracking · map
Parallel motion + planning

11V: 7 pinhole + 4 fisheye cameras

1 kernel: sample, weight, and reduce fused

<3 pts: 3D gap vs dense BEVDet-style baseline

2 owned: CUDA fusion + AI-planner training

Logic map

11V + LiDAR to trajectory

Green and amber mark my owned modules.

System logic

11V plus LiDAR sparse end-to-end driving pipeline

11-camera + LiDAR input is encoded sparsely, fused in BEV, and decoded into perception, prediction, motion, and planning outputs.

AI planner shared query decoder architecture

Ego and obstacle queries aggregate temporal history, map, multi-view images, and LiDAR BEV, then branch into motion and planning heads.
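The shared-decoder idea can be illustrated with a toy sketch. This is not the project's planner: `attend`, `linear`, `decode`, and the weight matrices are hypothetical names, the decoder is reduced to one dot-product attention step with a residual, and all context sources (history, map, images, LiDAR BEV) are assumed to be flattened into one list of token vectors. The point it shows is only that the motion head and the planning head both read the same decoded queries and neither waits on the other's output.

```python
import math

def attend(query, context):
    """One dot-product attention step of a query over context tokens, plus a residual."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, tok)) / scale for tok in context]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    mixed = [sum(e / total * tok[d] for e, tok in zip(exps, context))
             for d in range(len(query))]
    return [q + m for q, m in zip(query, mixed)]

def linear(vec, weight):
    """Apply a weight matrix stored as [out_dim][in_dim] to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def decode(ego_query, obstacle_queries, context, motion_w, plan_w):
    # Shared stage: every query is decoded against the same context tokens.
    shared_ego = attend(ego_query, context)
    shared_obs = [attend(q, context) for q in obstacle_queries]
    # Parallel heads: each consumes the shared features independently.
    motion = [linear(feat, motion_w) for feat in shared_obs]  # one output per obstacle
    plan = linear(shared_ego, plan_w)                         # ego output
    return motion, plan

# Toy usage with made-up 2-D features.
motion, plan = decode(
    ego_query=[0.0, 0.0],
    obstacle_queries=[[0.0, 1.0]],
    context=[[1.0, 0.0]],
    motion_w=[[1.0, 1.0]],
    plan_w=[[1.0, 0.0], [0.0, 1.0]],
)
```

A real decoder would stack several such layers and use learned multi-head attention; the sketch keeps only the branching structure.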

Fusion operator

LiDAR and 11-camera alignment visualization

The owned CUDA operator projects 3D keypoints into 11 cameras across 4 scales, then bilinear-samples, weights, and reduces aligned features in one pass.
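As a readable reference for what "project, sample, weight, reduce in one pass" means, here is a minimal pure-Python sketch rather than the CUDA kernel itself. Everything named in it is an assumption for illustration: the camera dictionary layout, the function names, a pinhole-only projection (fisheye cameras would need their own distortion model), a simple `pixel / stride` mapping into each pyramid level, and scalar per-sample weights. LiDAR features are left out. The structural point is that each keypoint's accumulator is updated directly inside the camera and scale loops, so no per-sample intermediate tensor is ever stored.

```python
import math

def to_camera(point, ext):
    """Transform a 3D point with a 3x4 [R|t] extrinsic matrix (list of rows)."""
    return [sum(ext[r][k] * point[k] for k in range(3)) + ext[r][3] for r in range(3)]

def project_pinhole(point, cam):
    """Pinhole projection to pixel (u, v); returns None if the point is behind the camera."""
    x, y, z = point
    if z <= 1e-6:
        return None
    return cam["fx"] * x / z + cam["cx"], cam["fy"] * y / z + cam["cy"]

def bilinear_sample(fmap, u, v):
    """Bilinearly sample an [H][W][C] map at continuous (u, v); out-of-range taps read zero."""
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    x0, y0 = math.floor(u), math.floor(v)
    out = [0.0] * channels
    for y in (y0, y0 + 1):
        for x in (x0, x0 + 1):
            if 0 <= x < width and 0 <= y < height:
                tap = (1 - abs(u - x)) * (1 - abs(v - y))
                for ch in range(channels):
                    out[ch] += tap * fmap[y][x][ch]
    return out

def fuse_keypoints(keypoints, cameras, weights):
    """Return [N][C] fused features; weights[n][cam][scale] is a scalar per sample."""
    channels = len(cameras[0]["pyramid"][0][1][0][0])
    fused = []
    for n, keypoint in enumerate(keypoints):
        acc = [0.0] * channels  # the only per-keypoint state that is kept
        for ci, cam in enumerate(cameras):
            uv = project_pinhole(to_camera(keypoint, cam["ext"]), cam)
            if uv is None:
                continue  # keypoint not visible in this camera
            for si, (stride, fmap) in enumerate(cam["pyramid"]):
                feat = bilinear_sample(fmap, uv[0] / stride, uv[1] / stride)
                for ch in range(channels):
                    acc[ch] += weights[n][ci][si] * feat[ch]  # weight and reduce in place
        fused.append(acc)
    return fused
```

In a CUDA version of this sketch, the outer loop over keypoints (and channels) would typically map to threads, with the accumulator held in registers until the final write.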

Dense BEV cost → Sparse representation for efficiency

Naive 3 passes → One fused kernel, HBM ×1

Serial planning → Parallel motion + ego planning

One-way prediction → Bidirectional, game-aware queries
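The "3 passes versus 1 kernel" comparison can be made concrete with a back-of-envelope count. The model below is my own simplification, not a measurement: it assumes the naive path runs sample, weight, and reduce as separate passes that each round-trip a per-sample tensor through HBM, it counts elements rather than bytes, and it ignores the feature-map reads that both variants share. The function name and shapes are hypothetical.

```python
def intermediate_traffic(num_keypoints, num_cams, num_scales, channels):
    """Count HBM element transfers beyond the shared feature-map reads."""
    per_sample = num_keypoints * num_cams * num_scales * channels
    output = num_keypoints * channels
    naive = (
        per_sample        # pass 1: write sampled features
        + 2 * per_sample  # pass 2: read them, write weighted features
        + per_sample      # pass 3: read weighted features
        + output          # pass 3: write reduced output
    )
    fused = output        # single kernel: only the reduced output is written
    return naive, fused

# Illustrative shapes only: 2 keypoints, 11 cameras, 4 scales, 3 channels.
naive, fused = intermediate_traffic(2, 11, 4, 3)
```

Under these assumptions the naive traffic grows with cameras × scales, while the fused traffic depends only on the number of keypoints and channels.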

Turn-around result

End-to-end surround replay: fused sparse perception and the AI planner produce the turn-around trajectory from 11V + LiDAR inputs.

My role. 2024 · Collaboration with Lantu. I owned the fused BEV-fusion CUDA operator and the AI-planner training. Details are high-level and sanitized.

Confidentiality note. Only high-level architecture and sanitized visual materials are shown. Customer-specific data, calibration, and internal performance numbers are omitted. The original source listed an inconsistent interval; a neutral '2024 · Collaboration' label is shown instead.