Production Project

Road Preview: Surface-Element Segmentation

Robust segmentation of small road elements — manhole covers and speed bumps — for the road-preview / 'magic-carpet' suspension feature, hardened against hard cases and quantized for TDA4 edge deployment.

Timeline: 2023.05–2023.11
Context: BYD
Role: Perception Model Optimization Engineer
Stage: Deployment-oriented · near-SOP

Overview

What is this project about?

A road-surface perception project for the road-preview ('magic-carpet') suspension feature: segment safety-critical small road elements — manhole covers and speed bumps — reliably under hard real-world conditions (tiny targets, water and oil stains, textureless surfaces), then compress and quantize the model to INT8 for efficient TDA4 edge inference, reaching an initial mass-production quality bar.

production perception deployment

The Challenge

Why small road-element segmentation is hard

The road-preview feature looks ahead and segments small surface elements so the suspension can pre-adjust. The targets are tiny and the visual conditions are adversarial — three failure modes dominate.

Tiny road element in a full driving scene — Full-scene context — the target is a small fraction of the frame.

Close-up of a manhole cover — Manhole cover — often textureless or road-colored.

Water-stain patch on the road — Water / oil stain — texture mimics a real cover.

Low-contrast road element — Low contrast — element blends into the asphalt.

#	Challenge	Why it breaks naive models
01	Extremely small targets	Manhole covers and speed bumps can occupy less than 1% of the image pixels, so the signal is easily lost to pooling and down-sampling.
02	Stain / texture confusion	Water and oil stains on the road produce textures highly similar to a real manhole cover, driving false positives.
03	Textureless / color-matched	Many covers are textureless or nearly the same color as the asphalt, driving missed detections.

Stage 1 · Train (FP32)

Segmentation training pipeline

A classic Encoder–Decoder design, tuned end-to-end for TDA4 edge deployment. The flow runs data → features → decoding → loss → optimization → evaluation, looped each epoch.

RegNet backbone FPN EdgeAI-Lite decoder OHEM · Lovász · Tversky · Focal AdamW · mIoU

Segmentation training pipeline architecture

Training pipeline. Data loading → augmentation → RegNet backbone → FPN EdgeAI-Lite decoder → combined losses; AdamW optimizes and mIoU evaluates. The blue arcs are the per-epoch training loop. This stage produces a high-accuracy floating-point model.

Left

Data → feature extraction

Module	Role
Data Loading	Reads images and their pixel-level annotation masks; supports multiple segmentation dataset formats.
Data Augmentation	The train pipeline applies strong augmentation (random crop / flip / color jitter); the test pipeline only normalizes. The two are strictly separated to avoid data leakage.
Backbone · RegNet	A regularized network found by design-space search — high compute efficiency and accuracy, ideal for edge deployment (low MAC, low memory). Emits multi-scale feature maps (C2–C5, ResNet-style).

Center

Decoding → losses

Module	Role
Decoder · FPNEdgeAILiteDecoder	A lightweight FPN-based decoder built for EdgeAI (TI EdgeAI Toolbox): fuses multi-scale features, upsamples back to full resolution, and outputs a per-pixel class-probability map.
Loss · OHEM	Online Hard Example Mining — dynamically selects high-loss hard samples for back-propagation to mitigate class imbalance.
Loss · Lovász	A differentiable surrogate that directly optimizes IoU, aligned with the mIoU metric; outperforms pure cross-entropy on segmentation.
Loss · Tversky	A generalization of Dice loss; α/β control the false-positive / false-negative trade-off — well suited to small and rare targets.
Loss · Focal	Down-weights easy samples and focuses on hard pixels; effective under extreme class imbalance.

The four losses are combined with weights so they complementarily cover the different failure modes above.

Right

Optimization → evaluation

Module	Role
Optimizer · AdamW	Adam with decoupled weight decay — fast convergence and good generalization, robust for CNN/Transformer-style architectures.
Evaluation · mIoU	Mean Intersection-over-Union — the standard segmentation metric, averaged over all classes for a fair view of long-tail performance.
Training loop (blue arcs)	The left arc re-enters Data Loading each epoch to stream new batches; the right arc feeds evaluation back to drive hyper-parameter tuning (LR scheduler) and early stopping.

Stage 2 · Quantize & Compile (INT8)

From FP32 ONNX to a TDA4 TIDL binary

The trained floating-point model is compiled and quantized into a TDA4-runnable TIDL binary using TI's TIDL Model Import toolchain — three inputs feed a containerized import step.

TDA4 TIDL model import and quantization flow

Quantization & compile flow. Float ONNX + import config + calibration set → TIDL model-import Docker container → quantized TIDL binary. This stage compresses the model to an edge-efficient form.

Inputs

Three inputs to the import step

Input	Role
float onnx	The exported FP32 ONNX model — full network structure and FP32 weights.
model import cfg	TIDL compile config: quantization bit-width (INT8/INT16), input size, target core (MMA / C7x DSP), and operator-mapping strategy.
calibration set	A small set of real images (~100–500) for Post-Training Quantization (PTQ) calibration — collects each layer's activation range (min/max or histogram) to set the quantization scale / zero-point.

Container

Inside the model-import Docker container

Why Docker? The TIDL toolchain depends on a specific TI SDK environment; the container guarantees consistency and avoids dependency conflicts. The dashed arrow means the Docker image is the container's source — provided by TI and pulled on demand.

Step	What happens
1 · Operator fusion	Merges Conv + BN + ReLU to cut memory traffic.
2 · Quant calibration	Uses the calibration set to estimate activation distributions and derive INT8 quantization parameters.
3 · Hardware-aware compile	Maps operators onto the TDA4 MMA (matrix accelerator) or C7x DSP; unsupported ops fall back to ARM.

Output

TIDL Model Bin

Property	Detail
Contents	INT8 quantized weights, network topology, and hardware scheduling info.
Runtime	Loaded and executed directly by the TDA4 TIDL Runtime.
Efficiency	About 4× smaller than the FP32 ONNX, with much faster inference and significantly lower power.

The full loop

Train → Quantize → Deploy

Stage 1 produces a high-accuracy FP32 model; Stage 2 compresses it into an edge-efficient quantized binary. Together they form a complete closed loop from training to on-vehicle inference.

Train seg_model_arch FP32 segmentation training — RegNet + FPN decoder, combined losses, mIoU.

Quantize TDA4 QAT flow Export ONNX → PTQ INT8 import & compile → TIDL binary (~4× smaller).

Deploy TDA4 on-board TIDL Runtime inference on the vehicle — efficient, low-power edge execution.

Evaluation

Quantized model on bad cases

Per-scene evaluation of the quantized model on representative hard cases — water stains, far-range targets, and complex multi-bump / night scenes.

Water-stain bad-case evaluation — **Water-stain scene.** Evaluation on a water-stain bad case — the prime source of false positives.

Far-range target evaluation — **Far-range targets.** Evaluation on distant, small targets — the prime source of missed detections.

Multi-bump and night-scene evaluation — **Complex scenes.** Evaluation on multi-speed-bump and night scenes — combined difficulty.

Across both hard and easy scenes, the quantized model reaches an initial mass-production quality bar.

Visualizations

On-road inference replays

Selected replay clips — all play automatically and loop.

Hard-failure replay. Re-injection test on complex failure scenes — resolving most false and missed detections.

High-resolution stress test. Generalization under large 1280-input signals on complex, difficult scenes.

Joint post-processing. Combined post-processing that also outputs the target's elevation / height information.

Night & garage generalization. Generalization test on night and underground-garage scenes with heavy water stains.

Confidentiality note. Only high-level model and deployment information is shown. Internal datasets, exact metrics, customer-specific calibration data, and vehicle-integration details are omitted.