Production Project

Road Preview: Surface-Element Segmentation

Robust segmentation of small road elements (manhole covers and speed bumps) for the road-preview ('magic-carpet') suspension feature, hardened against difficult real-world cases and quantized to INT8 for TDA4 edge deployment.

Timeline: 2023.05–2023.11
Context: BYD
Role: Perception Model Optimization Engineer
Stage: Deployment-oriented · near-SOP

Overview

What is this project about?

A road-surface perception project for the road-preview ('magic-carpet') suspension feature: segment safety-critical small road elements — manhole covers and speed bumps — reliably under hard real-world conditions (tiny targets, water and oil stains, textureless surfaces), then compress and quantize the model to INT8 for efficient TDA4 edge inference, reaching an initial mass-production quality bar.

production perception deployment

The Challenge

Why small road-element segmentation is hard

The road-preview feature looks ahead and segments small surface elements so the suspension can pre-adjust. The targets are tiny and the visual conditions are adversarial — three failure modes dominate.

Tiny road element in a full driving scene
Full-scene context — the target is a small fraction of the frame.
Close-up of a manhole cover
Manhole cover — often textureless or road-colored.
Water-stain patch on the road
Water / oil stain — texture mimics a real cover.
Low-contrast road element
Low contrast — element blends into the asphalt.
# | Challenge | Why it breaks naive models
01 | Extremely small targets | Manhole covers and speed bumps can occupy less than 1% of the image pixels, so the signal is easily lost to pooling and down-sampling.
02 | Stain / texture confusion | Water and oil stains on the road produce textures highly similar to a real manhole cover, causing false positives.
03 | Textureless / color-matched | Many covers are textureless or nearly the same color as the asphalt, causing missed detections.
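A quick back-of-the-envelope check makes the first failure mode concrete. The sketch below computes a target's pixel fraction and its size in cells on feature maps at strides 4 to 32 (the C2–C5 levels). The 1920×1080 frame and the 40×40-pixel cover are illustrative assumptions, not project data.

```python
def pixel_fraction(obj_w: float, obj_h: float, img_w: int, img_h: int) -> float:
    """Fraction of image pixels covered by an axis-aligned object box."""
    return (obj_w * obj_h) / (img_w * img_h)


def footprint_per_stride(obj_w: float, obj_h: float, strides=(4, 8, 16, 32)) -> dict:
    """Approximate object size, in feature-map cells, at each backbone stride."""
    return {s: (obj_w / s, obj_h / s) for s in strides}


if __name__ == "__main__":
    # Illustrative numbers only: a 40x40-pixel cover in a 1920x1080 frame.
    print(f"pixel fraction: {pixel_fraction(40, 40, 1920, 1080):.4%}")
    for stride, (w, h) in footprint_per_stride(40, 40).items():
        print(f"stride {stride:>2}: {w:.2f} x {h:.2f} cells")
```

With these numbers the cover is about 0.08% of the frame and spans only 1.25 × 1.25 cells at stride 32. That is one reason why fusing higher-resolution features, as an FPN-style decoder does, matters for this task.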

Stage 1 · Train (FP32)

Segmentation training pipeline

A classic Encoder–Decoder design, tuned end-to-end for TDA4 edge deployment. The flow runs data → features → decoding → loss → optimization → evaluation, looped each epoch.

RegNet backbone · FPN EdgeAI-Lite decoder · OHEM · Lovász · Tversky · Focal · AdamW · mIoU
Segmentation training pipeline architecture
Training pipeline. Data loading → augmentation → RegNet backbone → FPN EdgeAI-Lite decoder → combined losses; AdamW optimizes and mIoU evaluates. The blue arcs are the per-epoch training loop. This stage produces a high-accuracy floating-point model.
Left

Data → feature extraction

Module | Role
Data Loading | Reads images and their pixel-level annotation masks; supports multiple segmentation dataset formats.
Data Augmentation | The train pipeline applies strong augmentation (random crop, flip, color jitter); the test pipeline only normalizes. The two pipelines are kept strictly separate so that random training-time augmentation never leaks into evaluation.
Backbone · RegNet | A "regular" network family obtained by design-space exploration: block widths follow a simple quantized linear rule. This gives high accuracy per unit of compute with a low MAC count and low memory use, a good fit for edge deployment. Emits multi-scale feature maps (C2–C5, ResNet-style).
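The "regular" in RegNet describes how widths are generated. The continuous width of block j is u_j = w_0 + w_a·j. It is then snapped to w_0·w_m^k for an integer k and rounded to a multiple of 8 channels. Blocks that share a width form a stage.

Below is a minimal pure-Python sketch of that rule. The parameters are the published RegNetX-200MF settings (depth 13, w_0 = 24, w_a = 36.44, w_m = 2.49) from the pycls / torchvision model zoo, used here only for illustration. The RegNet variant used in this project is not stated.

```python
import math
from itertools import groupby


def regnet_stages(w_0, w_a, w_m, depth, q=8):
    """Stage widths and depths from RegNet's quantized linear width rule."""
    widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j                               # linear (continuous) width
        k_j = round(math.log(u_j / w_0) / math.log(w_m))  # quantize the exponent
        w_j = w_0 * w_m ** k_j                            # piecewise-constant width
        widths.append(int(round(w_j / q)) * q)            # multiple of q channels
    # Consecutive blocks with the same width form one stage.
    stages = [(w, len(list(g))) for w, g in groupby(widths)]
    return [w for w, _ in stages], [d for _, d in stages]


if __name__ == "__main__":
    ws, ds = regnet_stages(w_0=24, w_a=36.44, w_m=2.49, depth=13)
    print("stage widths:", ws)  # [24, 56, 152, 368]
    print("stage depths:", ds)  # [1, 1, 4, 7]
```

Four parameters describe the whole backbone. This is what makes the family easy to scale down to an edge compute budget.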
Center

Decoding → losses

Module | Role
Decoder · FPNEdgeAILiteDecoder | A lightweight FPN-based decoder built for EdgeAI (TI EdgeAI Toolbox): fuses multi-scale features, upsamples back to full resolution, and outputs a per-pixel class-probability map.
Loss · OHEM | Online Hard Example Mining: back-propagates only the highest-loss (hard) pixels, so abundant easy background pixels do not dominate the gradient. This mitigates class imbalance.
Loss · Lovász | A tractable surrogate (the Lovász extension) of the Jaccard/IoU loss, so training directly targets the mIoU metric. It is typically better aligned with mIoU than cross-entropy alone.
Loss · Tversky | A generalization of Dice loss: α and β weight false positives and false negatives respectively (α = β = 0.5 recovers Dice). This makes it well suited to small, rare targets.
Loss · Focal | Down-weights easy pixels and focuses training on hard ones; effective under extreme class imbalance.
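The Lovász term is the least obvious of the four. Below is a pure-Python sketch of the flat Lovász-Softmax loss in the formulation of Berman et al.: per-pixel errors are sorted in decreasing order and weighted by the gradient of the Lovász extension of the Jaccard loss. It runs on plain lists for illustration and is not the project's training code, which would operate on tensors.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss.

    gt_sorted: 0/1 foreground indicators ordered by decreasing prediction error.
    """
    gts = sum(gt_sorted)
    grad, cum_fg, cum_bg, prev = [], 0.0, 0.0, 0.0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1.0 - g
        intersection = gts - cum_fg
        union = gts + cum_bg  # > 0: either gts > 0 or every pixel so far is background
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev)
        prev = jaccard
    return grad


def lovasz_softmax_flat(probs, labels, num_classes):
    """Lovász-Softmax over flattened pixels, averaged over classes present in labels.

    probs: per-pixel lists of class probabilities; labels: integer class ids.
    """
    losses = []
    for c in range(num_classes):
        fg = [1.0 if y == c else 0.0 for y in labels]
        if sum(fg) == 0:
            continue  # skip classes absent from this sample
        errors = [abs(f - p[c]) for f, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    hard = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one miss, one false alarm
    print(f"Lovász-Softmax: {lovasz_softmax_flat(hard, labels, 2):.4f}")
```

With hard 0/1 predictions the per-class value equals 1 - IoU, which is the property that ties this loss to the mIoU metric.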

The four losses are combined as a weighted sum, so their strengths complement one another across the failure modes above.
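A minimal sketch of such a weighted combination on a flattened binary mask (1 = road element). Several choices here are hypothetical placeholders, not the project's settings:

- OHEM uses a simple top-k variant (keep the hardest 25% of pixels).
- The Tversky weights are α = 0.3, β = 0.7.
- The focal loss uses γ = 2 with a separate class-balance α = 0.25.
- The loss weights are illustrative.

The Lovász term is left out to keep the example short.

```python
import math

EPS = 1e-7  # numerical guard for log() and division


def ohem_bce(probs, labels, keep_ratio=0.25, min_kept=1):
    """Top-k OHEM: binary cross-entropy averaged over only the highest-loss pixels."""
    losses = [-math.log(max(p if y == 1 else 1.0 - p, EPS)) for p, y in zip(probs, labels)]
    k = max(min_kept, int(len(losses) * keep_ratio))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / len(hardest)


def tversky_loss(probs, labels, alpha=0.3, beta=0.7):
    """1 - TP / (TP + alpha*FP + beta*FN) on soft predictions; beta > alpha penalizes misses more."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return 1.0 - (tp + EPS) / (tp + alpha * fp + beta * fn + EPS)


def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean of -alpha_t * (1 - p_t)**gamma * log(p_t); confident (easy) pixels get small weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p
        a_t = alpha if y == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))
    return total / len(probs)


def combined_loss(probs, labels, w_ohem=1.0, w_tversky=0.5, w_focal=1.0):
    """Weighted sum of complementary terms; the weights are hypothetical placeholders."""
    return (w_ohem * ohem_bce(probs, labels)
            + w_tversky * tversky_loss(probs, labels)
            + w_focal * focal_loss(probs, labels))


if __name__ == "__main__":
    # Toy flattened mask: 1 = road element (rare), 0 = road surface.
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    probs = [0.05, 0.1, 0.2, 0.05, 0.6, 0.1, 0.7, 0.3]
    print(f"combined loss: {combined_loss(probs, labels):.4f}")
```

Setting β > α makes a missed cover cost more than a false alarm of the same size. The OHEM and focal terms both keep the many easy background pixels from dominating.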

Right

Optimization → evaluation

Module | Role
Optimizer · AdamW | Adam with decoupled weight decay: fast convergence and good generalization; robust for both CNN- and Transformer-style architectures.
Evaluation · mIoU | Mean Intersection-over-Union, the standard segmentation metric. Every class counts equally in the average, so rare classes weigh as much as frequent ones, giving a fair view of long-tail performance.
Training loop (blue arcs) | The left arc re-enters Data Loading each epoch to stream new batches; the right arc feeds evaluation results back to drive hyper-parameter tuning (LR scheduler) and early stopping.
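The evaluation side of the loop can be sketched in three steps:

1. Accumulate a confusion matrix.
2. Compute per-class IoU = TP / (TP + FP + FN) and average it.
3. Feed the result to a patience-based early-stopping rule.

The `ignore_index=255` convention, the patience value, and the toy numbers are illustrative assumptions.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=255):
    """Rows = ground-truth class, columns = predicted class; ignored pixels are skipped."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, y in zip(preds, labels):
        if y != ignore_index:
            cm[y][p] += 1
    return cm


def mean_iou(cm):
    """Mean of IoU_c = TP / (TP + FP + FN) over classes seen in labels or predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


class EarlyStopping:
    """Signal a stop when validation mIoU has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("-inf"), 0

    def step(self, miou):
        if miou > self.best + self.min_delta:
            self.best, self.bad_epochs = miou, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means: stop training


if __name__ == "__main__":
    cm = confusion_matrix(preds=[0, 0, 1, 1], labels=[0, 1, 1, 1], num_classes=2)
    print("mIoU:", round(mean_iou(cm), 4))
    stopper = EarlyStopping(patience=2)
    for epoch, miou in enumerate([0.50, 0.55, 0.54, 0.53]):
        if stopper.step(miou):
            print(f"early stop at epoch {epoch}, best mIoU {stopper.best:.2f}")
            break
```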

Stage 2 · Quantize & Compile (INT8)

From FP32 ONNX to a TDA4 TIDL binary

The trained floating-point model is compiled and quantized into a TDA4-runnable TIDL binary using TI's TIDL Model Import toolchain — three inputs feed a containerized import step.

TDA4 TIDL model import and quantization flow
Quantization & compile flow. Float ONNX + import config + calibration set → TIDL model-import Docker container → quantized TIDL binary. This stage compresses the model to an edge-efficient form.
Inputs

Three inputs to the import step

Input | Role
float onnx | The exported FP32 ONNX model: full network structure and FP32 weights.
model import cfg | TIDL compile config: quantization bit-width (INT8/INT16), input size, target core (MMA / C7x DSP), and operator-mapping strategy.
calibration set | A small set of real images (~100–500) for Post-Training Quantization (PTQ) calibration: collects each layer's activation range (min/max or histogram) to set the quantization scale and zero-point.
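What calibration computes can be sketched with a generic min/max observer and an affine (scale + zero-point) INT8 scheme. This is the textbook formulation under stated assumptions. TIDL's own calibration options, such as histogram-based range selection, and its internal scale format are not reproduced here. The activation values are made up.

```python
def calibrate_minmax(batches):
    """Track one layer's activation range over the calibration set (min/max observer)."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:  # each batch: flat list of activation values
        lo, hi = min(lo, min(batch)), max(hi, max(batch))
    return min(lo, 0.0), max(hi, 0.0)  # keep 0.0 in range so it is exactly representable


def affine_qparams(lo, hi, qmin=-128, qmax=127):
    """Scale and zero-point that map the float range [lo, hi] onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float -> INT8 code, saturating at the ends of the range."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q, scale, zero_point):
    """INT8 code -> float, e.g. to measure quantization error against the FP32 model."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    lo, hi = calibrate_minmax([[0.0, 1.0, 2.5], [-0.5, 3.0]])
    scale, zp = affine_qparams(lo, hi)
    print(f"range=[{lo}, {hi}] scale={scale:.6f} zero_point={zp}")
    for x in (-0.5, 0.0, 1.0, 2.5, 3.0):
        q = quantize(x, scale, zp)
        print(f"{x:5.2f} -> {q:4d} -> {dequantize(q, scale, zp):.5f}")
```

Within the calibrated range the round-trip error is at most half a quantization step. Values outside it saturate. This is why the calibration images should cover the real value range, including hard scenes such as night and water stains.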
Container

Inside the model-import Docker container

Why Docker? The TIDL toolchain depends on a specific TI SDK environment; the container guarantees consistency and avoids dependency conflicts. The dashed arrow means the Docker image is the container's source — provided by TI and pulled on demand.

Step | What happens
1 · Operator fusion | Merges Conv + BN + ReLU into a single operation to cut memory traffic.
2 · Quant calibration | Uses the calibration set to estimate activation distributions and derive INT8 quantization parameters.
3 · Hardware-aware compile | Maps operators onto the TDA4 MMA (Matrix Multiply Accelerator) or C7x DSP; unsupported ops fall back to the ARM cores.
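Conv + BN fusion is exact at inference time. With frozen statistics, BatchNorm is a per-channel affine map, so it can be absorbed into the convolution's weights and bias. ReLU then runs on the fused output. The sketch shows the folding arithmetic on a 1×1 convolution at a single pixel; the shapes and values are made up. The TIDL importer performs this step internally, and its implementation may differ.

```python
import math


def fold_bn_into_conv(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BatchNorm into the preceding convolution.

    weights: one flattened kernel per output channel; other args are per-channel lists.
    Returns (w', b') with conv'(x) == bn(conv(x)).
    """
    new_w, new_b = [], []
    for k, (w_c, b_c) in enumerate(zip(weights, bias)):
        s = gamma[k] / math.sqrt(var[k] + eps)  # per-channel BN scale
        new_w.append([w * s for w in w_c])
        new_b.append((b_c - mean[k]) * s + beta[k])
    return new_w, new_b


def conv1x1(weights, bias, x):
    """1x1 convolution at one pixel: x is the input-channel vector."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b for w_c, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm with frozen running statistics."""
    return [g * (v - m) / math.sqrt(s2 + eps) + bt
            for v, g, bt, m, s2 in zip(y, gamma, beta, mean, var)]


def relu(y):
    return [max(0.0, v) for v in y]


if __name__ == "__main__":
    W, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
    bn_params = dict(gamma=[1.5, 0.8], beta=[0.0, 0.3], mean=[0.2, -0.1], var=[0.9, 0.04])
    x = [1.0, 2.0]
    unfused = relu(batch_norm(conv1x1(W, b, x), **bn_params))
    Wf, bf = fold_bn_into_conv(W, b, **bn_params)
    print("unfused:", unfused)
    print("fused:  ", relu(conv1x1(Wf, bf, x)))
```

After folding, the BN parameters and its separate pass over the feature map disappear. The saving in memory traffic comes from removing that pass.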
Output

TIDL Model Bin

Property | Detail
Contents | INT8 quantized weights, network topology, and hardware scheduling info.
Runtime | Loaded and executed directly by the TDA4 TIDL Runtime.
Efficiency | About 4× smaller than the FP32 ONNX (8-bit instead of 32-bit weights), with much faster inference and significantly lower power.
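The ~4× figure follows from storage width alone: each weight drops from 32 to 8 bits. The worked example below uses a hypothetical N = 7 million weights, not this model's actual parameter count.

```latex
\frac{\text{size}_{\text{FP32}}}{\text{size}_{\text{INT8}}}
  = \frac{N \times 32\,\text{bit}}{N \times 8\,\text{bit}} = 4,
\qquad
N = 7 \times 10^{6}:\quad
7 \times 10^{6} \times 4\,\text{B} = 28\,\text{MB}
\;\rightarrow\;
7 \times 10^{6} \times 1\,\text{B} = 7\,\text{MB}.
```

The real ratio is only approximate for three reasons:

- The binary also stores the topology and scheduling information listed above.
- It carries per-layer quantization parameters.
- Any layers compiled at INT16 take 2 bytes per weight.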

The full loop

Train → Quantize → Deploy

Stage 1 produces a high-accuracy FP32 model; Stage 2 compresses it into an edge-efficient quantized binary. Together they form a complete closed loop from training to on-vehicle inference.

Train | seg_model_arch | FP32 segmentation training: RegNet + FPN decoder, combined losses, mIoU.
Quantize | TDA4 QAT flow | Export ONNX → PTQ INT8 import & compile → TIDL binary (~4× smaller).
Deploy | TDA4 on-board | TIDL Runtime inference on the vehicle: efficient, low-power edge execution.

Evaluation

Quantized model on bad cases

Per-scene evaluation of the quantized model on representative hard cases — water stains, far-range targets, and complex multi-bump / night scenes.

Water-stain bad-case evaluation
Water-stain scene. Evaluation on a water-stain bad case — the prime source of false positives.
Far-range target evaluation
Far-range targets. Evaluation on distant, small targets — the prime source of missed detections.
Multi-bump and night-scene evaluation
Complex scenes. Evaluation on multi-speed-bump and night scenes — combined difficulty.
Across both hard and easy scenes, the quantized model reaches an initial mass-production quality bar.

Visualizations

On-road inference replays

Selected on-road replay clips.

Hard-failure replay. Re-injection test on complex failure scenes — resolving most false and missed detections.
High-resolution stress test. Generalization at a 1280 input resolution on complex, difficult scenes.
Joint post-processing. Combined post-processing that also outputs the target's elevation / height information.
Night & garage generalization. Generalization test on night and underground-garage scenes with heavy water stains.
Confidentiality note. Only high-level model and deployment information is shown. Internal datasets, exact metrics, customer-specific calibration data, and vehicle-integration details are omitted.