
Chaokang Jiang

CV Algorithm Engineer
PhiGent Robotics
jck98@foxmail.com


About Me

I was born in 1998 and currently work as a Computer Vision (CV) Algorithm Engineer at PhiGent Robotics in Beijing. I received my Master's degree in Engineering in 2023, jointly trained by China University of Mining and Technology and Shanghai Jiao Tong University. I was fortunate to be advised by Professor Hesheng Wang and to collaborate with my senior, Dr. Guangming Wang of the University of Cambridge. At the IRMV Lab, I took part in numerous engineering projects, such as lawn-mowing robots and 4D automatic annotation, which significantly strengthened my practical engineering skills.

Research Interests

News

Publications

  1. NeurIPS 2024
    Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du
    Conference on Neural Information Processing Systems (NeurIPS) 2024

  2. arXiv
    Jiuming Liu, Jinru Han, Lihao Liu, Angelica I Aviles-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang
    arXiv

  3. CVPR 2024
    Chaokang Jiang, Guangming Wang, Jiuming Liu, Hesheng Wang, Zhuang Ma, Zhenqiang Liu, Zhujin Liang, Yi Shan, Dalong Du
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  4. CVPR 2024
    Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng Wang
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  5. ICCV 2023
    Jiuming Liu, Guangming Wang, Chaokang Jiang, Zhe Liu, Hesheng Wang
    IEEE/CVF International Conference on Computer Vision (ICCV), 2023

  6. AAAI 2023
    Jiuming Liu, Guangming Wang, Chaokang Jiang, Zhe Liu, Hesheng Wang
    AAAI Conference on Artificial Intelligence 2023

  7. arXiv
    Chaokang Jiang, Guangming Wang, Jinxing Wu, Yanzi Miao, Hesheng Wang
    arXiv

  8. TIM
    Yanzi Miao, Huiying Deng, Chaokang Jiang, Zhiheng Feng, Xinrui Wu, Guangming Wang, Hesheng Wang
    IEEE Transactions on Instrumentation and Measurement

  9. arXiv
    Guangming Wang, Zhiheng Feng, Chaokang Jiang, Hesheng Wang
    arXiv

  10. TII
    Chaokang Jiang, Guangming Wang, Yanzi Miao, Hesheng Wang
    IEEE Transactions on Industrial Informatics, June 2023

  11. AIS
    Guangming Wang, Chaokang Jiang, Zehang Shen, Yanzi Miao, Hesheng Wang* (*Corresponding author)
    Advanced Intelligent Systems

Projects

Autonomous Intelligent Lawn Mower Robot (2021-09 ~ 2022-11)

Collaboration Project between SJTU IRMV & Positec Technology

Figures: lawn mower going up and downhill; forward and backward 3D obstacle detection; 2D detection and 3D clustering fusion; 2D segmentation and 3D clustering fusion; 2D bird's-eye-view detection for safety.

Responsibilities:

  • Developed slope detection for the lawn mower using radar and a depth camera, covering point cloud clustering, segmentation, plane fitting, and detection (see the plane-fitting sketch after this list)
  • Developed 3D obstacle detection using point clouds and a depth camera on the TX2 platform for a variety of complex flat and grassy environments
  • Designed color-texture feature encoding methods and their efficient fusion with dense 3D point cloud features
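
A minimal sketch of the plane-fitting step behind slope detection, assuming the fused points arrive as an N x 3 NumPy array in a z-up frame; the function name, threshold, and iteration count are illustrative rather than the production values:

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, inlier_thresh=0.03, seed=None):
    """RANSAC plane fit: returns a unit normal n and offset d with n . p + d = 0."""
    rng = np.random.default_rng(seed)
    best_count, best_model = 0, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-8:  # degenerate (near-collinear) sample, resample
            continue
        n = n / norm
        d = -n.dot(p0)
        count = int((np.abs(points @ n + d) < inlier_thresh).sum())
        if count > best_count:
            best_count, best_model = count, (n, d)
    return best_model

# The slope angle is the angle between the fitted normal and the gravity (z) axis.
n, d = fit_ground_plane(np.random.rand(1000, 3))  # placeholder point cloud
slope_deg = np.degrees(np.arccos(abs(n[2])))
```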

The lawn mower robot project encompasses AI vision obstacle classification and recognition, multi-sensor offline and online calibration, visual-inertial odometry, global pose estimation based on multi-sensor fusion, obstacle detection techniques using LiDAR and depth cameras, and multi-sensor fusion for 3D obstacle detection.

Multimodal Fusion Technology in Autonomous Intelligent Lawn Mowing Robots (2021-09 ~ 2023-03)

Collaboration Project between SJTU IRMV & Positec Technology

Figures: FFPA-Net, efficient feature fusion with projection awareness for 3D object detection; the main network structure of the bi-modality fusion pipeline S&H-Fusion.

Responsibilities:

  • Proposed a new data pre-processing method for more efficient fusion of features from different sensor signals: indices between the signals are established in advance and stored in a map, and synchronized sampling then provides fast, accurate query correspondences for feature fusion (see the index-map sketch after this list).
  • Explored multiple methods for more reasonable and efficient cross-modal feature fusion, including soft query weights that perceive the Euclidean distance between bimodal features, and fusion modules based on dual attention that correlate the geometric and texture features of the scene.
  • Proposed a bi-modality feature fusion module with both hard and soft components, which guides the network to refine more accurate 3D positions and orientations of objects in the second stage. The method achieves advanced performance on the nuScenes dataset, and is especially strong on small objects with degraded image quality and on objects with few LiDAR returns.
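
A minimal sketch of the precomputed index-map idea from the first bullet, assuming a calibrated pinhole camera with intrinsics K and a LiDAR-to-camera extrinsic T_cam_lidar; all names and tolerances are illustrative:

```python
import numpy as np

def build_lidar_to_pixel_index(points, K, T_cam_lidar, img_hw):
    """Precompute, for each LiDAR point, the image pixel it projects onto.

    points: (N, 3) LiDAR-frame xyz; K: (3, 3) intrinsics; T_cam_lidar: (4, 4)
    extrinsic; img_hw: (height, width). Returns (N, 2) integer (u, v) indices,
    or (-1, -1) for points that do not land inside the image.
    """
    h, w = img_hw
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]            # points in the camera frame
    in_front = cam[:, 2] > 1e-3                       # keep points ahead of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-3, None)  # perspective divide
    idx = np.full((len(points), 2), -1, dtype=np.int64)
    ok = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    idx[ok] = np.floor(uv[ok]).astype(np.int64)
    return idx
```

Because the map is built once per synchronized frame pair, feature fusion later reduces to an O(1) lookup per point instead of a projection at query time.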

This was a preliminary research project exploring data-driven methods for image-LiDAR fusion. We focused on two 2D-3D fusion approaches to address the high latency and limited accuracy of current 3D obstacle detection models, with the goal of accelerating their practical application.

Integrated Network for Perception, Planning and Decision-making (2021-08 ~ 2022-10)

Independent Innovation Joint Fund Project of The Future Laboratory of The Second Aerospace Academy

Figures: comprehensive demonstration of reconnaissance and strike missions in a simulated environment; integrated neural network for perception, planning, and decision-making; real-time dense map construction with monocular visual SLAM.

Responsibilities:

  • Developed perception tasks in simulation by integrating the Webots robot simulator with deep learning models for 2D tracking and detection.
  • Deployed depth estimation and mapping within SLAM models, as well as 3D semantic segmentation models.
  • Deployed real-time dense mapping for monocular visual SLAM on ROS (a back-projection sketch follows this list).
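
A minimal sketch of the depth back-projection that dense mapping builds on, assuming a pinhole camera with intrinsics K; names are illustrative:

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project an (H, W) metric depth map into an (M, 3) point cloud.

    Uses the pinhole model x = (u - cx) * z / fx, y = (v - cy) * z / fy;
    pixels with non-positive depth are dropped.
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]
```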

The project focuses on designing an integrated reinforcement learning network encompassing perception, planning, and decision-making. The perception module includes depth estimation, semantic segmentation, odometry estimation, loop closure detection, dense mapping, and object detection and tracking.

Video Offline 4D Automatic Labelling (2022-11 ~ 2023-04)

Collaboration Project between Hozonauto and SJTU IRMV

Figures: the basic framework of the 4D automatic labelling scheme; the automatic labelling scheme for ground static elements.

Responsibilities:

  • Served as the perception team leader, responsible for reporting, summarizing, and controlling the phased progress of each sub-project, and for following up promptly to help resolve technical issues.

Modeled on Tesla's 4D automatic labeling solution, the pipeline comprises the following parts: input signal processing, standard scene perception tasks, ground element reconstruction and labeling, static scene reconstruction and 3D dynamic object labeling, overall 4D automatic labeling summarization, and bad-case simulation.

Road Preview - Disparity Estimation (2023-05 ~ 2023-11)

Collaboration Project with BYD

Figures: the road preview disparity estimation network; visualization of road preview depth estimation.

Responsibilities:

  • Optimized the NDR-Ne pseudo disparity estimation model, improving data clarity, preprocessing, and augmentation (the underlying disparity-to-depth relation is sketched after this list).
  • Enhanced the compressed network model for better efficiency, accuracy, and robustness.
  • Deployed the model on the TDA4-AL board, addressing various bad cases and improving overall model robustness.
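
For reference, the relation underlying this task is depth = f * B / d for a rectified stereo pair with focal length f (pixels), baseline B (meters), and disparity d (pixels); a minimal sketch with illustrative names:

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) into metric depth (meters).

    depth = f * B / d; pixels with (near-)zero disparity get an invalid
    depth of 0 rather than dividing by zero.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```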

This project focuses on road preview and disparity estimation tasks, including improving pseudo disparity estimation models, optimizing neural networks for real-time performance, and deploying robust solutions on embedded platforms.

Road Segmentation and Preview (2023-05 ~ 2024-03)

Collaboration Project with BYD

Figures: road segmentation for small-scale objects such as speed bumps and manhole covers; handling challenging scenarios such as shadowed and wet roads; the optimized road segmentation network; visualization of segmentation results on test data.

Responsibilities:

  • Designed and optimized the segmentation model to handle small-scale objects (e.g., speed bumps) in large-scale road environments.
  • Improved feature extraction and fusion for handling texture-similar backgrounds and objects in challenging lighting or occlusion scenarios.
  • Integrated advanced loss functions such as OhemCrossEntropy, Lovasz, and RMI Loss to improve segmentation accuracy and boundary precision (a minimal OHEM sketch follows this list).
  • Deployed the segmentation model on the TDA4-AL platform, achieving real-time performance and robustness in practical scenarios.
  • Collaborated with teams to address challenging cases (e.g., shadows, water stains, and complex lighting), significantly improving model reliability.
  • Conducted quantization-aware training (QAT) to optimize the model for deployment on embedded systems without sacrificing performance.
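
A minimal PyTorch sketch of the online hard example mining (OHEM) idea behind OhemCrossEntropy; the threshold and min_kept values are illustrative, not the project's settings:

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=10000, ignore_index=255):
    """Cross-entropy over hard pixels only.

    Pixels whose predicted probability for the true class already exceeds
    `thresh` are dropped, but the `min_kept` hardest pixels are always kept
    so the gradient never vanishes on easy images.
    """
    loss = F.cross_entropy(logits, target, ignore_index=ignore_index, reduction="none")
    loss = loss.view(-1)[target.view(-1) != ignore_index]
    hard = loss > -torch.log(torch.tensor(thresh, device=loss.device))
    if hard.sum() < min_kept:
        loss, _ = loss.topk(min(min_kept, loss.numel()))
    else:
        loss = loss[hard]
    return loss.mean()
```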

This project focuses on intelligent road segmentation and preview tasks, leveraging cutting-edge neural networks to handle challenges such as small object detection, complex backgrounds, and adverse lighting conditions. The project involved end-to-end optimization, from model design and training to deployment on embedded platforms, ensuring real-time performance and robustness. Key features include advanced loss functions, quantization-aware training, and integration with BYD’s intelligent systems for enhanced vehicle control and safety.

3D Motion Estimation - Scene Flow and Deployment (2023-08 ~ 2023-12)

Collaboration Project with PhiGent Robotics

Figures: the automated pipeline for generating 3D scene flow labels; performance evaluation of the scene flow estimation network on public datasets.

Responsibilities:

  • Built a high-quality 3D scene flow training dataset using multi-source data to improve model generalization.
  • Designed and maintained two efficient 3D scene flow estimation networks, optimizing for large-scale point cloud data and complex operators.
  • Developed an automatic annotation framework to generate consistent, high-accuracy labels, significantly reducing manual labeling efforts.
  • Conducted extensive experiments on public datasets, demonstrating superior accuracy and computational efficiency compared to baseline methods.
  • Converted models to ONNX format and optimized them for deployment on NVIDIA Orin (TensorRT) and Horizon J6E (Horizon SDK); an export sketch follows this list.
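
A minimal sketch of the ONNX export step, using a toy stand-in for the real scene flow network; the module, tensor shapes, and file names are illustrative:

```python
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    """Toy stand-in for the real scene flow network (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, pc1, pc2):
        # Predict a per-point flow vector from the two concatenated frames.
        return self.mlp(torch.cat([pc1, pc2], dim=-1))

model = TinyFlowNet().eval()
pc1, pc2 = torch.randn(1, 8192, 3), torch.randn(1, 8192, 3)
torch.onnx.export(
    model, (pc1, pc2), "scene_flow.onnx",
    input_names=["pc1", "pc2"], output_names=["flow"],
    dynamic_axes={"pc1": {1: "n_points"}, "pc2": {1: "n_points"}, "flow": {1: "n_points"}},
    opset_version=16,
)
# The resulting .onnx file is what TensorRT (NVIDIA Orin) or a vendor SDK consumes.
```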

This project focuses on building robust 3D motion estimation solutions, including creating advanced 3D scene flow estimation networks, developing an auto-labeling framework, and deploying the models on embedded hardware platforms. The work demonstrates significant improvements in accuracy, efficiency, and real-time performance over traditional methods while addressing challenges in operator compatibility and dataset availability.

4D Auto-Labeling and 3D Pure LiDAR Detection (2024-05 ~ 2024-11)

Internal R&D Project on VRU Recognition and 3D Detection

Figures: the 4D auto-labeling pipeline for enhancing VRU detection; 3D object copy-paste data augmentation; a geometric alignment loss use case; a bad-case mining pipeline built on the pure-LiDAR 3D object detection model.

Responsibilities:

  • {"Enhanced VRU detection recall rates through"=>[{"Data-level optimizations"=>"3D object augmentation using Copy-Paste techniques, incorporation of LiDAR reflectivity features, and expansion of manually labeled VRU and trailer datasets."}, {"Model-level optimizations"=>"Improved BEV (Bird's Eye View) resolution and adapted network structures for varying perception ranges."}, {"Loss function optimizations"=>"Introduced Geometric Alignment Loss to refine detection box precision."}]}
  • Conducted extensive experiments on internal datasets, showcasing significant improvements in detection accuracy and robustness.
  • {"Designed a 3D pure LiDAR detection model, focusing on"=>["Dataset adaptation and pipeline creation to ensure high-quality annotations.", "Iterative model architecture and parameter optimization to enhance performance.", "Identifying and fixing failure cases by prioritizing and resolving critical issues."]}
  • Established a continuous improvement cycle for bad-case analysis and validation of optimization strategies.
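
A minimal sketch of ground-truth copy-paste augmentation on LiDAR point clouds, as referenced above; the data layout, ranges, and names are illustrative, and a production version would also check box collisions:

```python
import numpy as np

def copy_paste_objects(scene_points, object_bank, num_paste=4, xy_range=30.0, seed=None):
    """Paste stored object point clouds into a scene at random planar poses.

    scene_points: (N, 4) x, y, z, intensity; object_bank: list of (M_i, 4)
    arrays, each centered on its own box origin. Returns the augmented scene
    and the applied (x, y, yaw) poses for writing the matching GT boxes.
    """
    rng = np.random.default_rng(seed)
    out, poses = [scene_points], []
    picks = rng.choice(len(object_bank), size=min(num_paste, len(object_bank)), replace=False)
    for i in picks:
        obj = object_bank[i].copy()
        x, y = rng.uniform(-xy_range, xy_range, size=2)
        yaw = rng.uniform(-np.pi, np.pi)
        c, s = np.cos(yaw), np.sin(yaw)
        obj[:, :2] = obj[:, :2] @ np.array([[c, -s], [s, c]]).T  # rotate, then translate
        obj[:, 0] += x
        obj[:, 1] += y
        out.append(obj)
        poses.append((x, y, yaw))
    return np.concatenate(out, axis=0), poses
```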

This project integrates a 4D auto-labeling system with an efficient 3D pure LiDAR detection model to enhance VRU (Vulnerable Road User) recognition and detection precision. It addresses challenges in data annotation, improves network robustness through data/model optimization, and deploys cutting-edge solutions for real-world applications. The work emphasizes innovation in data augmentation, loss function design, and system-level analysis to achieve superior detection accuracy and recall rates.

End-to-End Autonomous Driving (LiDAR + 11V Fusion) Project (2024-06 ~ 2024-09)

Collaboration Project with Lantu

Figures: visualization of the end-to-end autonomous driving system with LiDAR and 11V fusion; system architecture for LiDAR and multi-camera fusion.

Responsibilities:

  • Developed pipelines for data reading, processing, and format conversion, ensuring seamless integration of LiDAR and camera data.
  • Implemented LiDAR motion distortion correction using global pose transformation matrices and timestamp synchronization (a deskewing sketch follows this list).
  • Designed and optimized LiDAR-camera fusion for obstacle detection, leveraging sparse anchors and map instances to enhance multi-modal feature aggregation and improve perception accuracy.
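
A minimal sketch of the motion distortion correction from the second bullet, interpolating the ego pose per point (translation linearly, rotation by SLERP via SciPy) and re-expressing every point in the end-of-sweep frame; names are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_sweep(points, t_pts, T_w_s0, T_w_s1, t0, t1):
    """Undistort one LiDAR sweep.

    points: (N, 3) sensor-frame points; t_pts: (N,) per-point capture times in
    [t0, t1]; T_w_s0 / T_w_s1: (4, 4) world-from-sensor poses at sweep start
    and end. Each point is transformed by the pose it was actually captured
    at, then re-expressed in the end-of-sweep sensor frame.
    """
    alpha = np.clip((t_pts - t0) / (t1 - t0), 0.0, 1.0)
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix([T_w_s0[:3, :3], T_w_s1[:3, :3]]))
    R_t = slerp(alpha).as_matrix()                                  # (N, 3, 3)
    p_t = (1 - alpha)[:, None] * T_w_s0[:3, 3] + alpha[:, None] * T_w_s1[:3, 3]
    pts_w = np.einsum("nij,nj->ni", R_t, points) + p_t              # world frame
    T_s1_w = np.linalg.inv(T_w_s1)
    return pts_w @ T_s1_w[:3, :3].T + T_s1_w[:3, 3]
```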

This project aims to achieve an end-to-end autonomous driving system by fusing LiDAR data with 11 multi-view cameras. The system outputs high-precision perception, prediction, planning, and control information, enabling enhanced autonomous driving performance in complex environments.

Controlled Data Generation for Traffic Scenarios (2023-05 ~ 2023-11)

Collaboration Project with BYD

Figures: the controlled data generation pipeline with ControlNet and Stable Diffusion; rainy scene generation using our autoregressive video generation framework.

Responsibilities:

  • Developed a single-frame generation module supporting multi-view (4 fisheye, 7V, 11V) data creation and editing with integrated 3D bounding boxes and map labels, optimizing algorithms for better efficiency and completeness.
  • Designed a video generation module using autoregressive modeling to produce high-quality continuous frames with temporal consistency, integrating DDIM for faster inference and improved visual quality.
  • Supported conditional video generation based on text prompts or offline BEV traffic flow, ensuring realistic and diverse driving scenarios.
  • Leveraged ControlNet to align temporal conditions and latent features, enabling controlled multi-view surround video generation in a single pass (a conditioning sketch follows this list).
  • Achieved automated generation for multi-view data, supporting 4-fisheye, 7V, and 11V formats, with enhanced temporal coherence and cross-view consistency.
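
A minimal sketch of conditioning Stable Diffusion with ControlNet through Hugging Face diffusers, with DDIM swapped in as the scheduler; the public checkpoints and the control image below stand in for the project's internal models and conditions:

```python
import torch
from diffusers import ControlNetModel, DDIMScheduler, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Public example checkpoints, used here only as stand-ins.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
# DDIM trades the default scheduler for fewer, deterministic denoising steps.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

cond = load_image("layout_edges.png")  # hypothetical control image, e.g. BEV-derived edges
frame = pipe(
    "a rainy urban driving scene, wet asphalt, dusk lighting",
    image=cond,
    num_inference_steps=30,
).images[0]
frame.save("generated_frame.png")
```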

This project focuses on generating diverse and temporally consistent traffic scenario videos using advanced generative models such as LLM-enhanced world models and latent diffusion models. The system supports multi-view video generation under various conditions (e.g., weather, time of day) and ensures adherence to traffic rules, providing valuable data for autonomous driving and traffic planning.

Services

Conference Reviewers

Journal Reviewers

