3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling


Figure 1. Automatic pseudo 3D scene flow labelling and model learning. The input comprises 3D anchor boxes, a pair of point clouds, and their corresponding coarse normal vectors. The motion-parameter optimization updates the bounding-box parameters, the global motion parameters, the local motion parameters, and each box's motion probability; the parameters needed for the box updates are refined by optimizing six objective functions. Once optimized, the motion parameters simulate K types of motion via a global-local data augmentation module. A single source-frame point cloud, combined with the K generated sets of motion parameters, produces multiple 3D scene flow label candidates, which supervise the neural network to learn point-wise motion.

Abstract


Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, scarcity of real-world 3D labels, and poor performance on real sparse LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR point clouds. Specifically, we employ the assumption of rigid body motion to simulate potential object-level rigid movements in autonomous driving scenarios. By updating different motion attributes for multiple anchor boxes, we obtain a rigid motion decomposition for the whole scene. Furthermore, we develop a novel 3D scene flow data augmentation method for global and local motion. By synthesizing target point clouds exactly from the augmented motion parameters, we easily obtain abundant 3D scene flow labels that are highly consistent with real scenarios. On multiple real-world datasets including LiDAR KITTI, nuScenes, and Argoverse, our method outperforms all previous supervised and unsupervised methods without requiring manual labelling. Impressively, our method achieves a tenfold reduction in the EPE3D metric on the LiDAR KITTI dataset, reducing it from $0.190m$ to a mere $0.008m$ error.
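The rigid-motion decomposition above can be sketched as follows: background points follow a single global (ego-motion) transform, while points inside each moving anchor box additionally follow that box's local rigid motion. This is a minimal illustrative sketch, not the paper's optimization; the function names and the `(mask, R, t)` box representation are our own assumptions.

```python
import numpy as np

def rigid_flow(points, R, t):
    """Scene flow induced by a rigid transform: f = (R p + t) - p."""
    return points @ R.T + t - points

def scene_flow_labels(points, boxes, ego_R, ego_t):
    """Pseudo scene-flow labels from per-box rigid motions (sketch only).

    points       : (N, 3) source-frame point cloud
    boxes        : list of (mask, R, t) — boolean mask of points inside an
                   anchor box, plus that box's (already optimized) local motion
    ego_R, ego_t : global rigid transform (ego-motion)
    """
    # Background points follow the global motion only.
    flow = rigid_flow(points, ego_R, ego_t)
    # Points inside a moving box compose the local box motion with the
    # global motion before the flow is read off.
    for mask, R, t in boxes:
        p = points[mask]
        moved = (p @ R.T + t) @ ego_R.T + ego_t
        flow[mask] = moved - p
    return flow
```

With an identity ego-motion and one box translated by `t`, the box points receive flow `t` and the background flow is zero, which matches the per-object rigidity assumption.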

The proposed 3D scene flow pseudo-auto-labelling framework. Given point clouds and initial bounding boxes, both global and local motion parameters are iteratively optimized. Randomly perturbing these optimized motion parameters augments diverse motion patterns, creating a varied and realistic set of motion labels for training 3D scene flow estimation models.
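The augmentation step can be sketched as below: from one optimized motion set, K perturbed sets are sampled, and each synthesizes a target cloud whose flow labels are exact by construction. This is a hedged sketch under our own simplifications (yaw-only local rotation, Gaussian jitter); the noise scales `sigma_t` and `sigma_yaw` are hypothetical, not the paper's values.

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the vertical (z) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

def augment_motions(points, ego_t, box_motions, K=8,
                    sigma_t=0.1, sigma_yaw=0.05, rng=None):
    """Global-local motion augmentation (illustrative sketch).

    points      : (N, 3) single source-frame point cloud
    ego_t       : optimized global translation
    box_motions : list of (mask, t) — points inside a box and its
                  optimized local translation
    Returns K (target_cloud, flow_label) pairs.
    """
    rng = np.random.default_rng(rng)
    samples = []
    for _ in range(K):
        # Jitter the global translation.
        g_t = ego_t + rng.normal(0., sigma_t, 3)
        flow = np.tile(g_t, (len(points), 1))
        for mask, t in box_motions:
            # Jitter each box's local yaw and translation.
            R = yaw_matrix(rng.normal(0., sigma_yaw))
            l_t = t + rng.normal(0., sigma_t, 3)
            p = points[mask]
            flow[mask] = (p @ R.T + l_t + g_t) - p
        # The target frame is synthesized, so the labels are exact.
        target = points + flow
        samples.append((target, flow))
    return samples
```

Because the target frame is generated from the motion parameters rather than observed, every augmented sample comes with a perfectly consistent scene flow label, which is the key property the framework exploits.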

Qualitative results


Visualization
Registration visualization results of our method (GMSF+3DSFLabelling) and baselines on the LiDAR KITTI and Argoverse datasets. The estimated target point cloud $PC_{sw}$ is derived by warping the source point cloud $PC_{S}$ toward the target point cloud via the 3D scene flow. The larger the overlap between $PC_{sw}$ (blue) and the target point cloud $PC_T$ (green), the higher the accuracy of the predicted scene flow. Local areas are zoomed in for better visibility. Our pseudo-labelling notably improves 3D scene flow estimation.
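The warping and the EPE3D metric used throughout these visualizations reduce to two one-liners; a minimal sketch (function names are ours):

```python
import numpy as np

def warp(pc_s, flow):
    """Warped cloud PC_sw = PC_S + predicted flow."""
    return pc_s + flow

def epe3d(pred_flow, gt_flow):
    """End-point error: mean Euclidean distance between flow vectors."""
    return np.linalg.norm(pred_flow - gt_flow, axis=1).mean()
```

In the figures, a large overlap between `warp(pc_s, pred_flow)` (blue) and the observed target cloud (green) corresponds to a low EPE3D.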

LiDAR KITTI - scene flow (Registration visualization)

Green points represent the target frame point cloud.

Pink points: Source frame point cloud + scene flow → target frame.

FLOT+3DSFLabelling

FLOT

LiDAR KITTI - EPE3D (3D scene flow EPE3D error visualization)

Error Map

FLOT+3DSFLabelling

FLOT

WaymoOpen - scene flow (Registration visualization)


GMSF+3DSFLabelling

GMSF (trained on Waymo with scene flow GT)

WaymoOpen - EPE3D (3D scene flow EPE3D error visualization)

Error Map

GMSF+3DSFLabelling

GMSF (trained on Waymo with scene flow GT)

Argoverse - scene flow (Registration visualization)


MSBRN+3DSFLabelling

MSBRN

nuScenes - scene flow (Registration visualization)


MSBRN+3DSFLabelling

MSBRN

Visual Comparison of the Predicted Target Frames (FLOT and FLOT+3DSFLabelling)

Citation


@InProceedings{Jiang_2024_CVPR,
    author    = {Jiang, Chaokang and Wang, Guangming and Liu, Jiuming and Wang, Hesheng and Ma, Zhuang and Liu, Zhenqiang and Liang, Zhujin and Shan, Yi and Du, Dalong},
    title     = {3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {15173-15183}
}