Figure 1. Automatic pseudo 3D scene flow labeling and model learning. The inputs are 3D anchor boxes, a pair of point clouds, and their corresponding coarse normal vectors. The optimization updates the bounding box parameters, the global motion parameters, the local motion parameters, and the motion probability of each box. The parameters needed for box updates are optimized backward through six objective functions. Once optimized, the motion parameters are passed to a global-local data augmentation module, which simulates K types of motion. A single source-frame point cloud, together with the K generated sets of motion parameters, yields multiple 3D scene flow label candidates. These candidates supervise a neural network that learns point-wise motion.
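The label-generation step in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the axis-aligned box test, the yaw-only rotations, the sampling ranges, and the order in which local and global motion are composed are all assumptions made for illustration. Plain Python lists stand in for point clouds so the block stays self-contained.

```python
import math
import random


def rigid_transform(point, yaw, translation, center=(0.0, 0.0, 0.0)):
    """Rotate `point` by `yaw` radians about the vertical axis through
    `center`, then add `translation`."""
    x, y, z = (point[i] - center[i] for i in range(3))
    c, s = math.cos(yaw), math.sin(yaw)
    return (
        c * x - s * y + center[0] + translation[0],
        s * x + c * y + center[1] + translation[1],
        z + center[2] + translation[2],
    )


def in_box(point, box_min, box_max):
    """Axis-aligned containment test (a simplification of oriented 3D boxes)."""
    return all(box_min[i] <= point[i] <= box_max[i] for i in range(3))


def make_label_candidates(source, box_min, box_max, k, seed=0):
    """Return k (target_points, flow_labels) pairs for one source frame.

    Each candidate draws one hypothetical global motion (whole scene) and one
    hypothetical local motion (points inside the box). The sampling ranges
    below are illustrative placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    box_center = tuple((box_min[i] + box_max[i]) / 2 for i in range(3))
    candidates = []
    for _ in range(k):
        global_yaw = rng.uniform(-0.05, 0.05)
        global_t = (rng.uniform(-1.0, 1.0), rng.uniform(-0.2, 0.2), 0.0)
        local_yaw = rng.uniform(-0.1, 0.1)
        local_t = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0)

        target, flow = [], []
        for p in source:
            q = p
            if in_box(p, box_min, box_max):
                # Object-level rigid motion, applied about the box center.
                q = rigid_transform(q, local_yaw, local_t, center=box_center)
            # Scene-level rigid motion, applied to every point.
            q = rigid_transform(q, global_yaw, global_t)
            target.append(q)
            # The flow label is the exact displacement source -> target.
            flow.append(tuple(q[i] - p[i] for i in range(3)))
        candidates.append((target, flow))
    return candidates
```

Because each target point is synthesized directly from a source point, the flow label is exact by construction, which is what allows a single source frame to yield K label candidates.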
Abstract
Learning 3D scene flow from LiDAR point clouds presents significant difficulties, including poor generalization from synthetic datasets to real scenes, scarcity of real-world 3D labels, and poor performance on real sparse LiDAR point clouds. We present a novel approach from the perspective of auto-labelling, aiming to generate a large number of 3D scene flow pseudo labels for real-world LiDAR point clouds. Specifically, we employ the assumption of rigid body motion to simulate potential object-level rigid movements in autonomous driving scenarios. By updating different motion attributes for multiple anchor boxes, we obtain a rigid motion decomposition of the whole scene. Furthermore, we develop a novel 3D scene flow data augmentation method for global and local motion. Because the target point clouds are synthesized exactly from the augmented motion parameters, we readily obtain a large number of 3D scene flow labels for point clouds that are highly consistent with real scenarios. On multiple real-world datasets, including LiDAR KITTI, nuScenes, and Argoverse, our method outperforms all previous supervised and unsupervised methods without requiring manual labelling. Impressively, our method achieves a more than tenfold reduction in the EPE3D metric on the LiDAR KITTI dataset, reducing the error from $0.190\,\mathrm{m}$ to a mere $0.008\,\mathrm{m}$.
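For readers unfamiliar with the metric, EPE3D (3D end-point error) is the mean Euclidean distance between predicted and reference flow vectors, reported in meters. The sketch below is a minimal illustration of that standard definition; the function name `epe3d` and the toy vectors are assumptions for the example, not taken from the authors' evaluation code.

```python
import math


def epe3d(pred_flow, ref_flow):
    """Mean Euclidean distance between predicted and reference 3D flow vectors."""
    if not pred_flow or len(pred_flow) != len(ref_flow):
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(math.dist(p, r) for p, r in zip(pred_flow, ref_flow))
    return total / len(pred_flow)


# Toy example: the first vector matches, the second is off by 0.5 m in z.
reference = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
predicted = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
print(epe3d(predicted, reference))  # 0.25
```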
Qualitative results
lidarKITTI - scene flow (Registration visualization)
● Green points represent the target frame point cloud.
● Pink points represent the source frame point cloud warped by the scene flow toward the target frame.