PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Several computer vision tasks require perceiving or interacting with 3D environments and the objects in them, making a strong case for 3D deep learning. However, unlike images, which are almost universally represented as regular grids of pixels, 3D data comes in multiple representations: meshes, point clouds, volumetric grids, boundary representations, RGB-D images, and so on. Of these, point clouds are arguably the closest to raw sensor data, and their simplicity makes them a canonical 3D representation that is easy to convert to and from the other forms. Most previous work applying deep learning to 3D data has relied on multi-view CNNs (projecting 3D shapes into 2D images), volumetric CNNs (applying 3D convolutions to voxelized shapes), spectral CNNs (operating on meshes), or fully connected networks (operating on feature vectors extracted from the 3D data). These approaches suffer from several shortcomings: data sparsity and high computational cost for volumetric methods, difficulty extending beyond shape classification to other tasks, the restriction of spectral methods to isometric (manifold) shapes, and the limited expressiveness of hand-extracted features. To address these concerns, the authors propose PointNet, a deep neural network architecture that consumes point clouds directly and supports a range of 3D tasks, including shape classification, part segmentation, and scene understanding, while remaining robust to perturbation and corruption of the input points. ...
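The key to consuming point clouds directly is that a point cloud is an unordered set, so the network's output must be invariant to the ordering of its input points; PointNet achieves this by applying the same per-point MLP to every point and aggregating the results with a symmetric function, max pooling. Below is a minimal PyTorch sketch of that idea. It omits the paper's input and feature alignment sub-networks (the T-Nets), and the class name PointNetClassifier and exact layer widths are our own simplification rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    """Minimal PointNet-style classifier (T-Net alignment networks omitted)."""

    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Shared per-point MLP: a Conv1d with kernel size 1 applies the same
        # weights to every point independently.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head applied to the pooled global feature vector.
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, 3, num_points) -- raw xyz coordinates, no voxelization
        # or projection required.
        per_point = self.point_mlp(points)           # (batch, 1024, num_points)
        # Max pooling over the point dimension is symmetric, so the global
        # feature is invariant to any permutation of the input points.
        global_feat = per_point.max(dim=2).values    # (batch, 1024)
        return self.head(global_feat)                # (batch, num_classes) logits

# Usage: classify a batch of 8 clouds of 1024 points each.
model = PointNetClassifier(num_classes=40)
logits = model(torch.randn(8, 3, 1024))
print(logits.shape)  # torch.Size([8, 40])
```

Because the global feature is a per-dimension maximum over point features, dropping or jittering a fraction of the points often leaves the pooled feature largely unchanged, which is one intuition behind the robustness to input corruption claimed above.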