Building Rome in a Day

With the advent of digital photography and the popularity of cloud-based image-sharing websites, there has been an enormous growth in the number of publicly accessible photographs of popular cities (and their landmarks) across the world. As a result, the ability to leverage these photos in a meaningful manner is of great interest to the computer vision community. One key research area that stands to benefit immensely is city-scale 3D reconstruction. Existing systems for this task have traditionally relied on images and data acquired in a structured manner, which keeps the computation simple. In contrast, images uploaded to the internet come with no such constraints, necessitating algorithms that can work on “extremely diverse, large, and unconstrained image collections”. Building upon previous research and incorporating ideas from other areas of computer science, this paper proposes a system that reconstructs large-scale 3D geometry from large, unorganized image collections publicly available on the internet, with the ability to process more than a hundred thousand images in a day. ...
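
The core primitive underneath such a pipeline is pairwise image matching across the unordered collection. As a rough sketch only (not the paper's implementation, which scales this step with approximate nearest-neighbour search and distributed matching), the following Python snippet matches SIFT features between two photos with OpenCV; the file paths and the brute-force matcher are illustrative assumptions:

```python
import cv2

def match_pair(path_a: str, path_b: str, ratio: float = 0.75):
    """Return putative SIFT correspondences between two photos."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)
    # k=2 nearest neighbours for Lowe's ratio test; brute force is fine
    # for a pair, but not for the paper's ~100k-image collections.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return kp_a, kp_b, good
```

Surviving correspondences from pairs like this are what the downstream geometry estimation and bundle adjustment stages consume.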

October 19, 2020 · 5 min · Kumar Abhishek

KinectFusion: Real-Time Dense Surface Mapping and Tracking

The surge of interest in augmented and mixed reality applications can, at least in part, be attributed to research on “real-time infrastructure-free” camera tracking with the simultaneous generation of detailed maps of physical scenes. While computer vision research has enabled this (in particular, accurate camera tracking and dense scene surface reconstruction) using structure-from-motion and multiview stereo algorithms, these algorithms are not well suited to either real-time applications or detailed surface reconstruction. There has also been a contemporaneous improvement in camera technology, especially depth-sensing cameras based on time-of-flight or structured-light sensing, such as the Microsoft Kinect, a consumer-grade device. The Kinect features a structured-light depth sensor (hereafter, the sensor) and generates an 11-bit $640 \times 480$ depth map at 30 Hz using an on-board ASIC. However, these depth images are usually noisy, with ‘holes’ marking regions where no depth reading was possible. This paper proposes a system that processes these noisy depth maps and performs real-time (9 million new point measurements per second) dense simultaneous localization and mapping (SLAM), incrementally building a consistent 3D scene model while also tracking the sensor’s motion (all 6 degrees of freedom) through each frame. While the paper presents quite an involved description of the method, the key components are briefly summarized here. ...
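
To make the measurement rate concrete: $640 \times 480$ pixels at 30 Hz is roughly 9.2 million depth samples per second. The first thing the method's surface measurement stage does with each raw frame is back-project every valid pixel through the camera intrinsics into a 3D vertex map (after a bilateral filtering step in the paper, omitted here). Below is a minimal numpy sketch under assumed intrinsics, not the authors' code:

```python
import numpy as np

# Illustrative pinhole intrinsics for a 640x480 Kinect-style sensor;
# real values come from calibration, not from the paper.
FX, FY, CX, CY = 585.0, 585.0, 320.0, 240.0

def depth_to_vertex_map(depth_mm: np.ndarray) -> np.ndarray:
    """depth_mm: (480, 640) array of raw depth in millimetres, 0 = hole.
    Returns a (480, 640, 3) vertex map in metres; holes become NaN."""
    h, w = depth_mm.shape
    z = depth_mm.astype(np.float32) / 1000.0   # millimetres -> metres
    z[depth_mm == 0] = np.nan                  # propagate sensor holes
    v, u = np.mgrid[0:h, 0:w]                  # pixel row/column grids
    x = (u - CX) * z / FX                      # pinhole back-projection
    y = (v - CY) * z / FY
    return np.stack([x, y, z], axis=-1).astype(np.float32)
```

Per-pixel normals computed from neighbouring vertices in this map are what the tracking stage then aligns against the accumulated model from frame to frame.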

October 19, 2020 · 4 min · Kumar Abhishek