RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
The optical flow estimation task in computer vision is defined as follows: given two images $\mathcal{I}_1$ and $\mathcal{I}_2$, estimate, for each pixel in $\mathcal{I}_1$, where that pixel moves to in $\mathcal{I}_2$. This dense pixel-correspondence problem is long-standing and remains largely unsolved because of difficulties such as shadows, reflections, occlusions, fast-moving objects, and low-texture surfaces. Traditional approaches frame optical flow as a hand-crafted optimization problem over the “space of dense displacement fields” between an image pair, with the optimization performed during inference; they are limited by how difficult it is to hand-craft that optimization problem. Motivated by these traditional optimization-based approaches, this paper proposes RAFT (Recurrent All-Pairs Field Transforms), an end-to-end differentiable deep learning (DL) architecture for estimating optical flow. RAFT comprises 3 main components: (a) a convolutional feature encoder that extracts feature vectors from a pair of images, (b) a correlation layer that constructs a 4D correlation volume and then pools it to produce volumes at multiple lower resolutions, and (c) a gated activation unit based on GRUs that iteratively updates a single flow field using values from the correlation volumes. ...
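Component (b) is the most distinctive part of the design, so here is a minimal NumPy sketch of how an all-pairs 4D correlation volume and its lower-resolution pyramid could be built. It assumes the encoder has already produced feature maps `f1` and `f2` of shape `(H, W, D)`, that the correlation is a plain dot product between feature vectors, and that pooling is 2x2 average pooling applied only to the two dimensions indexing $\mathcal{I}_2$. The function names are hypothetical and this is an illustration, not the authors' implementation.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """Dot-product correlation between every pixel in f1 and every pixel in f2.

    f1, f2: feature maps of shape (H, W, D).
    Returns a 4D volume C of shape (H, W, H, W) with
    C[i, j, k, l] = sum_d f1[i, j, d] * f2[k, l, d].
    """
    H, W, D = f1.shape
    # Flatten the spatial dims so all pairs come from a single matrix multiply.
    corr = f1.reshape(H * W, D) @ f2.reshape(H * W, D).T
    return corr.reshape(H, W, H, W)

def avg_pool_last_two(c, k):
    """Average-pool only the last two dims (the I_2 dims), kernel = stride = k."""
    H1, W1, H2, W2 = c.shape
    h, w = H2 // k, W2 // k
    c = c[:, :, : h * k, : w * k]  # drop any remainder rows/cols
    # Split each pooled axis into (blocks, k) and average within each block.
    return c.reshape(H1, W1, h, k, w, k).mean(axis=(3, 5))

def correlation_pyramid(f1, f2, num_levels=4):
    """Full-resolution volume followed by successively 2x-pooled copies."""
    pyramid = [all_pairs_correlation(f1, f2)]
    for _ in range(num_levels - 1):
        pyramid.append(avg_pool_last_two(pyramid[-1], 2))
    return pyramid

# Usage with random stand-ins for encoder features (hypothetical sizes).
rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 8, 16))
f2 = rng.standard_normal((8, 8, 16))
pyr = correlation_pyramid(f1, f2)
print([p.shape for p in pyr])
# → [(8, 8, 8, 8), (8, 8, 4, 4), (8, 8, 2, 2), (8, 8, 1, 1)]
```

Pooling only the last two dimensions keeps one entry per pixel of $\mathcal{I}_1$ at every level while summarizing the $\mathcal{I}_2$ side at coarser scales, so each pixel can see both small and large displacements. The full volume costs $O((HW)^2)$ memory, which is why such volumes are typically built on downsampled feature maps rather than at image resolution.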