Deep Convolutional Priors for Indoor Scene Synthesis

Given the importance and ubiquity of indoor spaces in our everyday lives, computational models that can understand, model, and synthesize indoor scenes are of vital importance to many industries, including interior design, architecture, gaming, and virtual reality. Previous works towards this goal have relied on constrained synthesis of scenes with statistical priors on object pair relationships, “human-centric relationship priors”, or constraints based on “hand-crafted interior design principles”. Moreover, owing to the difficulty of unconstrained room-scale synthesis of indoor scenes, prior work has either focused on small regions within a room or required additional inputs (a fixed set of objects, manually specified relationships, a natural language description, a sketch, or a 3D scan of the room) as constraints, and deep generative models such as GANs and VAEs struggle to produce multi-modal outputs. Driven by the success of convolutional neural networks (CNNs) in scene synthesis tasks and the availability of large 3D scene datasets, this paper proposes the first CNN-based autoregressive model for designing interior spaces: given the wall structure and the type of a room, the model predicts the selection and placement of objects. ...
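One way to picture the autoregressive loop is as repeated next-object prediction conditioned on a top-down image of the partial scene: a category module decides what to add next (or to stop), and a location module decides where to place it. The sketch below (assuming PyTorch) only illustrates this loop; the `CategoryCNN` and `LocationCNN` modules, their layer sizes, and the 4-channel scene encoding are hypothetical and not the paper's architecture.

```python
import torch
import torch.nn as nn

class CategoryCNN(nn.Module):
    """Hypothetical head: predicts a distribution over object categories plus a 'stop' token."""
    def __init__(self, num_categories, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_categories + 1),  # +1 for the stop token
        )
    def forward(self, scene_image):
        return self.net(scene_image)

class LocationCNN(nn.Module):
    """Hypothetical head: predicts a per-pixel placement heatmap for a chosen category."""
    def __init__(self, num_categories, in_channels=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels + num_categories, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, scene_image, category_onehot):
        b, _, h, w = scene_image.shape
        cat_map = category_onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.conv(torch.cat([scene_image, cat_map], dim=1))

def synthesize(scene_image, cat_model, loc_model, num_categories, max_objects=20):
    """Autoregressively add objects until the category head emits 'stop' (batch size 1)."""
    placements = []
    for _ in range(max_objects):
        cat_logits = cat_model(scene_image)
        cat = torch.distributions.Categorical(logits=cat_logits).sample()
        if cat.item() == num_categories:  # stop token sampled
            break
        onehot = torch.nn.functional.one_hot(cat, num_categories).float()
        heatmap = loc_model(scene_image, onehot)
        loc = torch.distributions.Categorical(logits=heatmap.flatten(1)).sample()
        placements.append((cat.item(), loc.item()))
        # A full system would render the chosen object back into scene_image
        # before the next iteration; omitted here.
    return placements
```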

November 9, 2020 · 5 min · Kumar Abhishek

PolyGen: An Autoregressive Generative Model of 3D Meshes

Polygonal meshes are widely used in computer graphics, robotics, and game development to represent virtual objects and scenes. Exisitng learning-based methods for 3D object generation have relied on template models and parametric shape families. Progress with deep learning based approaches has also been limited because meshes are challenging to work with for deep networks, and therefore recent works have instead used alternative representations of object shape, such as voxels, point clouds, occupancy functions, and surfaces. These works, however, leave mesh reconstruction as a post-processing step, leading to inconsistent mesh quality. Drawing inspiration from the success of previous neural autoregressive models applied to sequential raw data (e.g., images, text, and raw audio waveforms) and building upon previously proposed components (e.g., Transformers and pointer networks), this paper presents PolyGen, a neural autoregressive generative model for generating 3D meshes. ...
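PolyGen factorizes the distribution over a mesh into a vertex model and a face model conditioned on the vertices, $p(\mathcal{M}) = p(\mathcal{F} \mid \mathcal{V})\, p(\mathcal{V})$, with both parts modeled autoregressively by Transformers. The sketch below illustrates, under stated assumptions, how vertices and faces could be flattened into the token sequences such models consume; the quantization bit depth and sorting order follow the paper's description, but the function and token names are illustrative and not taken from the paper's code.

```python
import numpy as np

def quantize_vertices(vertices, bits=8):
    """Quantize vertex coordinates to a fixed grid (PolyGen uses 8-bit
    quantization), so each coordinate becomes a discrete token."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    scale = (2 ** bits - 1) / np.maximum(hi - lo, 1e-8)
    return np.round((vertices - lo) * scale).astype(np.int64)

def flatten_vertex_sequence(vertices_quantized, stop_token):
    """Sort vertices by (z, y, x) and flatten into one coordinate sequence
    that an autoregressive Transformer can model token by token."""
    order = np.lexsort(
        (vertices_quantized[:, 0], vertices_quantized[:, 1], vertices_quantized[:, 2])
    )
    seq = vertices_quantized[order].reshape(-1).tolist()
    return seq + [stop_token]

def flatten_face_sequence(faces, new_face_token, stop_token):
    """Flatten faces into a sequence of vertex indices; at each step the
    face model selects an existing vertex with a pointer-network head."""
    seq = []
    for face in faces:
        seq.extend(face)
        seq.append(new_face_token)
    seq[-1] = stop_token
    return seq
```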

November 9, 2020 · 5 min · Kumar Abhishek

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

The adoption of physically simulated character animation in the industry remains a challenging problem, primarily because of the lack of directability and generalizability of existing methods. Several categories of approaches have tried to amalgamate data-driven behavior specification with the ability to reproduce such behavior in a physical simulation. Kinematic models rely on large amounts of data, and their ability to generalize to unseen situations can be limited. Physics-based models incorporate prior knowledge based on the physics of motion, but they do not perform well for “dynamic motions” involving long-term planning. Motion imitation approaches can achieve highly dynamic motions, but are limited by the complexity of the system and a lack of adaptability to task objectives. Techniques based on reinforcement learning (RL), although comparatively successful in achieving the defined objectives, often produce unrealistic motion artifacts. This paper addresses these problems with DeepMimic, a deep RL-based “framework for physics-based character animation” that combines a motion-imitation objective with a task objective. This allows it to demonstrate a wide range of motion skills and to adapt to a variety of characters, skills, and tasks by leveraging rich information from high-dimensional state and environment descriptions; it is also conceptually simpler than motion-imitation-based approaches and can work with data provided either as motion capture clips or as keyframed animation. While the paper presents intricate details about the DeepMimic framework, the high-level details and novel contribution claims are summarized here, skipping the common details about deep RL problem formulations. ...
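The core of this combination is a per-timestep reward that sums an imitation term and a task term, $r_t = \omega^I r^I_t + \omega^G r^G_t$. Below is a minimal sketch of that combination in Python; the pose-matching term, weights, and scale factor are simplified illustrations (the full imitation reward in the paper also includes velocity, end-effector, and center-of-mass terms, each with its own weight and scale).

```python
import numpy as np

def imitation_reward(joint_rotations, ref_joint_rotations, scale=2.0):
    """Simplified pose-matching reward: exponentiated negative squared
    difference between simulated and reference joint rotations sampled
    from the motion clip at the current phase."""
    err = np.sum((joint_rotations - ref_joint_rotations) ** 2)
    return np.exp(-scale * err)

def combined_reward(r_imitation, r_task, w_imitation=0.7, w_task=0.3):
    """r_t = w^I * r^I_t + w^G * r^G_t  (weights here are illustrative)."""
    return w_imitation * r_imitation + w_task * r_task
```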

October 26, 2020 · 5 min · Kumar Abhishek

Interactive Reconstruction of Monte Carlo Image Sequences using a Recurrent Denoising Autoencoder

Owing to the immense popularity of ray tracing and path tracing rendering algorithms for visual effects, there has been a surge of interest in developing filtering and reconstruction methods to deal with the noise present in these Monte Carlo renderings. While much prior work targets large sampling rates (up to thousands of samples per pixel before filtering), at interactive rates even the fastest ray tracers are limited to a few rays per pixel, and such a low sampling budget will remain realistic for the foreseeable future. This paper proposes a learning-based approach for reconstructing global illumination at very low sampling budgets (as low as 1 spp) at interactive rates. At 1 sample per pixel (spp), the Monte Carlo integration of indirect illumination produces very noisy images, and the problem can therefore be framed as reconstruction instead of denoising. Previous works on offline and interactive denoising for Monte Carlo rendering suffer from a trade-off between speed and performance, require user-defined parameters, and scale poorly to large scenes. Inspired by the progress in single image restoration (denoising) using deep learning, the authors propose a deep learning-based approach which leverages an encoder-decoder architecture and recurrent connections for improved temporal consistency. The proposed model requires no user guidance, is end-to-end trainable, and is able to exploit auxiliary pixel features for improved performance. ...
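A rough sketch of the key architectural idea, a recurrent connection inside an encoder stage so that information from previous frames can be reused for temporal stability, is given below (assuming PyTorch). The layer sizes, input channel layout, and block structure are illustrative and do not reproduce the paper's exact network.

```python
import torch
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    """One encoder stage with a convolutional recurrent connection:
    the previous frame's hidden state is concatenated with the current
    features so the filter can reuse information across frames."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1), nn.ReLU(),
        )
        self.recurrent = nn.Conv2d(2 * out_channels, out_channels, 3, padding=1)

    def forward(self, x, hidden):
        feat = self.conv(x)
        if hidden is None:
            hidden = torch.zeros_like(feat)
        hidden = torch.relu(self.recurrent(torch.cat([feat, hidden], dim=1)))
        return hidden, hidden  # output features, next hidden state

# Illustrative input layout: noisy 1-spp radiance (3) + normals (3)
# + depth (1) + albedo (3) = 10 channels per pixel.
block = RecurrentConvBlock(in_channels=10, out_channels=32)
frame = torch.randn(1, 10, 128, 128)
features, state = block(frame, hidden=None)
```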

October 12, 2020 · 4 min · Kumar Abhishek

Mesh R-CNN

Although deep learning has enabled massive strides in visual recognition tasks, including object detection, most of these advances have been made in 2D object recognition. These improvements, however, are built upon a critical omission: objects in the real world exist beyond the $XY$ image plane, in 3D space. While there has also been significant progress in 3D shape understanding tasks, the authors call attention to the need for methods that amalgamate these two lines of work: i.e., approaches which (a) can work in the real world, where there are far fewer constraints (compared to carefully curated datasets) on object count, occlusion, illumination, etc., and (b) can do so without ignoring the rich 3D information present therein. They build upon the immensely popular Mask R-CNN multi-task framework and extend it by adding a mesh prediction branch that simultaneously learns to generate a “high-resolution triangle mesh” for each detected object. Whereas previous works on single-view shape prediction rely on post-processing or are limited in the topologies that they can represent as meshes, Mesh R-CNN uses multiple 3D shape representations: 3D voxels and 3D meshes, where the latter is obtained by refining the former. ...
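A rough sketch of the voxel-to-mesh conversion idea (“cubify” in the paper) is shown below: each occupied voxel is replaced by a unit cube and only faces exposed to empty space are kept, yielding an initial mesh that the graph-convolutional refinement branch then deforms. This is a simplified, unoptimized illustration, not the authors' implementation; it does not merge shared vertices or guarantee a particular face winding.

```python
import numpy as np

def cubify(occupancy, threshold=0.5):
    """Turn a voxel occupancy grid into a triangle soup: each occupied
    voxel becomes a unit cube, and only faces exposed to empty space
    (or the grid boundary) are kept."""
    occ = occupancy > threshold
    # For each of the six axis-aligned faces: neighbor offset and 4 cube corners.
    face_specs = [
        ((-1, 0, 0), [(0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)]),
        (( 1, 0, 0), [(1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0)]),
        (( 0, -1, 0), [(0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 0, 0)]),
        (( 0, 1, 0), [(0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)]),
        (( 0, 0, -1), [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]),
        (( 0, 0, 1), [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 1)]),
    ]
    D, H, W = occ.shape
    verts, faces = [], []
    for x, y, z in zip(*np.nonzero(occ)):
        for (dx, dy, dz), corners in face_specs:
            nx, ny, nz = x + dx, y + dy, z + dz
            in_bounds = 0 <= nx < D and 0 <= ny < H and 0 <= nz < W
            if not in_bounds or not occ[nx, ny, nz]:  # face is exposed
                base = len(verts)
                verts.extend((x + cx, y + cy, z + cz) for cx, cy, cz in corners)
                faces.append((base, base + 1, base + 2))  # split quad into
                faces.append((base, base + 2, base + 3))  # two triangles
    return np.array(verts, dtype=float), np.array(faces, dtype=int)

# Example: a single occupied voxel yields a cube of 12 triangles.
grid = np.zeros((3, 3, 3))
grid[1, 1, 1] = 1.0
v, f = cubify(grid)
```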

October 5, 2020 · 5 min · Kumar Abhishek