3D reconstruction maps multiple 2D input images to a 3D output. The classical Structure-from-Motion pipeline requires carefully handcrafted features, optimizes each stage independently, and is sensitive to noise. We introduce an end-to-end neural network, composed of a viewpoint network and 3D-R2N2, that takes viewpoint information as input. The network is trained with multi-task learning, jointly optimizing a softmax cross-entropy loss on the voxel grid and a root-mean-squared error on the delta viewpoint. The viewpoint information enables the network to generalize 3D invariance knowledge beyond memorizing the prior probability of 3D object shapes. Our experiments show that feeding in viewpoint information improves 3D reconstruction performance: our voxel IoU exceeds the previous state of the art by more than 4%.
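The multi-task objective and the voxel IoU metric described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the training code used in this work: the function names, the two-channel (empty/occupied) logit layout, the `viewpoint_weight` balancing factor, and the argmax rule for binarizing predictions are all assumptions made for the example.

```python
import numpy as np


def voxel_softmax_cross_entropy(logits, target):
    """Mean softmax cross-entropy over a voxel grid.

    logits: float array of shape (D, H, W, 2), scores for empty/occupied (assumed layout).
    target: int array of shape (D, H, W) with values in {0, 1}.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for every voxel.
    picked = np.take_along_axis(log_probs, target[..., None], axis=-1)
    return float(-picked.mean())


def viewpoint_rmse(pred_delta, true_delta):
    """Root-mean-squared error between predicted and true delta viewpoints."""
    return float(np.sqrt(np.mean((pred_delta - true_delta) ** 2)))


def joint_loss(logits, target, pred_delta, true_delta, viewpoint_weight=1.0):
    """Sum of the two task losses; viewpoint_weight is a hypothetical balancing factor."""
    ce = voxel_softmax_cross_entropy(logits, target)
    rmse = viewpoint_rmse(pred_delta, true_delta)
    return ce + viewpoint_weight * rmse


def voxel_iou(logits, target):
    """Intersection over union between argmax-binarized predictions and the target grid."""
    pred = logits.argmax(axis=-1).astype(bool)
    truth = target.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as a perfect match (assumed convention)
    return float(np.logical_and(pred, truth).sum() / union)
```

In this sketch the two losses are simply added, so gradients from the viewpoint regression and the voxel classification flow through the shared network together. How the terms are actually balanced is not stated in the abstract.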

3D Reconstruction with End-to-End Viewpoint NN