The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs
Chris Rockwell
Justin Johnson
David F. Fouhey
University of Michigan
3DV 2022




Figure 1. We propose three small modifications to a ViT, packaged as the Essential Matrix Module, that enable computations similar to the Eight-Point Algorithm. The resulting mix of visual and positional features is a good inductive bias for pose estimation.



Abstract


We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images. Deep methods have recently shown strong progress but often require complex or multi-stage architectures. We show that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm. With this inductive bias, a simple method is competitive in multiple settings, often substantially improving over the state of the art, with strong performance gains in limited-data regimes.
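For reference, the classical Eight-Point Algorithm mentioned above can be sketched in a few lines of NumPy. This is the textbook procedure, not the method proposed in the paper. The sketch assumes calibrated (normalized) image coordinates and at least eight correspondences in general position, and it omits coordinate normalization and the decomposition of the essential matrix into rotation and translation. The function name `eight_point_essential` is illustrative only.

```python
import numpy as np


def eight_point_essential(x1, x2):
    """Estimate an essential matrix E (up to sign) with x2h^T E x1h = 0.

    x1, x2: (N, 2) arrays of matched normalized image coordinates, N >= 8.
    """
    n = x1.shape[0]
    # Lift to homogeneous coordinates.
    x1h = np.hstack([x1, np.ones((n, 1))])
    x2h = np.hstack([x2, np.ones((n, 1))])

    # Each correspondence gives one linear constraint on the 9 entries of E.
    # Row n holds the products x2h[n, i] * x1h[n, j], matching E.reshape(9).
    A = np.einsum("ni,nj->nij", x2h, x1h).reshape(n, 9)

    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)

    # Project onto the set of essential matrices: singular values (1, 1, 0).
    u, _, vt_e = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt_e
```

Note that every row of the linear system is built from products of coordinates in one image with coordinates in the other, which is the kind of computation the modified ViT is meant to make easy to express.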



Approach



Figure 2. Essential Matrix Module. We make three small changes to standard ViT Cross-Attention: (1) appending positional encodings to Values, (2) applying a dual softmax on Affinities, and (3) applying bilinear attention.
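The three changes listed in Figure 2 can be illustrated with a small NumPy sketch. This is a hedged illustration, not the authors' released implementation: the function names, the single-head layout, the tensor shapes, the product form of the dual softmax, and the reading of "bilinear attention" as V1^T A V2 are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_softmax(scores):
    """Assumed form of change (2): softmax over rows times softmax over columns.

    An entry is large only if the two tokens pick each other.
    """
    return softmax(scores, axis=0) * softmax(scores, axis=1)


def essential_matrix_module_sketch(q, k, v1, v2, pos1, pos2):
    """Hypothetical single-head version of the modified cross-attention.

    q:    (N, d)  queries from image 1
    k:    (M, d)  keys from image 2
    v1:   (N, c)  values from image 1
    v2:   (M, c)  values from image 2
    pos1: (N, p)  positional encodings of image-1 tokens
    pos2: (M, p)  positional encodings of image-2 tokens
    Returns a (c + p, c + p) feature matrix.
    """
    # (1) Append positional encodings to the values.
    v1p = np.concatenate([v1, pos1], axis=-1)  # (N, c + p)
    v2p = np.concatenate([v2, pos2], axis=-1)  # (M, c + p)

    # (2) Dual softmax on the scaled affinities.
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M)
    affinity = dual_softmax(scores)

    # (3) Bilinear attention (assumed form): the affinity sits between
    # the two sets of values, giving affinity-weighted outer products.
    return v1p.T @ affinity @ v2p
```

Under these assumptions, when the affinity matrix approaches a one-to-one matching, the output reduces to a sum of outer products between matched value vectors. Because the values carry positional encodings, the output then contains sums of products of positions across the two images, similar in spirit to the entries used by the Eight-Point Algorithm.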



Paper and Supplemental Material


Rockwell, Johnson and Fouhey.
The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs.
In 3DV 2022. (Hosted on arXiv)


@inproceedings{Rockwell2022,
  author = {Chris Rockwell and Justin Johnson and David F. Fouhey},
  title = {The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs},
  booktitle = {3DV},
  year = 2022
}
          

Acknowledgements


Thanks to Linyi Jin, Ruojin Cai and Zach Teed for help replicating and building upon their works. Thanks to Mohamed El Banani, Karan Desai and Nilesh Kulkarni for their many helpful suggestions. Thanks to Laura Fink and UM DCO for their tireless support with computing! The webpage template originally came from some colorful folks.