Full-Body Awareness from Partial Observations
Chris Rockwell
David F. Fouhey
University of Michigan
ECCV, 2020
[GitHub]
[Paper]
[Demo]
[Annotated Test Set]


Fig. 1: We present a simple but highly effective framework, requiring no additional pose annotation, for adapting human pose estimation methods to highly truncated settings. We evaluate the approach on HMR and CMR by annotating four Internet video test sets: VLOG (top-left, top-middle), Cross-Task (top-right, bottom-left), YouCookII (bottom-middle), and Instructions (bottom-right).



Abstract

There has been great progress in human 3D mesh recovery and great interest in learning about the world from consumer video data. Unfortunately, current methods for 3D human mesh recovery work rather poorly on consumer video data, since on the Internet, unusual camera viewpoints and aggressive truncations are the norm rather than a rarity. We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems; (ii) we introduce evaluation protocols and keypoint annotations for 13K frames across four consumer video datasets for studying this task, including evaluations on out-of-image keypoints; and (iii) we show that our method substantially improves PCK and human-subject judgments compared to baselines, both on test videos from the dataset it was trained on and on three other datasets without further adaptation.
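At a high level, the self-training framework runs the pre-trained model on frames where the person is relatively visible, keeps that output as a pseudo-label, and fine-tunes the model on aggressive crops of the same frames so it learns to predict full bodies from partial views. Below is a minimal sketch of one such step; `model`, `predict_smpl`, and `update` are hypothetical placeholders, not the released API.

    # Sketch of one self-training step (hypothetical helper names).
    import numpy as np

    rng = np.random.default_rng(0)

    def aggressive_crop(frame):
        """Crop away a random portion of the frame to truncate the body."""
        h, w = frame.shape[:2]
        top = int(rng.integers(0, max(h // 2, 1)))
        left = int(rng.integers(0, max(w // 2, 1)))
        return frame[top:, left:]

    def self_training_step(model, frame):
        pseudo = model.predict_smpl(frame)    # full-frame SMPL output as pseudo-label
        pred = model.predict_smpl(aggressive_crop(frame))
        # Supervise the crop prediction (pose and shape) with the pseudo-label.
        loss = np.mean((pred - pseudo) ** 2)
        model.update(loss)                    # gradient step in the real training framework
        return loss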


Paper and Supplemental Material

Rockwell and Fouhey.
Full-Body Awareness from Partial Observations.
In ECCV 2020.
(Hosted on arXiv)


[Paper] [Supplemental] [Code]
    @inProceedings{Rockwell2020,
      author = {Chris Rockwell and David F. Fouhey},
      title = {Full-Body Awareness from Partial Observations},
      booktitle = {ECCV},
      year = 2020
    }


Try the Interactive Demo!

What do 100 representative images look like in Internet video?
Here is an interactive tool for exploring what typical humans look like in four Internet video datasets.



Each cell is an image from a representative 100-image sample. We have grouped the poses into five categories, spanning four datasets: VLOG, Cross-Task, Instructions, and YouCookII.

[Interactive 100-image grid; available on the project page.]

Video Results

We show results of our HMR model on videos across the datasets, along with some common failure cases. Model outputs are used as-is, with SMPL parameters smoothed across frames.
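The exact smoothing filter is not specified here; as an illustration only, below is a minimal moving-average smoother over stacked per-frame SMPL parameters. Note that a plain average is only a rough approximation for the axis-angle pose values.

    # Illustrative temporal smoothing of SMPL parameters (filter choice assumed).
    import numpy as np

    def smooth_smpl(params, window=5):
        """Moving average over a (T, D) float array of per-frame SMPL parameters,
        e.g. D = 72 axis-angle pose values + 10 shape values."""
        half = window // 2
        T = params.shape[0]
        out = np.empty_like(params)
        for t in range(T):
            lo, hi = max(0, t - half), min(T, t + half + 1)
            out[t] = params[lo:hi].mean(axis=0)
        return out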


ECCV Talk


ECCV Short Overview




Annotated Test Set

The frames we annotate for testing come from four Internet video datasets collected in prior work: VLOG, Cross-Task, YouCookII, and Instructions. We do not hold the copyright to these videos, but for ease of replication, we are making our local copy of the data available for non-commercial research purposes only. Click here to download our copies of VLOG, Cross-Task, and Instructions. Fill out this Google Form so we can share the YouCookII download. If you find the annotations helpful, please additionally cite the original works.
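For evaluation with these annotations, the paper reports PCK, including keypoints annotated outside the image bounds. Below is a minimal sketch of such a metric; the threshold convention (here, a fraction of the larger image dimension) is an assumption, so see the paper and repository for the exact protocol.

    # Sketch of PCK with out-of-image keypoints (threshold convention assumed).
    import numpy as np

    def pck(pred, gt, annotated, img_hw, alpha=0.1):
        """Fraction of annotated keypoints within alpha * max(H, W) pixels of
        ground truth; gt coordinates may lie outside the image bounds."""
        thresh = alpha * max(img_hw)
        dist = np.linalg.norm(pred - gt, axis=1)   # (K,) per-keypoint pixel error
        correct = (dist < thresh) & annotated
        return correct.sum() / max(annotated.sum(), 1)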


Recent & Concurrent Work

There has been a variety of exciting recent and concurrent work on in-the-wild 3D human mesh estimation beyond HMR and CMR. In addition to Human 3.6M, the 3D Poses in the Wild dataset has been adopted for evaluation. We see our method and annotated test set as complementary to these recent works. Beyond HMR and CMR, we would be excited to see other models (including optimization-based methods) adapted to and evaluated on the extreme truncation of our setting.



Acknowledgements

This work was supported by the DARPA Machine Common Sense Program. We thank Dimitri Zhukov, Jean-Baptiste Alayrac, and Luowei Zhou for allowing us to privately share frames from their respective datasets: Cross-Task, Instructions, and YouCookII. Thanks to Angjoo Kanazawa and Nikos Kolotouros for their polished model repositories, which made it easy to extend HMR and CMR. Thanks also to the members of the Fouhey AI Lab and Karan Desai for all of the great suggestions! Finally, thanks to Alejandro Newell and Jia Deng for teaching me so much about human pose estimation. The webpage template originally came from some colorful folks.