Full-Body Awareness from Partial Observations

Chris Rockwell

David F. Fouhey

University of Michigan

ECCV, 2020


Fig. 1: We present a simple but highly effective framework, requiring no additional pose annotation, for adapting human pose estimation methods to heavily truncated settings. We evaluate the approach on HMR and CMR by annotating four Internet video test sets: VLOG (top-left, top-middle), Cross-Task (top-right, bottom-left), YouCookII (bottom-middle), and Instructions (bottom-right).

Abstract

There has been great progress in human 3D mesh recovery and great interest in learning about the world from consumer video data. Unfortunately, current methods for 3D human mesh recovery work rather poorly on consumer video data, since on the Internet, unusual camera viewpoints and aggressive truncations are the norm rather than a rarity. We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems; (ii) we introduce evaluation protocols and keypoint annotations for 13K frames across four consumer video datasets for studying this task, including evaluations on out-of-image keypoints; and (iii) we show that our method substantially improves PCK and human-subject judgments compared to baselines, both on test videos from the dataset it was trained on and on three other datasets without further adaptation.
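The abstract reports results in PCK (Percentage of Correct Keypoints). As a rough illustration only, here is a minimal sketch of a generic PCK computation, assuming 2D keypoints in pixel coordinates and a fixed pixel threshold. The function name `pck`, the `annotated` mask, and the threshold value are hypothetical; this does not reproduce the paper's exact evaluation protocol (for example, how the threshold is normalized). Out-of-image keypoints are treated here simply as coordinates that fall outside the image bounds.

```python
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point],
        annotated: Sequence[bool], threshold: float) -> float:
    """Generic PCK sketch (hypothetical helper, not the paper's code).

    Returns the fraction of annotated keypoints whose predicted 2D location
    lies within `threshold` pixels of the ground truth. Coordinates may fall
    outside the image (e.g. negative y), so out-of-image keypoints are scored
    exactly like in-image ones in this sketch.
    """
    total = 0
    correct = 0
    for (px, py), (gx, gy), has_label in zip(pred, gt, annotated):
        if not has_label:
            continue  # skip keypoints with no ground-truth annotation
        total += 1
        # Euclidean pixel error between prediction and ground truth.
        if hypot(px - gx, py - gy) <= threshold:
            correct += 1
    if total == 0:
        raise ValueError("no annotated keypoints to evaluate")
    return correct / total


# Hypothetical example: the third keypoint lies above the image (y < 0).
pred = [(10.0, 10.0), (50.0, 50.0), (200.0, -30.0)]
gt = [(12.0, 11.0), (80.0, 50.0), (205.0, -28.0)]
print(pck(pred, gt, [True, True, True], threshold=10.0))  # 2 of 3 within threshold
```

Scoring only annotated keypoints keeps unlabeled joints from counting as errors; a real protocol would also need to fix how the threshold scales with person or image size.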

Paper and Supplemental Material

Rockwell and Fouhey.
Full-Body Awareness from Partial Observations.
In ECCV 2020.
(Hosted on arXiv)

[Paper] [Supplemental] [Code]

@inProceedings{Rockwell2020,
  author = {Chris Rockwell and David F. Fouhey},
  title = {Full-Body Awareness from Partial Observations},
  booktitle = {ECCV},
  year = 2020
}

Try the Interactive Demo!

What do 100 representative images look like in Internet video?
Here is an interactive tool for exploring what typical humans look like in four Internet video datasets.

Each cell is an image from a representative 100-image sample. We have grouped the poses into five categories.

Poses are drawn from four datasets: VLOG, Cross-Task, Instructions, and YouCookII.

Video Results

We show results of our HMR model on videos across datasets, along with some common failure cases. Model outputs are used as-is, except that SMPL parameters are smoothed across frames.

ECCV Talk

ECCV Short Overview

Annotated Test Set

Frames we annotate for testing are from four Internet video datasets collected in prior work: VLOG, Cross-Task, YouCookII, and Instructions. We do not hold the copyright to these videos, but for ease of replication, we are making our local copy of the data available for non-commercial research purposes only. Click here to download our copies of VLOG, Cross-Task, and Instructions. To obtain YouCookII, fill out this Google Form so that we can share the download. If you find the annotations helpful, please also cite the original works.

Recent & Concurrent Work

There has been a variety of exciting recent and concurrent work on in-the-wild 3D human mesh estimation. In addition to HMR and CMR, here is a partial list:

Donglai Xiang, Hanbyul Joo, and Yaser Sheikh. Monocular Total Capture: Posing Face, Body, and Hands in the Wild. [PDF]
Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, and Kostas Daniilidis. Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop. [PDF]
Hanbyul Joo, Natalia Neverova, and Andrea Vedaldi. Exemplar Fine-Tuning for 3D Human Pose Fitting Towards In-the-Wild 3D Human Pose Estimation. [PDF]
Wen Jiang, Nikos Kolotouros, Georgios Pavlakos, Xiaowei Zhou, and Kostas Daniilidis. Coherent Reconstruction of Multiple Humans From a Single Image. [PDF]
Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, and Angjoo Kanazawa. Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild. [PDF]
Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, and Michael J. Black. Monocular Expressive Body Regression through Body-Driven Attention. [PDF]

In addition to Human3.6M, the 3D Poses in the Wild (3DPW) dataset has been adopted for evaluation. We see our method and annotated test set as complementary to other recent works. Beyond HMR and CMR, we would be excited to see other models (including optimization-based methods) adapted to and evaluated on the extreme truncation of our setting.

Acknowledgements

This work was supported by the DARPA Machine Common Sense Program. We thank Dimitri Zhukov, Jean-Baptiste Alayrac, and Luowei Zhou for allowing us to privately share frames from their respective datasets: Cross-Task, Instructions, and YouCookII. Thanks to Angjoo Kanazawa and Nikos Kolotouros for polished model repositories that made it easy to extend their respective HMR and CMR models. Thanks also to the members of Fouhey AI Lab and Karan Desai for all of the great suggestions! Finally, thanks to Alejandro Newell and Jia Deng for teaching me so much about human pose estimation. The webpage template originally came from some colorful folks.