Articulated 3D Human-Object Interactions from RGB Videos:
An Empirical Analysis of Approaches and Challenges [3DV 2022]


Sanjay Haresh, Xiaohao Sun, Hanxiao Jiang, Angel X. Chang, Manolis Savva


Abstract

Human-object interactions with articulated objects are common in everyday life. Despite much progress in single-view 3D reconstruction, it is still challenging to infer an articulated 3D object model from an RGB video showing a person manipulating the object. We canonicalize the task of articulated 3D human-object interaction reconstruction from RGB video, and carry out a systematic benchmark of four methods for this task: 3D plane estimation, 3D cuboid estimation, CAD model fitting, and free-form mesh fitting. Our experiments show that all methods struggle to obtain high-accuracy results even when provided ground truth information about the observed objects. We identify key factors that make the task challenging and suggest directions for future work on this 3D computer vision task.

Video


Paper
Bibtex


@inproceedings{haresh2022articulated,
  title={Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges},
  author={Haresh, Sanjay and Sun, Xiaohao and Jiang, Hanxiao and Chang, Angel X and Savva, Manolis},
  booktitle={2022 International Conference on 3D Vision (3DV)},
  year={2022},
  organization={IEEE}
}

Acknowledgements


This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair and NSERC Discovery Grant, and enabled in part by support provided by WestGrid and Compute Canada. We thank Xiang Xu, Shengyi Qian and Zhenyu Jiang for answering questions about D3DHOI, 3DADN and Ditto respectively.