HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

Please check out the HOI4D Challenge on the latest project website, www.hoi4d.top!

Yunze Liu*1,3, Yun Liu*1, Che Jiang1, Kangbo Lyu1, Weikang Wan2, Hao Shen2, Boqiang Liang2, Zhoujie Fu1, He Wang2, Li Yi†1,3
1Tsinghua University, 2Peking University, 3Shanghai Qi Zhi Institute
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Abstract

We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze research on category-level human-object interaction. HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences, collected by 9 participants interacting with 800 different object instances from 16 categories in 610 different indoor rooms. Frame-wise annotations for panoptic segmentation, motion segmentation, 3D hand pose, category-level object pose, and hand action are also provided, together with reconstructed object meshes and scene point clouds. With HOI4D, we establish three benchmarking tasks to promote category-level HOI from 4D visual signals: semantic segmentation of 4D dynamic point cloud sequences, category-level object pose tracking, and egocentric action segmentation with diverse interaction targets. In-depth analysis shows that HOI4D poses great challenges to existing methods and presents great research opportunities.

Benchmark

4D Semantic Segmentation

Method          mIoU   Year
P4Transformer   40.1   2022
PPTr            41.0   2022
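
Here mIoU is the mean intersection-over-union over semantic classes, computed point-wise across all frames of the test sequences. A minimal NumPy sketch of the metric (the flattened label arrays and the ignore behavior are illustrative assumptions, not the official evaluation script):

import numpy as np

def mean_iou(pred, gt, num_classes):
    # pred, gt: 1-D integer arrays of per-point labels (all frames flattened).
    # Classes absent from both prediction and ground truth are skipped.
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))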

4D Action Segmentation

Method                              Accuracy   Year
PPTr (without position embedding)   71.75      2022
PPTr (with position embedding)      77.40      2022
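
Accuracy for action segmentation is frame-wise: the fraction of frames whose predicted action label matches the ground truth. A minimal sketch under that assumption (not the official evaluation code):

import numpy as np

def framewise_accuracy(pred, gt):
    # pred, gt: 1-D integer arrays of per-frame action labels.
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.mean(pred == gt))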

You can benchmark your method by sending your test results to liuyzchina@gmail.com.

An official leaderboard is coming soon.

Overview of HOI4D

We construct a large-scale 4D egocentric dataset with rich annotations for category-level human-object interaction. Frame-wise annotations for action segmentation (a), motion segmentation (b), panoptic segmentation (d), and 3D hand pose and category-level object pose (c) are provided, together with reconstructed object meshes (e) and scene point clouds.

Object categories

Tasks and Benchmarks

Category-Level Object and Part Pose Tracking

4D Point Cloud Video Semantic Segmentation

Fine-grained Video Action Segmentation

Paper


Citing HOI4D

Please cite HOI4D if it helps your research:


@InProceedings{Liu_2022_CVPR,
    author    = {Liu, Yunze and Liu, Yun and Jiang, Che and Lyu, Kangbo and Wan, Weikang and Shen, Hao and Liang, Boqiang and Fu, Zhoujie and Wang, He and Yi, Li},
    title     = {HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {21013-21022}
}

Data

HOI4D_color: RGB video.

HOI4D_depth: depth video.

HOI4D_CAD_models: CAD models.

HOI4D_annotations: annotations.

HOI4D_cameras: camera parameters.

HOI4D_Handpose: HOI4D_Handpose_1.

HOI4D_Instructions: GitHub.

Baidu Cloud download link.
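
As a starting point for working with the released files, a depth frame plus the camera intrinsics is enough to lift each pixel into a 3D point cloud via standard pinhole back-projection. In the sketch below, the file path, depth scale, and intrinsic values are hypothetical placeholders; consult HOI4D_Instructions on GitHub for the actual formats:

import cv2
import numpy as np

def depth_to_points(depth_path, fx, fy, cx, cy, depth_scale=1000.0):
    # depth_path: path to a 16-bit depth image (hypothetical filename below).
    # fx, fy, cx, cy: pinhole intrinsics from HOI4D_cameras (layout assumed).
    # depth_scale: depth units per meter (1000.0 if depth is stored in millimeters).
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    z = depth / depth_scale
    v, u = np.nonzero(z)                 # pixel indices with valid (nonzero) depth
    zv = z[v, u]
    x = (u - cx) * zv / fx
    y = (v - cy) * zv / fy
    return np.stack([x, y, zv], axis=1)  # (N, 3) points in meters, camera frame

# Hypothetical usage; the real directory layout is documented in HOI4D_Instructions:
# pts = depth_to_points("HOI4D_depth/sequence_0001/00000.png",
#                       fx=600.0, fy=600.0, cx=320.0, cy=240.0)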

Contact

Send any comments or questions to Yunze Liu or Yun Liu: liuyzchina@gmail.com, yun-liu22@mails.tsinghua.edu.cn. HOI4D is licensed under CC BY-NC 4.0.


Last updated on 2022/08/10