Habitat Synthetic Scenes Dataset (HSSD):
An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

¹Georgia Tech, ²Simon Fraser University, ³Stanford University, ⁴FAIR, Meta AI
*Co-first authors

Abstract

We contribute the Habitat Synthetic Scenes Dataset (HSSD), a dataset of 211 high-quality 3D scenes, and use it to test how well navigation agents generalize to realistic 3D environments. The dataset represents real interiors and contains a diverse set of 18,656 models of real-world objects.

We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find and navigate to objects (ObjectGoal navigation). By comparing to synthetic 3D scene datasets from prior work, we find that scale helps generalization, but the benefits quickly saturate, so visual fidelity and correlation to real-world scenes become more important. Our experiments show that agents trained on our smaller-scale dataset can match or outperform agents trained on much larger datasets.

Surprisingly, we observe that agents trained on just 122 scenes from our dataset outperform agents trained on 10,000 scenes from the ProcTHOR-10K dataset in terms of zero-shot generalization in real-world scanned environments.

Scene visuals

Scene visuals of HSSD.

Object diversity

Object diversity of HSSD.

ObjectNav results

Our experiments show that the benefit of scene dataset scale saturates quickly, and scene realism and quality become the bottleneck for improved ObjectNav agent generalization to realistic scenes.


Concretely, we find that agents trained on 122 scenes from HSSD outperform agents trained on 10,000 scenes from the ProcTHOR-10K dataset, which is roughly two orders of magnitude more scenes.
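
As a quick check of the scale gap, the arithmetic can be sketched in a few lines of Python. This is only an illustration using the two scene counts quoted in the abstract (122 and 10,000); the variable names are ours and are not part of any HSSD or ProcTHOR tooling.

```python
import math

# Scene counts quoted in the abstract (not read from any dataset files).
hssd_train_scenes = 122
procthor_train_scenes = 10_000

# How many times more training scenes ProcTHOR-10K provides.
ratio = procthor_train_scenes / hssd_train_scenes

# The same gap expressed in orders of magnitude (base-10 logarithm).
orders_of_magnitude = math.log10(ratio)

print(f"ProcTHOR-10K has about {ratio:.0f}x as many training scenes")
print(f"That is about {orders_of_magnitude:.2f} orders of magnitude")
```

The ratio comes out at about 82x, or just under two orders of magnitude, which is the sense in which the comparison above should be read.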


Zero-shot ObjectNav performance on HM3DSem and MP3D for agents pretrained on synthetic 3D scene datasets of different scale and quality.

BibTeX


  @article{khanna2023hssd,
      author={{Khanna*}, Mukul and {Mao*}, Yongsen and Jiang, Hanxiao and Haresh, Sanjay and Shacklett, Brennan and Batra, Dhruv and Clegg, Alexander and Undersander, Eric and Chang, Angel X. and Savva, Manolis},
      title={{Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation}},
      journal={arXiv preprint},
      year={2023},
      eprint={2306.11290},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
  }
  

Acknowledgements

The research team members at SFU were supported by a Canada CIFAR AI Chair grant, a Canada Research Chair grant, an NSERC Discovery Grant, and a research grant from Meta AI. Experiments at SFU were enabled by support from the Digital Research Alliance of Canada.

We thank Karmesh Yadav, Chris Paxton, Mrinal Kalakrishnan, Sonia Raychaudhuri, Qirui Wu, and Xiaohao Sun for useful discussions and feedback on early drafts of this paper. We also thank Ram Ramrakhya for useful discussions and help with the ProcTHOR experiments, and John Turner and Vladimír Vondruš for help with 3D asset compression.