Survey on Compositional 3D Indoor Scene Generation

¹Simon Fraser University, ²Alberta Machine Intelligence Institute (Amii)

Abstract

Compositional 3D indoor scene generation is a long-standing problem and a rapidly evolving area of research spanning computer graphics, 3D computer vision, and machine learning. The goal is to model the complex relationships among objects and their spatial and functional arrangements within a scene, enabling the creation of rich, diverse, and useful 3D environments for a wide range of applications. This survey offers a comprehensive overview of the state of the art, formulating a unifying framework for analyzing scene generation systems and systematically categorizing existing methods according to their approaches to key components. We review recent progress, analyze the strengths and limitations of different paradigms, and highlight both major advances and open challenges. Our survey aims to serve as a resource for researchers and practitioners, offering insights into the current landscape and inspiring new ideas for future work in this area.

Scene Generation System Blueprint

Overview of the key components of a 3D scene generation method. Given an input condition and prior knowledge about how objects are arranged, compositional systems typically first generate a coarse layout of the scene, then determine and refine object placements, obtain the corresponding objects, and combine them with architectural elements to produce the output 3D scene. An important design choice throughout this process is the scene representation.
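
The stages named in this overview can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not an implementation of any surveyed method: the names `Scene`, `SceneObject`, `PRIOR`, `ASSETS`, and every function are hypothetical, the prior knowledge is a hand-written lookup table, and placement refinement is reduced to snapping objects toward one wall.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SceneObject:
    category: str
    x: float = 0.0                   # position on the floor plane, in metres
    y: float = 0.0
    asset_id: Optional[str] = None   # filled in by the object stage


@dataclass
class Scene:
    """Toy scene representation: a room box plus a flat list of objects."""
    room_type: str
    width: float
    depth: float
    objects: list = field(default_factory=list)
    architecture: list = field(default_factory=list)


# Hypothetical prior knowledge: which categories a room type should contain.
PRIOR = {"bedroom": ["bed", "nightstand", "wardrobe"]}

# Hypothetical shape database: category -> asset identifier.
ASSETS = {"bed": "bed_001", "nightstand": "nightstand_007", "wardrobe": "wardrobe_003"}


def generate_layout(room_type, width, depth):
    """Coarse layout: pick objects from the prior and spread them evenly along x."""
    scene = Scene(room_type, width, depth)
    categories = PRIOR[room_type]
    step = width / (len(categories) + 1)
    for i, category in enumerate(categories):
        scene.objects.append(SceneObject(category, x=step * (i + 1), y=depth / 2))
    return scene


def refine_placement(scene, margin=0.5):
    """Placement refinement (toy rule): move each object to `margin` from the wall y = 0."""
    for obj in scene.objects:
        obj.y = margin
    return scene


def retrieve_objects(scene):
    """Object stage: look up one asset per category in the shape database."""
    for obj in scene.objects:
        obj.asset_id = ASSETS.get(obj.category)
    return scene


def add_architecture(scene):
    """Combine the placed objects with architectural elements."""
    scene.architecture = ["floor", "walls", "ceiling"]
    return scene


def generate_scene(room_type, width, depth):
    """Run the stages in the order described in the overview."""
    scene = generate_layout(room_type, width, depth)
    scene = refine_placement(scene)
    scene = retrieve_objects(scene)
    return add_architecture(scene)


scene = generate_scene("bedroom", width=4.0, depth=3.0)
for obj in scene.objects:
    print(obj.category, obj.x, obj.y, obj.asset_id)
```

Keeping each stage as a separate function that takes and returns the same `Scene` object reflects the point that the scene representation is a design choice shared by all stages; real systems may merge stages or iterate between them.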

Implementation Choices

Blueprint of the components of compositional 3D scene generation systems, together with the implementation choices made by prior work for each component. Parentheses indicate which section discusses the design choices for each component.

Paper List

This interactive table catalogs papers on compositional 3D indoor scene generation. You can sort and filter papers using the controls at each column header.

Column Descriptions:

  • Input: The type of conditioning the method takes.
  • Representation: How the method structures the scene information.
  • Knowledge: The data source leveraged for learning scene priors.
  • Layout: How the layout is specified.
  • Layout Extra: Additional details about the layout generation approach.
  • Placement: The approach used for placing objects in the scene.
  • Object: How the object shapes are obtained.
  • Retrieval Details: The approach used to retrieve suitable objects from the object shape database.

Empty cells in the Object and Retrieval Details columns indicate that the component is not applicable or is insufficiently described in the original paper.
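
The column scheme above maps naturally onto a small record type. The sketch below is illustrative only: `PaperEntry`, `filter_papers`, and `sort_papers` are hypothetical names, the three entries are placeholders rather than real papers, and the cell values are invented for the example. The Input and Object columns are named `input_type` and `object_source` here to avoid shadowing Python built-ins, and empty cells are modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperEntry:
    title: str
    input_type: str                          # "Input" column
    representation: str
    knowledge: str
    layout: str
    layout_extra: str
    placement: str
    object_source: Optional[str] = None      # "Object" column; None = empty cell
    retrieval_details: Optional[str] = None  # None = empty cell


def filter_papers(papers, **criteria):
    """Keep the papers whose columns equal every given value."""
    return [p for p in papers
            if all(getattr(p, col) == val for col, val in criteria.items())]


def sort_papers(papers, column):
    """Sort by one column, placing empty cells (None) last."""
    return sorted(papers,
                  key=lambda p: (getattr(p, column) is None, getattr(p, column) or ""))


# Placeholder rows with invented values, for illustration only.
papers = [
    PaperEntry("Placeholder A", "text", "scene graph", "dataset", "generated", "",
               "optimization", "retrieved", "category match"),
    PaperEntry("Placeholder B", "image", "object list", "dataset", "generated", "",
               "learned"),
    PaperEntry("Placeholder C", "text", "object list", "language model", "generated", "",
               "learned", "generated"),
]

text_papers = filter_papers(papers, input_type="text")
by_object = sort_papers(papers, "object_source")
print([p.title for p in text_papers])
print([p.title for p in by_object])
```

Sorting on a `(is_empty, value)` tuple keeps rows with empty cells together at the end instead of raising an error when `None` is compared with a string.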

Spot an issue? Please let us know by opening an issue on our GitHub repository.

BibTeX

@article{tam2025survey,
    title = {Survey on Compositional {3D} Indoor Scene Generation},
    author = {Tam, Hou In Ivan and Pun, Hou In Derek and Wang, Austin T. and Sun, Xiaohao and Wu, Qirui and Lee, Han-Hung and Chang, Angel X. and Savva, Manolis},
    year = {2025}
}

Acknowledgements

This work was funded in part by the Sony Research Award Program, a CIFAR AI Chair, a Canada Research Chair, NSERC Discovery Grants, and enabled by support from the Digital Research Alliance of Canada.