Text-to-3D Shape Generation Paper List

Eurographics STAR 2024

Han-Hung Lee¹, Manolis Savva¹ and Angel Xuan Chang¹,²

¹ Simon Fraser University  ² Canada-CIFAR AI Chair, Amii



Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination as they enable non-expert users to easily create 3D content directly from text. However, there are still many limitations and challenges remaining in this problem space. In this state-of-the-art report, we provide a survey of the underlying technology and methods enabling text-to-3D shape generation to summarize the background literature. We then derive a systematic categorization of recent work on text-to-3D shape generation based on the type of supervision data required. Finally, we discuss limitations of the existing categories of methods, and delineate promising directions for future work.

The datasets commonly used to train these methods are listed here.

The methods are divided into four families as shown in the table below, namely: 1) Paired Text to 3D (3DPT); 2) Unpaired 3D Data (3DUT); 3) Text-to-3D without 3D data (NO3D); and 4) Hybrid3D.

[Table: the four method families (3DPT, 3DUT, NO3D, Hybrid3D)]

Finally, we include works focused on generating multi-object 3D scenes, editing 3D shapes, and evaluating text-to-3D methods.





Paired Text to 3D (3DPT)


Autoregressive Prior

Diffusion Prior

Structure Aware

Unpaired 3D Data (3DUT)


Text-to-3D without 3D data (NO3D)

Unsupervised CLIP Guidance


Unsupervised Diffusion Guidance


Loss Formulation

3D Representation Improvements

Janus Problem Mitigation

Generative Modeling

Further Reading


3D-aware T2I


Text Conditioning

Image Conditioning

Further Reading

Multi Object Scene Generation

Compositional Generation

RGBD Fusion for Scenes


Shape Editing with CLIP

Scene Editing with Text-to-image Models




      title={Text-to-3D Shape Generation}, 
      author={Han-Hung Lee and Manolis Savva and Angel X. Chang},