CAGE: Controllable Articulation GEneration

Simon Fraser University
CVPR 2024

We address the challenge of generating 3D articulated objects in a controllable fashion. Currently, articulated 3D objects are modeled either through laborious manual authoring or with methods from prior work that are hard to scale and to control directly.

We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method with attention modules designed to extract correlations between part attributes. Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters. The generated objects conform to user-specified constraints on the object category, part shape, and part articulation.
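To make the graph input concrete, here is a minimal sketch of how a part connectivity graph could be encoded as an adjacency matrix. The parent-list encoding, the function name `adjacency_from_parents`, and the example cabinet are assumptions made for illustration, not the format used by the authors.

```python
import numpy as np

def adjacency_from_parents(parents):
    """Build a symmetric adjacency matrix from a parent-index list.

    parents[i] is the index of part i's parent, or -1 for the root part.
    (Hypothetical encoding, chosen only for this illustration.)
    """
    n = len(parents)
    adj = np.zeros((n, n), dtype=bool)
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child, parent] = True
            adj[parent, child] = True
    return adj

# Hypothetical cabinet: part 0 is the base, parts 1 and 2 are doors,
# and part 3 is a handle attached to door 1.
parents = [-1, 0, 0, 1]
G = adjacency_from_parents(parents)
category = "StorageFurniture"  # hypothetical category label
```

Together, `category` and `G` stand in for the two inputs named above: the object category label and the part connectivity graph.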

Our experiments show that our method outperforms the state of the art in articulated object generation, producing more realistic objects while conforming better to user constraints.

Method Overview

We represent each object as a collection of part attributes. In the forward process, Gaussian noise is iteratively added to corrupt the data x_0 into random noise x_T. In the reverse process, our denoiser (highlighted in yellow) predicts the residual noise to be subtracted from the input data x_t at timestep t, conditioned on the category label c and the graph adjacency G, which is injected as an attention mask in the Graph Relation Attention module. All timesteps share the same denoiser, which is built from layers of our Attribute Attention Blocks (AAB).
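The forward and reverse processes described above follow the usual denoising diffusion pattern, which can be sketched as follows. This is a generic DDPM-style illustration, not the authors' code: the linear noise schedule, the number of steps `T`, and the tensor shape (4 parts, 5 attributes, 8 dimensions per attribute) are all assumptions.

```python
import numpy as np

T = 1000                                  # assumed number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)      # cumulative signal-retention factor

def q_sample(x0, t, eps):
    """Forward process: corrupt clean data x0 to timestep t with noise eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def predict_x0(xt, t, eps_pred):
    """Recover an estimate of x0 by subtracting the predicted noise from xt."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 5, 8))   # hypothetical part attributes
eps = rng.standard_normal(x0.shape)
xt = q_sample(x0, 500, eps)

# With a perfect noise prediction, the clean data is recovered exactly.
x0_rec = predict_x0(xt, 500, eps)
```

In a trained model, `eps_pred` would come from the denoiser given x_t, t, c, and G; here the true noise is reused only to show that subtracting it inverts the corruption.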

Attention Modules Design in AAB

Design of the attention modules within each attribute attention block (AAB). Each node in the input graph is represented by 5 attributes describing part shape and motion parameters. Each node attribute is projected to a separate token and sequentially passed to three attention modules with varied masking strategies. White cells signify activated attention positions, whereas grey cells indicate attention that has been masked out.
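As a rough illustration of attention masking over per-attribute tokens, the sketch below expands a node-level mask to token level and applies it in scaled dot-product attention. Only the use of the graph adjacency as a mask is stated in the text; the self-connections, the per-node mask `local_mask`, the token width of 16, and all function names are assumptions.

```python
import numpy as np

NUM_ATTR = 5  # each node contributes 5 attribute tokens

def token_mask(node_mask):
    """Expand an (N, N) node-level mask to an (N*5, N*5) token-level mask."""
    block = np.ones((NUM_ATTR, NUM_ATTR), dtype=int)
    return np.kron(node_mask.astype(int), block).astype(bool)

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; positions where mask is False are ignored."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical 3-part graph: part 0 is connected to parts 1 and 2.
G = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=bool)
eye = np.eye(3, dtype=bool)

graph_mask = token_mask(G | eye)   # attend along graph edges (self-loops assumed)
local_mask = token_mask(eye)       # assumed variant: attend within one node only

rng = np.random.default_rng(0)
tokens = rng.standard_normal((3 * NUM_ATTR, 16))
out, weights = masked_attention(tokens, tokens, tokens, graph_mask)
```

Self-connections are included so that every token has at least one position to attend to; without them a fully masked row would produce undefined weights.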


Graph Conditional Generation


Here we show qualitative examples of generated objects conditioned on graphs of varied complexity. Our generated objects respect the node hierarchy with parts better matching the input graph as denoted by the color. In comparison, prior work (NAP) fails to conform to the input graph with inconsistent part assignments and flipped or disordered connections between parts, denoted using red arrows.

OOD Graph Conditional Generation

Our method can be conditioned on graphs that are out of the distribution of the training samples. These results illustrate that our method can generate plausible objects even for unseen graphs with different numbers and compositions of parts.

Part → Motion

Our method can be conditioned on specific attributes associated with the graph. Here, we predict part articulation parameters given a bounding box condition for each part. The output plausibly completes the object given the provided input parts.
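One common way to condition a diffusion model on known attributes is inpainting-style replacement: at each reverse step, the known entries are overwritten with a suitably noised copy of the given values, and the model only has to fill in the rest. The text above does not state that the method works this way, so the sketch below is an assumption, as are the function name `condition_step`, the tensor layout, and the choice of attribute index 0 for the bounding box.

```python
import numpy as np

def condition_step(xt, x0_known, known_mask, alpha_bar_t, rng):
    """Overwrite the known entries of xt with a noised copy of the given values.

    alpha_bar_t is the cumulative signal-retention factor at the current step.
    """
    eps = rng.standard_normal(xt.shape)
    known_noised = (np.sqrt(alpha_bar_t) * x0_known
                    + np.sqrt(1.0 - alpha_bar_t) * eps)
    return np.where(known_mask, known_noised, xt)

rng = np.random.default_rng(0)
shape = (4, 5, 8)                          # hypothetical: parts, attributes, dims
xt = rng.standard_normal(shape)            # current noisy sample
x0_known = rng.uniform(-1.0, 1.0, shape)   # user-provided values (only index 0 used)

known_mask = np.zeros(shape, dtype=bool)
known_mask[:, 0, :] = True                 # assume attribute 0 holds the bounding box

# At the final step (alpha_bar_t = 1) the known attribute is restored exactly,
# while all other attributes keep the values produced by the denoiser.
x_final = condition_step(xt, x0_known, known_mask, 1.0, rng)
```

Changing which attribute index is marked in `known_mask` would give the other conditioning modes, such as fixing the joint type or the articulation axis and generating the remaining attributes.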

Type → Part

We show various possible completions by conditioning on specific joint types for each node in the input graph.

Axis → Part

We show various possible completions conditioning on a specific articulation axis for each part.


          author        = {Liu, Jiayi and Tam, Hou In Ivan and Mahdavi-Amiri, Ali and Savva, Manolis},
          title         = {{CAGE: Controllable Articulation GEneration}},
          year          = {2023},
          eprint        = {2312.09570},
          archivePrefix = {arXiv}