Benchmark Dataset

Scene Level

Object Instance Segmentation

Object Instance Segmentation takes {scan_id}.pth files as input in both the training and evaluation phases.

Input

{scan_id}.pth
xyz (numpy array: float32):

(N,3) array of xyz coordinates of the scene point cloud. The coordinates are zero-centered, that is, the centroid of the point cloud is shifted to the origin.

rgb (numpy array: float32):

(N,3) array of vertex colors of the scene point cloud, with color values in the range [0, 255].

normal (numpy array: float32):

(N,3) array of vertex normals of the scene point cloud.

sem_labels (numpy array: int32):

Object semantic IDs of the points in the scene point cloud.

instance_ids (numpy array: int32):

Object instance IDs of the points in the scene point cloud.
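
As a minimal sketch of the layout above, the following check validates a sample that has already been loaded into a dictionary of NumPy arrays. The helper name check_scene_sample, the dictionary-style access, the tolerance atol, and the assumption that sem_labels and instance_ids hold one entry per point are illustrative and not taken from the MultiScan code. How the .pth file is deserialized (for example with torch.load) is likewise an assumption.

```python
import numpy as np

def check_scene_sample(sample, atol=1e-3):
    """Validate a scene-level sample given as a dict of NumPy arrays (hypothetical helper)."""
    xyz = sample["xyz"]
    n = xyz.shape[0]
    # Per-point geometry and appearance are all (N, 3) float32 arrays.
    for key in ("xyz", "rgb", "normal"):
        arr = sample[key]
        assert arr.shape == (n, 3), f"{key} must have shape (N, 3)"
        assert arr.dtype == np.float32, f"{key} must be float32"
    # Labels are int32; one entry per point is an assumption of this sketch.
    for key in ("sem_labels", "instance_ids"):
        arr = sample[key]
        assert arr.shape[0] == n, f"{key} must have N entries"
        assert arr.dtype == np.int32, f"{key} must be int32"
    # Colors are stored in [0, 255], not [0, 1].
    assert sample["rgb"].min() >= 0 and sample["rgb"].max() <= 255
    # Coordinates are zero-centered, so the centroid should be close to the origin.
    assert np.allclose(xyz.mean(axis=0), 0.0, atol=atol)
    return n
```

A check like this is cheap to run once per scan and catches the most common mistakes, such as colors that were normalized to [0, 1] or coordinates that were not re-centered.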

Object Level

Part Instance Segmentation and Mobility Estimation both take articulated objects as inputs. To keep the data used in the Part Instance Segmentation Benchmark and the Mobility Estimation Benchmark consistent, we first extract the articulated objects in the MultiScan Dataset to a single HDF5 file, then use scripts to convert this HDF5 file to the formats required by the individual methods.

Articulated Objects HDF5

HDF5 Groups

Each group in the HDF5 file stores the data of one articulated object. The group name is formatted as group_name = {scan_id}_{object_id}, where scan_id is the ID of the scan and object_id is the object ID in annotations.json.
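
The naming rule can be sketched as a pair of small helpers. The function names are hypothetical, and the sketch assumes that object_id itself contains no underscore, so that the last underscore always separates the two fields even when scan_id contains underscores of its own.

```python
def make_group_name(scan_id, object_id):
    """Build the HDF5 group name for one articulated object."""
    return f"{scan_id}_{object_id}"

def split_group_name(group_name):
    """Recover (scan_id, object_id) from a group name.

    Splits on the LAST underscore, assuming object_id has no underscore.
    Both parts are returned as strings.
    """
    scan_id, object_id = group_name.rsplit("_", 1)
    return scan_id, object_id
```

For example, with an illustrative scan ID of "scene_00000_00" and object ID "3", make_group_name gives "scene_00000_00_3" and split_group_name recovers the original pair.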

HDF5 Attributes

num_parts (int):

Number of movable parts in the articulated object.

has_additional_vertices (bool):

A boolean flag indicating whether the extracted object has more points than the threshold of 4096. If the object has fewer than 4096 points, additional points are up-sampled so that each articulated object has at least 4096 points.

HDF5 Datasets

pts (numpy array: float32):

(N,9) array of points of the object point cloud, storing xyz, rgb, and normals for each point. The object is aligned by fitting the annotated oriented bounding box (OBB) to the common coordinate frame, with the y axis as the up direction.
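
A minimal sketch of unpacking the pts array follows. It assumes the nine columns are stored in the order listed above (xyz in columns 0 to 2, rgb in columns 3 to 5, normals in columns 6 to 8); the helper name split_pts is hypothetical.

```python
import numpy as np

def split_pts(pts):
    """Split an (N, 9) pts array into xyz, rgb, and normals views (assumed column order)."""
    pts = np.asarray(pts)
    assert pts.ndim == 2 and pts.shape[1] == 9, "pts must have shape (N, 9)"
    xyz = pts[:, 0:3]      # point coordinates in the canonical, y-up frame
    rgb = pts[:, 3:6]      # per-point colors
    normals = pts[:, 6:9]  # per-point normals
    return xyz, rgb, normals
```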

faces (numpy array: float32):

(N,3) array of the faces of the object triangle mesh.

part_semantic_masks (numpy array: int32):

(N,1) array of the part semantic IDs of the N points.

part_instance_masks (numpy array: int32):

(N,1) array of per-point part instance IDs in the range [0, K]. Value 0 represents the static part, and values 1 to K represent the K movable parts.

motion_types (numpy array: float32):

(K,1) array of the joint types corresponding to the K movable parts. Rotation: 0, translation: 1.

motion_origins (numpy array: float32):

(K,3) array, positions of the K joint origins.

motion_axes (numpy array: float32):

(K,3) array, directions of the K joint axes.

motion_ranges (numpy array: float32):

(K,2) array of the motion ranges that each movable part can reach relative to its current state. The unit is radians for rotational motion and meters (m) for translational motion.
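
To illustrate how the joint parameters fit together, the sketch below moves the points of one part by a given amount along its joint. It is an illustration under stated assumptions rather than the benchmark's own code: the helper name apply_motion is hypothetical, rotation follows the right-hand rule about the joint axis, and amount is interpreted relative to the current state, in radians for rotation and meters for translation.

```python
import numpy as np

ROTATION, TRANSLATION = 0, 1  # motion_types encoding described above

def apply_motion(points, motion_type, origin, axis, amount):
    """Move (M, 3) points by `amount` (radians or meters) about/along one joint."""
    points = np.asarray(points, dtype=np.float64)
    origin = np.asarray(origin, dtype=np.float64)
    axis = np.asarray(axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)  # use a unit-length direction
    if int(motion_type) == TRANSLATION:
        # Translation: slide every point along the joint axis.
        return points + amount * axis
    # Rotation: Rodrigues' formula about the line through `origin` along `axis`.
    p = points - origin
    cos_a, sin_a = np.cos(amount), np.sin(amount)
    rotated = (p * cos_a
               + np.cross(axis, p) * sin_a
               + np.outer(p @ axis, axis) * (1.0 - cos_a))
    return rotated + origin
```

Passing a value from a row of motion_ranges as amount would move the part to one end of its range. Which column holds the lower and which the upper limit is not stated above, so that ordering should be checked against the data.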

motion_states (numpy array: float32):

(K,1) array of the current states of the K movable parts.

part_closed (numpy array: bool):

(K,1) array indicating whether each of the K movable parts is closed.

transformation_back (numpy array: float32):

(16,1) array storing a 4x4 transformation matrix in column-major order. It transforms the object back to its original pose in the scene, before the canonical alignment.
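
Because the matrix is stored flattened in column-major order, it has to be reshaped with Fortran ordering before use. The sketch below shows one way to do this with NumPy. The helper name is hypothetical, and it assumes the usual convention that points are treated as column vectors (p' = T p).

```python
import numpy as np

def apply_transformation_back(xyz, transformation_back):
    """Map (N, 3) canonical-frame points back to the scene frame."""
    flat = np.asarray(transformation_back, dtype=np.float64).reshape(-1)
    assert flat.size == 16, "expected 16 matrix entries"
    # Column-major: the first four values form the first COLUMN of the 4x4 matrix.
    T = flat.reshape(4, 4, order="F")
    xyz = np.asarray(xyz, dtype=np.float64)
    # Append a homogeneous coordinate of 1 to every point.
    homogeneous = np.concatenate([xyz, np.ones((xyz.shape[0], 1))], axis=1)
    # (T @ p) for every row p, written as a single matrix product.
    return (homogeneous @ T.T)[:, :3]
```

Reshaping with the default row-major order would silently transpose the matrix, which moves the translation into the bottom row and gives wrong results without raising an error.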

additional2original_vertex_match (numpy array: int32):

Mapping from additionally up-sampled points to the index of the vertex in the originally extracted object.
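
One use of this mapping is to copy a per-vertex quantity, such as a label, from the original vertices onto the up-sampled points. The sketch below assumes the mapping holds one original-vertex index per additional point; the helper name is hypothetical, and how the additional points are ordered relative to the original ones in pts is not covered here.

```python
import numpy as np

def values_for_additional_points(original_values, additional2original_vertex_match):
    """Look up, for every up-sampled point, the value of its matched original vertex."""
    match = np.asarray(additional2original_vertex_match).reshape(-1).astype(np.int64)
    original_values = np.asarray(original_values)
    # Fancy indexing gathers one entry of original_values per additional point.
    return original_values[match]
```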

Part Instance Segmentation

Input

key (string):

Same as the group name in the articulated objects HDF5 file.

object_name (string):

same as the object_name.

xyz (numpy array: float32):

(N,3) array of xyz coordinates of the object point cloud.

rgb (numpy array: float32):

(N,3) array of vertex colors of the object point cloud, with RGB values in the range [0, 255].

normal (numpy array: float32):

(N,3) array of vertex normals of the object point cloud.

faces (numpy array: float32):

same as the faces.

sem_labels (numpy array: float32):

same as part_semantic_masks.

instance_ids (numpy array: float32):

same as part_instance_masks.

part_closed (numpy array: bool):

same as part_closed.

transformation_back (numpy array: float32):

same as transformation_back.

additional2original_vertex_match (numpy array: float32):

same as additional2original_vertex_match.

Mobility Estimation

Shape2Motion

HDF5 Groups

Each group in the HDF5 file stores the data of one articulated object.

HDF5 Attributes

num_parts (int):

same as the num_parts.

fps_sample (bool):

Objects with more than 4096 points are down-sampled to 4096 points with farthest point sampling (FPS). This boolean attribute indicates whether the object was down-sampled.
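
For reference, a plain NumPy sketch of farthest point sampling is given below. It is a generic version of the technique, not the implementation used to build the dataset: the function name, the choice of the first point (start_index), and the returned flag are assumptions made for illustration.

```python
import numpy as np

def farthest_point_sampling(xyz, num_samples=4096, start_index=0):
    """Return (indices, was_sampled) for an (N, 3) point cloud."""
    xyz = np.asarray(xyz, dtype=np.float64)
    n = xyz.shape[0]
    if n <= num_samples:
        # Nothing to down-sample; the fps_sample flag would be False here.
        return np.arange(n), False
    selected = np.empty(num_samples, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    current = start_index
    for i in range(num_samples):
        selected[i] = current
        diff = xyz - xyz[current]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        # The next sample is the point farthest from everything selected so far.
        current = int(np.argmax(min_dist))
    return selected, True
```

The returned indices refer to the original point cloud and play the same role as point_idx. The loop costs O(N) per selected point, which is acceptable for object-sized point clouds.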

object_name (string):

same as the object_name.

HDF5 Datasets

input_pts (numpy array: float32):

same as the pts.

anchor_pts (numpy array: float32):

(N, 1) mask array marking the anchor points, that is, the points in the object point cloud that are closest to the joint origin. A value of 1 indicates an anchor point, 0 otherwise.
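
The anchor mask for a single joint could be computed along the following lines. The number of anchor points is not stated above, so num_anchors is a hypothetical parameter, as is the helper name; objects with several joints would need one such selection per joint, and how those are combined is not covered by this sketch.

```python
import numpy as np

def anchor_mask(xyz, joint_origin, num_anchors):
    """Mark the `num_anchors` points closest to the joint origin with 1, others with 0."""
    xyz = np.asarray(xyz, dtype=np.float64)
    joint_origin = np.asarray(joint_origin, dtype=np.float64)
    # Euclidean distance from every point to the joint origin.
    dist = np.linalg.norm(xyz - joint_origin, axis=1)
    nearest = np.argsort(dist)[:num_anchors]
    mask = np.zeros((xyz.shape[0], 1), dtype=np.float32)
    mask[nearest] = 1.0
    return mask
```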

joint_direction_cat (numpy array: float32):

(N, 1) array of the indices of the joint direction categories.

joint_direction_reg (numpy array: float32):

(N, 3) array of joint direction regression from the 14 joint direction categories to the actual joint axis direction.
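
The category and regression targets for one joint axis can be sketched as follows. This is one plausible encoding, not necessarily the one used by the conversion scripts: it assumes the category is the candidate direction with the largest cosine similarity to the axis and that the regression target is the plain vector difference to that candidate. The set of candidate directions is passed in as all_directions (in the dataset this would be the (14, 3) joint_all_directions array), and the helper names are hypothetical.

```python
import numpy as np

def encode_joint_direction(axis, all_directions):
    """Return (category index, residual) for one joint axis."""
    axis = np.asarray(axis, dtype=np.float64)
    axis = axis / np.linalg.norm(axis)
    dirs = np.asarray(all_directions, dtype=np.float64)
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    # Closest candidate direction = largest cosine similarity.
    category = int(np.argmax(dirs @ axis))
    # Residual that moves the candidate direction onto the actual axis.
    residual = axis - dirs[category]
    return category, residual

def decode_joint_direction(category, residual, all_directions):
    """Invert encode_joint_direction and return a unit-length axis."""
    dirs = np.asarray(all_directions, dtype=np.float64)
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    axis = dirs[category] + residual
    return axis / np.linalg.norm(axis)
```

How these per-joint values are spread over the per-point (N, 1) and (N, 3) arrays is not described above and is left out of the sketch.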

joint_origin_reg (numpy array: float32):

(N, 3) array of joint origin regression from the anchor points to the actual joint origin.
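
Read literally, the regression target for an anchor point is the offset that takes it to the joint origin. The sketch below follows that reading for a single joint. Filling non-anchor rows with zeros is an assumption of this sketch, and the helper name is hypothetical.

```python
import numpy as np

def joint_origin_regression(xyz, anchor_pts, joint_origin):
    """Per-point offsets from anchor points to the joint origin; zeros elsewhere (assumed)."""
    xyz = np.asarray(xyz, dtype=np.float32)
    joint_origin = np.asarray(joint_origin, dtype=np.float32)
    is_anchor = np.asarray(anchor_pts).reshape(-1) > 0
    reg = np.zeros_like(xyz)
    # anchor point + offset == joint origin
    reg[is_anchor] = joint_origin - xyz[is_anchor]
    return reg
```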

joint_type (numpy array: float32):

same as the motion_types.

joint_all_directions (numpy array: float32):

(14, 3) array representing the 14 joint axis categories.

gt_joints (numpy array: float32):

(K, 7) array representing the motions of the articulated object. Each row is composed of [joint_origin, joint_axis, joint_type].
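
Given the row layout above, gt_joints can be assembled from the motion arrays of the articulated objects HDF5 file by concatenating them column-wise. The helper name below is hypothetical; the column order follows the [joint_origin, joint_axis, joint_type] layout.

```python
import numpy as np

def build_gt_joints(motion_origins, motion_axes, motion_types):
    """Stack (K, 3) origins, (K, 3) axes, and (K, 1) types into a (K, 7) array."""
    origins = np.asarray(motion_origins, dtype=np.float32).reshape(-1, 3)
    axes = np.asarray(motion_axes, dtype=np.float32).reshape(-1, 3)
    types = np.asarray(motion_types, dtype=np.float32).reshape(-1, 1)
    assert origins.shape[0] == axes.shape[0] == types.shape[0], "K must match"
    # Columns 0-2: origin, columns 3-5: axis, column 6: type (0 rotation, 1 translation).
    return np.concatenate([origins, axes, types], axis=1)
```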

gt_proposals (numpy array: float32):

(K+1, N) array representing the part segmentation of the object, where K is the number of movable parts and N is the number of points.
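
A (K+1, N) layout of this kind can be derived from per-point part instance IDs as sketched below. The sketch assumes that row k is a binary mask of the points that belong to part k, with row 0 being the static part; the helper name is hypothetical.

```python
import numpy as np

def build_gt_proposals(part_instance_masks, num_parts):
    """Turn per-point part IDs in [0, K] into a (K+1, N) stack of binary masks."""
    ids = np.asarray(part_instance_masks).reshape(-1)
    part_ids = np.arange(num_parts + 1).reshape(-1, 1)
    # Broadcasting compares every part ID with every point's ID.
    return (part_ids == ids[None, :]).astype(np.float32)
```

With this encoding every column contains exactly one 1, because each point belongs to exactly one part.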

simmat (numpy array: float32):

A 4096x4096 similarity matrix in which each row represents a part segmentation: the i-th row encodes the mask of the part that contains the i-th point in the object point cloud.
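
Under this description, entry (i, j) is 1 exactly when points i and j belong to the same part, so the matrix can be built from per-point part instance IDs with a single broadcast comparison. The sketch below assumes binary float32 entries, and the helper name is hypothetical.

```python
import numpy as np

def build_simmat(part_instance_masks):
    """Return an (N, N) matrix whose row i is the mask of the part containing point i."""
    ids = np.asarray(part_instance_masks).reshape(-1)
    # ids[:, None] == ids[None, :] compares every pair of points.
    return (ids[:, None] == ids[None, :]).astype(np.float32)
```

For N = 4096 the float32 matrix takes 64 MiB, which is worth keeping in mind when batching several objects.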

point_idx (numpy array: float32):

Indices, in the original object point cloud, of the points selected during FPS down-sampling.