We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, our approach strikes a balance between realism, multi-modality, and high fidelity. We propose DiffComplete, which casts shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature aggregation mechanism to inject conditional features in a spatially-consistent manner, allowing the model to capture both local details and broader contexts of the conditional inputs to control the shape completion. Second, we propose an occupancy-aware fusion strategy to enable completion from multiple partial shapes and to offer greater flexibility in the input conditions. DiffComplete sets a new state-of-the-art performance (e.g., a 40% decrease in l_1 error) on two large-scale 3D shape completion benchmarks. Our completed shapes not only look more realistic than those of deterministic methods but also match the ground truths more closely than those of probabilistic alternatives. Further, DiffComplete generalizes strongly to objects of entirely unseen classes in both synthetic and real data, eliminating the need for model re-training in various applications.
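To make the conditional formulation concrete, below is a minimal, hypothetical PyTorch sketch of the standard noise-prediction training step that such a conditional diffusion model builds on. The model interface, the `cond` argument, and the TSDF volume shapes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of one conditional DDPM training step (not the
# authors' code): the network predicts the noise added to the complete
# shape x0, given the timestep t and the incomplete scan c as condition.

def ddpm_training_step(model, x0, c, alphas_cumprod):
    """x0: (B,1,D,H,W) complete shape volume; c: (B,1,D,H,W) partial scan."""
    B = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=x0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1, 1)
    noise = torch.randn_like(x0)
    # Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    # The denoiser is conditioned on the partial scan c.
    pred_noise = model(x_t, t, cond=c)
    return F.mse_loss(pred_noise, noise)
```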
Given the same input (left), DiffComplete is able to produce multiple plausible completion results (right).
When we gradually introduce more partial shapes of the same object, DiffComplete can incorporate the local structures of all partial inputs to improve the completion accuracy.
Figure 1. Overview of the DiffComplete framework. Given a corrupted complete shape x_t (diffused from x_0) and an incomplete scan c, we first process them into ε_x(x_t) and ε_c(c) to align their distributions. A main branch forwards ε_x(x_t), while a control branch propagates their fused features f into deep layers. Multi-level features of f are aggregated into the main branch for hierarchical control in predicting the diffusion noise. To support multiple partial scans as conditions, e.g., two scans {c_1, c_2}, we switch on occupancy-aware fusion. This strategy utilizes occupancy masks to enable a weighted feature fusion of c_1 and c_2, accounting for their geometric reliability, before feeding them into the main branch.
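The occupancy-aware fusion described above can be sketched roughly as follows; the normalization scheme and the fallback for unobserved voxels are assumptions, so treat this as an illustration of mask-weighted fusion rather than the paper's exact formulation.

```python
import torch

# Hypothetical sketch of occupancy-aware fusion for two partial scans:
# per-voxel occupancy masks act as reliability weights when fusing the
# two conditional feature volumes, so observed regions dominate.

def occupancy_aware_fusion(feat1, feat2, occ1, occ2, eps=1e-6):
    """feat*: (B,C,D,H,W) condition features; occ*: (B,1,D,H,W) in [0,1]."""
    w1 = occ1 / (occ1 + occ2 + eps)  # normalize per-voxel weights
    w2 = occ2 / (occ1 + occ2 + eps)
    fused = w1 * feat1 + w2 * feat2
    # Where neither scan observed geometry, fall back to a plain average.
    empty = (occ1 + occ2) < eps
    fused = torch.where(empty, 0.5 * (feat1 + feat2), fused)
    return fused
```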
Comparisons with state-of-the-art methods on ShapeNet objects of known categories. DiffComplete improves over the state of the art by 40% in l_1 error (0.053 vs. 0.088).
Comparisons with state-of-the-art methods on ShapeNet objects of entirely unseen categories. Each entry reports CD/IoU. DiffComplete exhibits the best average completion quality across eight unseen object categories, despite having no dedicated zero-shot design.
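For reference, a hedged sketch of how the two reported metrics are commonly computed; sampling density, scaling, and squared-vs-unsquared distances vary across papers, so the exact values here need not match the tables.

```python
import torch

# Common definitions of the two metrics: Chamfer Distance (CD) between
# point sets, and Intersection-over-Union (IoU) between occupancy grids.

def chamfer_distance(p1, p2):
    """p1: (N,3), p2: (M,3) point clouds; symmetric average of
    nearest-neighbor distances in both directions."""
    d = torch.cdist(p1, p2)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def voxel_iou(occ_pred, occ_gt):
    """Binary (bool) occupancy grids of the same shape."""
    inter = (occ_pred & occ_gt).float().sum()
    union = (occ_pred | occ_gt).float().sum()
    return inter / union.clamp(min=1)
```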
Comparisons with state-of-the-art methods on real-world ScanNet objects of entirely unseen categories. DiffComplete also achieves the best completion quality on real-world scans, which are often cluttered and noisy.
Shape completion on various known object classes. DiffComplete produces far more realistic and higher-fidelity object shapes than previous methods.
Shape completion on synthetic (blue) and real (yellow) objects of entirely unseen classes. The 3D shapes produced by DiffComplete stand out for their impressive global coherence and local details.
@article{chu2024diffcomplete,
  title={{DiffComplete}: Diffusion-based Generative 3D Shape Completion},
  author={Chu, Ruihang and Xie, Enze and Mo, Shentong and Li, Zhenguo and Nie{\ss}ner, Matthias and Fu, Chi-Wing and Jia, Jiaya},
  journal={Advances in Neural Information Processing Systems},
  year={2023}
}