GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

CVPR 2024, Seattle

✨ Highlight ✨

1 Brain and Artificial Intelligence Lab, Northwestern Polytechnical University; 2 MBZUAI; 3 Hefei University of Technology; 4 Nanyang Technological University; 5 Baidu, Inc.; 6 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

GP-NeRF achieves remarkable performance improvements for instance and semantic segmentation on both synthetic and real-world datasets.


Method

Overview of the proposed GP-NeRF. Given reference views with their poses, we embed NeRF into the segmenter to perform context-aware semantic ($Y_{sem}$) and instance ($Y_{ins}$) segmentation, together with ray reconstruction ($Y_{rgb}$), in novel views (Sec. 4.1). In detail, we use Transformers to co-aggregate the Radiance and Semantic-Embedding fields and render them jointly in novel views (Sec. 4.2). We further propose two self-distillation mechanisms to boost the discrimination and quality of the semantic embedding field (Sec. 4.3).
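As a rough illustration of what rendering the two fields jointly along a ray means, the sketch below composites per-sample colors and semantic embeddings with one shared set of weights. It is a minimal sketch under stated assumptions: standard NeRF alpha compositing stands in for the Transformer-based aggregation described above, and the function name `render_ray` and all of its arguments are hypothetical rather than taken from the GP-NeRF code.

```python
import numpy as np

def render_ray(sigma, deltas, rgb, sem):
    """Composite colors and semantic embeddings along a single ray.

    ASSUMPTION: classic NeRF alpha compositing, used here only as a
    stand-in for the paper's Transformer-based aggregation.

    sigma  : (N,)   per-sample densities
    deltas : (N,)   distances between consecutive samples
    rgb    : (N, 3) per-sample colors
    sem    : (N, D) per-sample semantic embeddings
    """
    # Opacity contributed by each sample.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    # The same weights render both outputs, so the two fields stay aligned.
    y_rgb = weights @ rgb
    y_sem = weights @ sem
    return y_rgb, y_sem, weights

# Toy ray: empty space, then an opaque surface, then an occluded sample.
sigma = np.array([0.0, 1e9, 5.0])
deltas = np.ones(3)
rgb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
sem = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
y_rgb, y_sem, weights = render_ray(sigma, deltas, rgb, sem)
```

In this toy ray the opaque second sample receives all of the weight, so both the rendered color and the rendered embedding come from that sample alone.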

Training and Rendering

Illustration of the training (a) and rendering (b) procedures, where S.E. field denotes the Semantic-Embedding Field.

Loss Functions

2D Semantic Distillation $\mathcal{L}_{\text {S.D}}$ and Depth-Guided Semantic Optimization $\mathcal{L}_{\text {D.G}}$. This figure demonstrates a single ray of our semantic-embedding field. Under $\mathcal{L}_{\text {S.D}}$ supervision alone, the network can "cheat" by rendering all points $\boldsymbol{f}^{sem}_i$ to the same prediction. By performing spatial-wise semantic supervision, $\mathcal{L}_{\text {D.G}}$ is able to mitigate the issue of "cheating".
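The "cheating" behavior can be reproduced on a toy ray. The sketch below is only an illustration under assumptions: the squared-error form of the 2D loss, the name `distill_loss_2d`, and the toy values are all hypothetical, and the snippet does not implement $\mathcal{L}_{\text {D.G}}$, whose exact form is not given on this page. It shows that a loss on the rendered feature alone cannot tell a spatially discriminative field from a collapsed one.

```python
import numpy as np

def distill_loss_2d(weights, feats, teacher):
    """Squared error between the rendered ray feature and a 2D teacher
    feature. ASSUMPTION: illustrative form, not the paper's exact loss."""
    rendered = weights @ feats
    return float(np.sum((rendered - teacher) ** 2))

teacher = np.array([1.0, 0.0])       # target feature for this pixel
weights = np.array([0.0, 1.0, 0.0])  # surface sits at the second sample

# Discriminative field: only the surface sample carries the target feature.
discriminative = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
# Collapsed ("cheating") field: every sample predicts the target feature.
collapsed = np.tile(teacher, (3, 1))

loss_disc = distill_loss_2d(weights, discriminative, teacher)
loss_coll = distill_loss_2d(weights, collapsed, teacher)

# The collapsed field satisfies the 2D loss for any weights that sum to 1.
other_weights = np.array([0.2, 0.5, 0.3])
loss_coll_other = distill_loss_2d(other_weights, collapsed, teacher)

# Per-sample feature variance along the ray exposes the difference.
spread_disc = float(discriminative.var(axis=0).sum())
spread_coll = float(collapsed.var(axis=0).sum())
```

Both fields drive the 2D loss to zero, yet only the discriminative one varies along the ray. A per-sample (spatial-wise) signal is what separates the two cases.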


Benchmarking

Quantitative comparison with other SOTA methods for semantic segmentation in the generalized and fine-tuning settings.
Semantic quality comparison on Replica. On the left, we show the rendering results of S-Ray [17] and GP-NeRF (ours) in the generalized and fine-tuning settings. On the right, we visualize the PCA results of our rendered semantic features in novel views.

BibTeX

@article{li2024gp,
  title={GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding},
  author={Li, Hao and Zhang, Dingwen and Dai, Yalun and Liu, Nian and Cheng, Lechao and Li, Jingfeng and Wang, Jingdong and Han, Junwei},
  journal={CVPR},
  year={2024}
}