LegoOcc: Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes
CVPR 2026 Oral
Changqing Zhou, Yueru Luo, Han Zhang, Zeyu Jiang, Changhao Chen
Overview
LegoOcc studies open-vocabulary 3D occupancy prediction for indoor scenes from monocular input. The goal is to recover both geometry and fine-grained semantics in cluttered environments, where the set of categories is open-ended and much richer than the label sets of fixed-label occupancy benchmarks.
Instead of relying on dense semantic supervision, LegoOcc follows a geometry-only supervision setting with binary occupancy labels. The method builds on 3D language-embedded Gaussians so that geometry and open-vocabulary semantics can be represented in a unified 3D structure.

Method
LegoOcc focuses on two core issues in indoor open-vocabulary occupancy prediction:
- Existing Gaussian-to-occupancy operators are unstable under weak geometry-only supervision.
- Directly aligning rendered Gaussian features with open-vocabulary segmentation features causes feature mixing and weak semantic grounding.
To address these issues, LegoOcc introduces an opacity-aware Poisson-based aggregation strategy for more stable volumetric occupancy estimation. It also uses a Progressive Temperature Decay schedule during splatting, gradually sharpening opacity and improving the alignment between Gaussian primitives and language-aware visual features.
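The two ideas above can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact formulas, so the code assumes one common Poisson-style form, where each Gaussian contributes a rate equal to its opacity times its spatial falloff and a point is occupied with probability 1 - exp(-total rate). It also assumes an exponential temperature schedule applied to a sigmoid opacity. All function names, parameters, and defaults (`poisson_occupancy`, `temperature`, `t_start`, `t_end`, isotropic Gaussians) are hypothetical.

```python
import math


def gaussian_weight(point, mean, scale):
    """Unnormalised isotropic Gaussian falloff at `point` (assumed form)."""
    d2 = sum((p - m) ** 2 for p, m in zip(point, mean))
    return math.exp(-0.5 * d2 / (scale ** 2))


def poisson_occupancy(point, gaussians):
    """Opacity-aware Poisson-style aggregation (illustrative assumption).

    Each Gaussian adds rate = opacity * falloff; the occupancy probability
    is the chance of at least one event under that total rate.
    """
    rate = sum(
        g["opacity"] * gaussian_weight(point, g["mean"], g["scale"])
        for g in gaussians
    )
    return 1.0 - math.exp(-rate)


def temperature(step, total_steps, t_start=1.0, t_end=0.1):
    """Exponential decay from t_start to t_end (hypothetical schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def sharpened_opacity(logit, temp):
    """Sigmoid opacity; a lower temperature pushes values toward 0 or 1."""
    return 1.0 / (1.0 + math.exp(-logit / temp))


if __name__ == "__main__":
    gaussians = [
        {"mean": (0.0, 0.0, 0.0), "scale": 0.5, "opacity": 0.9},
        {"mean": (1.0, 0.0, 0.0), "scale": 0.5, "opacity": 0.4},
    ]
    print(poisson_occupancy((0.0, 0.0, 0.0), gaussians))
    for step in (0, 50, 100):
        t = temperature(step, 100)
        print(step, t, sharpened_opacity(1.0, t))
```

Under these assumptions the aggregated occupancy stays in [0, 1) however many Gaussians overlap a voxel, which is one plausible reason such an operator behaves more stably than a plain sum when only binary occupancy labels are available. The decaying temperature makes each opacity progressively closer to a hard 0/1 decision as training proceeds.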
Performance

Additional Visualizations
