LegoOcc: Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

CVPR 2026 Oral

Changqing Zhou, Yueru Luo, Han Zhang, Zeyu Jiang, Changhao Chen

Overview

LegoOcc studies open-vocabulary 3D occupancy prediction for indoor scenes from monocular input. The goal is to recover both geometry and fine-grained semantics in cluttered environments, where the set of categories is open-ended and much richer than the fixed label sets used by existing occupancy benchmarks.

Instead of relying on dense semantic supervision, LegoOcc follows a geometry-only supervision setting with binary occupancy labels. The method builds on 3D language-embedded Gaussians so that geometry and open-vocabulary semantics can be represented in a unified 3D structure.
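
To make the idea of a language-embedded Gaussian concrete, the following is a minimal sketch and not the LegoOcc implementation. The `LanguageGaussian` fields, the axis-aligned `scale`, and the `classify` helper are all assumptions made for illustration. The sketch only shows how one primitive can carry both geometry and a feature vector that is compared against text embeddings by cosine similarity.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LanguageGaussian:
    """Hypothetical container: one 3D Gaussian with a language-aligned feature."""
    mean: Sequence[float]     # 3D center
    scale: Sequence[float]    # per-axis standard deviation (axis-aligned for simplicity)
    opacity: float            # value in [0, 1]
    feature: Sequence[float]  # embedding in the same space as the text embeddings


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def classify(g: LanguageGaussian, text_embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the text embedding most similar to the Gaussian's feature."""
    sims = [cosine(g.feature, t) for t in text_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)
```

Because the label set enters only through `text_embeddings`, the vocabulary can be changed at query time without touching the stored Gaussians, which is the property that makes such a representation open-vocabulary.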

[Figure: LegoOcc framework]

Method

LegoOcc focuses on two core issues in indoor open-vocabulary occupancy prediction:

1. Existing Gaussian-to-occupancy operators are unstable under weak geometry-only supervision.
2. Directly aligning rendered Gaussian features with open-vocabulary segmentation features causes feature mixing and weak semantic grounding.

To address these issues, LegoOcc introduces an opacity-aware Poisson-based aggregation strategy for more stable volumetric occupancy estimation. It also uses a Progressive Temperature Decay schedule during splatting, gradually sharpening opacity and improving the alignment between Gaussian primitives and language-aware visual features.
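
The text above does not spell out either formula, so the following is only one plausible reading, sketched under stated assumptions. It assumes that each Gaussian contributes an opacity-weighted intensity at a query point and that occupancy is the probability of at least one event under a Poisson model with the summed intensity, i.e. `1 - exp(-sum)`. It also assumes that the temperature decays exponentially between two hypothetical values `t_start` and `t_end` and divides the opacity logit before a sigmoid. All function names and default values are illustrative.

```python
import math


def gaussian_intensity(point, mean, scale, opacity):
    """Opacity-weighted, unnormalized Gaussian response at `point` (axis-aligned)."""
    d2 = sum(((p - m) / s) ** 2 for p, m, s in zip(point, mean, scale))
    return opacity * math.exp(-0.5 * d2)


def poisson_occupancy(point, gaussians):
    """Assumed aggregation: P(occupied) = 1 - exp(-sum of intensities).

    `gaussians` is an iterable of (mean, scale, opacity) tuples.
    """
    total = sum(gaussian_intensity(point, m, s, a) for m, s, a in gaussians)
    return 1.0 - math.exp(-total)


def temperature(step, total_steps, t_start=1.0, t_end=0.1):
    """Assumed schedule: exponential decay from t_start to t_end over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def tempered_opacity(logit, temp):
    """Numerically stable sigmoid(logit / temp); a lower temp gives sharper opacity."""
    x = logit / temp
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

Under this reading, the aggregated occupancy stays in [0, 1) no matter how many Gaussians overlap, and adding a Gaussian can only raise it. Lowering the temperature pushes each opacity toward 0 or 1, which matches the description of opacity being gradually sharpened.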

Performance

[Figure: LegoOcc results]

Additional Visualizations

[Figure: LegoOcc visualization 1]
[Figure: LegoOcc visualization 2]