GPOcc: Generalizing Visual Geometry Priors to Sparse Gaussian Occupancy Prediction

CVPR 2026

Changqing Zhou, Yueru Luo, Changhao Chen

Overview

GPOcc investigates how strong visual geometry priors can be used to improve occupancy prediction for embodied 3D scene understanding. Rather than stopping at visible surfaces, GPOcc aims to infer volumetric structure and free space from monocular observations, with better accuracy, efficiency, and generalization.

Recent geometry foundation models provide strong 3D cues, but these cues mainly describe surfaces. Occupancy prediction, in contrast, requires reasoning about the interior volume of a scene, not just what is directly visible. This gap motivates the method.

GPOcc teaser

Method

GPOcc leverages generalizable visual geometry priors and extends surface points inward along camera rays to generate volumetric samples. These samples are represented as Gaussian primitives, which support probabilistic occupancy inference in a sparse yet expressive 3D formulation.
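The two ideas in this paragraph can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `extend_along_rays` and `occupancy_prob`, the fixed step size, the isotropic Gaussians, and the "one minus product of complements" aggregation rule are all assumptions made for illustration; the description above does not specify them.

```python
import numpy as np

def extend_along_rays(cam_origin, surface_pts, num_steps=4, step=0.1):
    """Hypothetical helper: push each surface point further along its camera ray.

    cam_origin:  (3,) camera center in world coordinates.
    surface_pts: (N, 3) surface points, e.g. from a geometry prior (assumed input).
    Returns (N * num_steps, 3) samples starting at the surface and moving away
    from the camera, i.e. into the volume behind the visible surface.
    """
    rays = surface_pts - cam_origin                      # (N, 3)
    dirs = rays / np.linalg.norm(rays, axis=1, keepdims=True)
    offsets = step * np.arange(num_steps)                # (K,): 0, step, 2*step, ...
    samples = surface_pts[:, None, :] + offsets[None, :, None] * dirs[:, None, :]
    return samples.reshape(-1, 3)

def occupancy_prob(query, means, sigmas, opacities):
    """Assumed aggregation rule: a query point is occupied unless every
    Gaussian independently 'misses' it.

    query: (Q, 3), means: (G, 3), sigmas: (G,), opacities: (G,) in [0, 1].
    Returns (Q,) occupancy probabilities.
    """
    d2 = ((query[:, None, :] - means[None, :, :]) ** 2).sum(-1)   # (Q, G)
    p = opacities[None, :] * np.exp(-0.5 * d2 / sigmas[None, :] ** 2)
    return 1.0 - np.prod(1.0 - p, axis=1)

# Toy usage: one surface point 2 m in front of a camera at the origin.
samples = extend_along_rays(np.zeros(3), np.array([[0.0, 0.0, 2.0]]),
                            num_steps=3, step=0.5)
occ = occupancy_prob(np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 0.0]]),
                     samples, np.full(3, 0.25), np.full(3, 0.9))
```

In this toy setup the query at the surface point receives a high occupancy probability, while the query at the camera center, which lies in free space, receives a value near zero. A learned model would predict per-Gaussian scales, opacities, and semantics rather than use the constants chosen here.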

To support streaming monocular input, GPOcc further introduces a training-free incremental update strategy that fuses frame-wise Gaussians into a unified global representation. This design makes the method suitable for both single-frame and streaming settings while keeping inference efficient.
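As a rough illustration of what a training-free incremental update could look like, the sketch below merges frame-wise Gaussian centers into a global map keyed by voxel index. The fusion rule (opacity-weighted averaging per voxel), the voxel size, and the names `fuse_frame` and `to_gaussians` are assumptions for illustration only; the text above does not describe the actual strategy at this level of detail.

```python
import numpy as np

def fuse_frame(global_map, means_cam, opacities, cam_to_world, voxel=0.2):
    """Hypothetical fusion step with no learned parameters.

    global_map:   dict mapping voxel index -> (opacity-weighted sum of means,
                  sum of opacities). Updated in place and returned.
    means_cam:    (N, 3) Gaussian centers in the current camera frame.
    opacities:    (N,) per-Gaussian opacities used as fusion weights.
    cam_to_world: (4, 4) camera pose for the current frame (assumed known).
    """
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    means_world = means_cam @ R.T + t                    # move into the global frame
    keys = np.floor(means_world / voxel).astype(int).tolist()
    for key, mu, w in zip(map(tuple, keys), means_world, opacities):
        acc_mu, acc_w = global_map.get(key, (np.zeros(3), 0.0))
        global_map[key] = (acc_mu + w * mu, acc_w + w)   # running weighted sum
    return global_map

def to_gaussians(global_map):
    """Read out one fused Gaussian center and a clipped weight per occupied voxel."""
    means = np.array([s / w for s, w in global_map.values()])
    weights = np.array([w for _, w in global_map.values()])
    return means, np.clip(weights, 0.0, 1.0)

# Toy usage: two frames observe the same world location from different poses.
pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[:3, 3] = [1.0, 0.0, 0.0]                           # camera moved 1 m along x

gmap = {}
fuse_frame(gmap, np.array([[0.05, 0.05, 0.05]]), np.array([0.5]), pose0)
fuse_frame(gmap, np.array([[-0.95, 0.05, 0.05], [0.05, 0.0, 0.0]]),
           np.array([0.5, 0.8]), pose1)
fused_means, fused_weights = to_gaussians(gmap)
```

Here the first Gaussian of the second frame lands in the same voxel as the Gaussian from the first frame and is merged with it, while the second one opens a new voxel, so the global map ends up with two entries. The per-frame cost depends only on the number of new Gaussians, which is one way an update like this can stay efficient in a streaming setting.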

Performance

GPOcc teaser

Additional Visualizations

GPOcc Vis1 GPOcc Vis2

Videos

Demo 1

Demo 2

Demo 3

Demo 4

Demo 5

Demo 6

Demo 7

Demo 8