Bird’s-eye view (BEV) perception has gained significant attention because it provides a unified representation for fusing multi-view images and enables a wide range of downstream autonomous driving tasks, such as forecasting and planning. Recent state-of-the-art models adopt projection-based methods, which formulate BEV perception as query learning to bypass explicit depth estimation. Despite promising advances in this paradigm, these methods still fall short of real-world applications because they lack uncertainty modeling and have expensive computational requirements. In this work, we introduce GaussianLSS, a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth uncertainty modeling. GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution, which implicitly captures object extents. We then transform the depth distribution into 3D Gaussians and rasterize them to construct uncertainty-aware BEV features. We evaluate GaussianLSS on the nuScenes dataset, where it achieves state-of-the-art performance among unprojection-based methods. Compared to projection-based methods, it runs 2.5x faster and uses 0.3x the memory while remaining competitive, with only a 0.4% IoU difference.
(a) The Lift-Splat-Shoot paradigm faces two key challenges in depth estimation. Sparse BEV projection arises from discretized depth bins, leading to incomplete spatial coverage and reduced perception accuracy. Additionally, unstable depth distributions occur due to softmax-based probability assignment, where small depth variations cause inconsistent BEV features and disrupt spatial coherence. (b) GaussianLSS introduces uncertainty-aware depth modeling, using a continuous depth representation to reduce sparsity and improve feature consistency. Instead of relying on discrete bins, we compute the depth mean \( \mu \) and uncertainty \( \sigma \) from the predicted depth distribution, replacing softmax weighting with an uncertainty-aware range \([\mu - k\sigma, \mu + k\sigma]\). The parameter \(k\) serves as an error-tolerance coefficient that controls the spread around the mean depth, yielding smoother BEV projections and better spatial coherence.
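The moments above can be sketched in a few lines: given a softmax-normalized distribution over discrete depth bins, the soft mean \( \mu \) and variance \( \sigma^2 \) are the first and second central moments, and the uncertainty-aware range follows directly. This is a minimal NumPy illustration for a single pixel; the bin layout and the synthetic distribution are assumptions, not the paper's exact configuration.

```python
import numpy as np

# Assumed depth-bin setup: 61 discrete depth candidates from 1 m to 61 m.
depth_bins = np.linspace(1.0, 61.0, 61)           # (D,)

# A synthetic softmax-normalized depth distribution peaked near 20 m,
# standing in for the network's per-pixel prediction P_i.
logits = -0.5 * ((depth_bins - 20.0) / 4.0) ** 2
p = np.exp(logits - logits.max())
p /= p.sum()                                      # sums to 1

# Soft depth mean and variance: moments of the distribution, not an argmax
# over bins, so small changes in logits move mu/sigma smoothly.
mu = np.sum(p * depth_bins)
sigma = np.sqrt(np.sum(p * (depth_bins - mu) ** 2))

# Uncertainty-aware depth range with error-tolerance coefficient k.
k = 2.0
d_near, d_far = mu - k * sigma, mu + k * sigma
```

Because \( \mu \) and \( \sigma \) vary continuously with the predicted logits, nearby pixels with similar distributions map to overlapping depth ranges instead of jumping between discrete bins.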
Multi-view images are first processed by a backbone network to extract features, which are then passed through a simple CNN layer to obtain splat features \( F_i \), opacity \( \alpha_i \), and depth distribution \( P_i \). The predicted depth distribution undergoes an uncertainty transformation to produce a 3D uncertainty \( x_i \). Next, BEV features are obtained through a splatting process that integrates feature distributions across views. The resulting BEV features \( \mathbf{F}_{\text{BEV}} \), enriched with uncertainty awareness, are used as input to the task-specific head for prediction.
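To make the splatting step concrete, here is a deliberately simplified NumPy sketch: each pixel contributes its feature, weighted by its opacity and a Gaussian falloff along the ray, to the BEV cells covered by its uncertainty-aware depth interval. This is not the paper's differentiable 3D Gaussian rasterizer; the grid size, per-pixel ray azimuths, sampling scheme, and falloff are illustrative assumptions.

```python
import numpy as np

H, W, C, res = 50, 50, 8, 1.0                    # BEV grid size, channels, metres/cell
bev = np.zeros((H, W, C))                        # accumulated BEV features (channels last)

rng = np.random.default_rng(0)
n = 100                                          # number of pixels in this toy example
feats = rng.normal(size=(n, C))                  # splat features F_i
alpha = rng.uniform(0.2, 1.0, size=n)            # opacities alpha_i
mu = rng.uniform(5.0, 40.0, size=n)              # soft depth means (metres)
sigma = rng.uniform(0.5, 3.0, size=n)            # depth uncertainties (metres)
theta = rng.uniform(-np.pi / 4, np.pi / 4, n)    # assumed per-pixel ray azimuths

k, n_samples = 2.0, 8
for t in np.linspace(-k, k, n_samples):          # sweep the [mu - k*sigma, mu + k*sigma] span
    d = mu + t * sigma                           # sampled depth along each ray
    w = alpha * np.exp(-0.5 * t**2)              # Gaussian falloff, scaled by opacity
    gx = np.clip((d * np.cos(theta) / res).astype(int), 0, H - 1)
    gy = np.clip((d * np.sin(theta) / res + W / 2).astype(int), 0, W - 1)
    # Scatter-add so pixels landing in the same cell accumulate correctly.
    np.add.at(bev, (gx, gy), w[:, None] * feats)
```

The key design point this sketch mirrors is that a pixel with large \( \sigma \) spreads its feature over many BEV cells (covering possible object extents), while a confident pixel concentrates its contribution near \( \mu \).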
GaussianLSS achieves state-of-the-art performance among 2D unprojection baselines. It also performs competitively against 3D projection-based methods while offering significant advantages in memory efficiency and inference speed.
@inproceedings{lu2025GaussianLSS,
author = {Shu-Wei Lu and Yi-Hsuan Tsai and Yi-Ting Chen},
title = {Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}