Uncertainty-Aware Vision-based Risk Object Identification via Conformal Risk Tube Prediction

Department of Computer Science
National Yang Ming Chiao Tung University

ICRA 2026

Abstract

We study object importance-based visual risk object identification (Visual-ROI), a key capability for detecting hazards in intelligent driving systems. Existing approaches are deterministic and ignore uncertainty, which can compromise safety. For example, a fixed decision threshold applied to ambiguous scenarios can cause risks to be detected too early or too late, and can produce predictions that flicker between risky and non-risky states over time. These issues worsen in diverse contexts with multiple interacting risks, which perturb where and when risks occur. Current vision methods, however, lack a way to capture uncertainty jointly over space and time, which limits their ability to reflect changes in scene complexity dynamically.

We propose Risk Tube Prediction, a unified formulation that models spatiotemporal uncertainty in risk. We further introduce a new conformal prediction framework that provides coverage guarantees for the true risks and yields calibrated risk scores and uncertainty estimates. Specifically, we employ risk-category-aware calibrators that account for the distinct characteristics of each risk category, which reduces confused calibration and localizes risks more precisely in space and time. For evaluation, we present a new dataset and metrics that probe diverse scenario configurations with multi-risk coupling effects. We systematically study the factors that influence uncertainty estimation, including variations in scenario configuration, per-risk-category behavior, and the propagation of perception errors. Our method delivers substantial improvements over prior approaches, enhancing both the robustness of Visual-ROI performance and downstream outcomes, such as reducing nuisance braking alerts.
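For orientation, the kind of coverage guarantee that split conformal prediction typically provides can be written as below. This is the generic textbook statement under an exchangeability assumption, not necessarily the exact guarantee established for Risk Tubes; the symbols $\alpha$, $s_i$, $\hat{q}$, and $\mathcal{C}$ are notation introduced here for illustration.

```latex
% Generic split conformal guarantee (assumed notation, not taken from the paper).
% s_1, ..., s_n are nonconformity scores computed on a held-out calibration set.
\hat{q} = \text{the } \lceil (n+1)(1-\alpha) \rceil \text{-th smallest value of } \{ s_1, \dots, s_n \},
\qquad
\mathcal{C}(x) = \{\, y : s(x, y) \le \hat{q} \,\}

% If the calibration points and the test point are exchangeable, then
\mathbb{P}\bigl( y_{\mathrm{test}} \in \mathcal{C}(x_{\mathrm{test}}) \bigr) \ge 1 - \alpha
```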

Overview of Conformal Risk Tube Prediction

Framework: Given front-view images, the model performs spatiotemporal relation modeling and predicts each object's future risk interval. Then, based on the object's risk category, the corresponding conformal calibrator calibrates its risk scores over the interval. The calibrated Risk Tube has a more precise temporal bound that fully covers the true risk interval of each hazardous object.
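The idea of selecting a calibrator by risk category can be illustrated with a minimal split conformal sketch. It is an assumption-laden illustration, not the authors' implementation: the nonconformity score `1 - risk_score`, the function names, and the category labels are all hypothetical.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score (split conformal)."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: fall back to an
        # infinite threshold, which accepts every frame (conservative).
        return float("inf")
    return sorted(scores)[k - 1]


def fit_category_calibrators(calib, alpha=0.1):
    """Fit one threshold per risk category.

    calib: iterable of (category, risk_score) pairs taken from frames that
    are truly risky. Nonconformity is assumed to be 1 - risk_score.
    """
    by_category = defaultdict(list)
    for category, score in calib:
        by_category[category].append(1.0 - score)
    return {c: conformal_quantile(s, alpha) for c, s in by_category.items()}


def calibrated_flags(scores, category, calibrators):
    """Flag a frame as risky when its nonconformity is within the category threshold."""
    q = calibrators[category]
    return [(1.0 - s) <= q for s in scores]
```

With separate thresholds, the same raw score sequence can be flagged differently depending on the object's category, which is the behavior a single shared threshold cannot express.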

Multiple Coexisting Risks Dataset

Visual-ROI Visualization: Effect of Calibration

With conformal calibration, we mitigate temporal boundary misalignment (i.e., detecting or releasing risks too early or too late) and reduce fragmented predictions that flicker between risky and non-risky states over time.
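The two failure modes named above can be made concrete with a small sketch that measures them on a per-frame risky/non-risky sequence. The helper names and the half-open `[start, end)` frame convention are assumptions for illustration and are not the metrics defined in the paper.

```python
def risky_segments(flags):
    """Return [start, end) frame index pairs for each contiguous run of True."""
    segments, start = [], None
    for t, flag in enumerate(flags):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


def boundary_offsets(pred_flags, true_interval):
    """Return (onset_offset, release_offset) in frames, or None if nothing is flagged.

    Negative values mean the prediction is too early, positive values too late.
    true_interval is a [start, end) pair of frame indices.
    """
    segments = risky_segments(pred_flags)
    if not segments:
        return None
    onset = segments[0][0] - true_interval[0]
    release = segments[-1][1] - true_interval[1]
    return onset, release
```

Under this sketch, a flickering prediction shows up as more than one segment from `risky_segments`, while boundary misalignment shows up as nonzero offsets from `boundary_offsets`.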

Downstream Task: Braking Alerts

Visual-ROI: 2D-Trajectory Prediction (TP), Behavior Prediction (BP), Collision Anticipation (CA).
Braking alert criterion: an alert is raised when an object is closer than 10 m and Visual-ROI flags it as risky.
Our method, which produces calibrated and temporally aligned risk intervals, effectively reduces nuisance braking alerts while ensuring timely warnings for genuine risks.
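The braking alert rule can be sketched in a few lines. The 10 m threshold comes from the stated criterion; the per-frame tuple layout and the definition of a nuisance alert (an alert raised on a frame where the object is not truly risky) are assumptions made for illustration.

```python
BRAKE_DISTANCE_M = 10.0  # distance threshold from the stated criterion


def braking_alert(distance_m, flagged_risky):
    """Raise an alert only when the object is close and flagged as risky."""
    return distance_m < BRAKE_DISTANCE_M and flagged_risky


def count_alerts(frames):
    """Count genuine and nuisance alerts.

    frames: iterable of (distance_m, flagged_risky, truly_risky) tuples.
    A nuisance alert is assumed to be an alert on a frame that is not truly risky.
    """
    genuine = nuisance = 0
    for distance_m, flagged_risky, truly_risky in frames:
        if braking_alert(distance_m, flagged_risky):
            if truly_risky:
                genuine += 1
            else:
                nuisance += 1
    return genuine, nuisance
```

Because the rule is a conjunction, tightening the temporal bounds of the risky flag directly removes alerts on frames where the object is near but no longer hazardous.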
