Depth Estimation

Also known as: Monocular Depth Estimation, Depth Prediction

The computer vision task of predicting the distance from the camera to each point in a scene, producing a depth map in which each pixel carries a distance value. Monocular depth estimation uses a single RGB image (no stereo cameras or LiDAR) and typically relies on deep learning models such as MiDaS or Depth Anything. Depth can be relative (which points are nearer or farther, without real-world units, and usually normalized per frame) or metric (absolute distances in meters, which requires a model trained or calibrated for real-world scale). Depth estimation is foundational to assistive technologies that support spatial awareness for blind and low vision users, including obstacle avoidance, scene layout understanding, and sensory substitution systems that sonify proximity.
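
As a rough illustration of per-frame normalization and proximity sonification, the sketch below rescales one frame of raw depth values to the range 0 to 1 and maps each value to a tone frequency. It is a minimal sketch under stated assumptions, not the method of any particular system: the function names, the 220 to 880 Hz range, and the linear mapping are all hypothetical, and the input is assumed to hold larger values for farther points. Some models, MiDaS among them, output inverse depth, in which case the values would need to be inverted first.

```python
from typing import List

DepthMap = List[List[float]]  # rows of per-pixel depth values for one frame


def normalize_per_frame(depth: DepthMap) -> DepthMap:
    """Min-max scale one frame to [0, 1], where 0 is nearest and 1 is farthest.

    Assumes larger input values mean farther away (hypothetical convention).
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A perfectly flat frame has no near-vs-far ordering to preserve.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]


def proximity_to_frequency(
    rel_depth: float, f_near: float = 880.0, f_far: float = 220.0
) -> float:
    """Map relative depth in [0, 1] to a tone frequency in Hz.

    Nearer points get a higher pitch. The frequency range and the linear
    mapping are illustrative choices only.
    """
    rel = min(max(rel_depth, 0.0), 1.0)  # clamp out-of-range inputs
    return f_near + (f_far - f_near) * rel


if __name__ == "__main__":
    raw = [[1.0, 2.0], [3.0, 5.0]]  # made-up values, not model output
    rel = normalize_per_frame(raw)
    tones = [[proximity_to_frequency(v) for v in row] for row in rel]
    print(rel)
    print(tones)
```

Because the scaling is recomputed for every frame, the same pitch can correspond to different real-world distances in different frames. Conveying absolute distance would need metric depth rather than this relative form.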

Category: Computer Vision · AI and accessibility · Assistive Technology · Blind and Low Vision

Related: Sensory substitution · Assistive technology · Object recognition

Sources

https://pytorch.org/hub/intelisl_midas_v2/
https://paperswithcode.com/task/monocular-depth-estimation