Large Vision Model
Also known as: LVM
A large vision model is a foundation model trained on very large image (and often video) datasets to produce general-purpose visual representations. These representations can be applied to tasks such as object detection, segmentation, or feature extraction with little or no task-specific retraining, and, when paired with a language model, to captioning. Examples include SAM (Segment Anything), CLIP, DINOv2, and the vision encoders inside multimodal models such as GPT-4V and Moondream. In accessibility, LVMs are commonly paired with LLMs to convert camera input into structured descriptions for blind and low-vision users, or into scene understanding that supports navigation and physical assistance.
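As a rough illustration of the "camera input to structured description" step, the Python sketch below turns vision-model detections into a small structured record plus a sentence that could be spoken aloud or handed to an LLM for richer phrasing. Everything here is a hypothetical assumption: the `Detection` format (label, confidence, normalized bounding box), the `describe_scene` and `horizontal_position` helpers, the 0.5 confidence threshold, and the left/ahead/right split into thirds of the frame. It is not the output format of SAM, CLIP, DINOv2, or any specific model.

```python
from dataclasses import dataclass
import json


@dataclass
class Detection:
    """Hypothetical detector output: a label, a confidence score, and a
    bounding box in normalized image coordinates (0.0 to 1.0)."""
    label: str
    confidence: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        # Box area as a fraction of the frame.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

    @property
    def center_x(self) -> float:
        return (self.x_min + self.x_max) / 2


# Spoken phrasing for each horizontal region.
POSITION_PHRASES = {"left": "on the left", "ahead": "ahead", "right": "on the right"}


def horizontal_position(center_x: float) -> str:
    """Split the frame into thirds: left, ahead, right (illustrative choice)."""
    if center_x < 1 / 3:
        return "left"
    if center_x > 2 / 3:
        return "right"
    return "ahead"


def describe_scene(detections, min_confidence=0.5):
    """Turn raw detections into a structured record plus a short sentence.

    Larger boxes are listed first as a rough proxy for nearness; this is a
    heuristic assumption, not a depth estimate.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.area, reverse=True)
    objects = [
        {
            "label": d.label,
            "position": horizontal_position(d.center_x),
            "confidence": round(d.confidence, 2),
        }
        for d in kept
    ]
    if objects:
        parts = [f"{o['label']} {POSITION_PHRASES[o['position']]}" for o in objects]
        summary = "I can see: " + ", ".join(parts) + "."
    else:
        summary = "No objects detected with enough confidence."
    return {"objects": objects, "summary": summary}


if __name__ == "__main__":
    # Made-up detections for a single camera frame.
    frame = [
        Detection("door", 0.91, 0.70, 0.10, 0.95, 0.90),
        Detection("chair", 0.84, 0.05, 0.50, 0.30, 0.95),
        Detection("cup", 0.30, 0.45, 0.60, 0.50, 0.70),  # below threshold
    ]
    print(json.dumps(describe_scene(frame), indent=2))
```

With the sample frame, the low-confidence cup is dropped and the summary reads "I can see: door on the right, chair on the left." Keeping both a structured `objects` list and a plain `summary` lets a downstream LLM or navigation component work from the fields while a screen reader or speech engine uses the sentence directly.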
Category: Artificial Intelligence · Computer Vision · AI and accessibility · Machine Learning
Related: Foundation Model · Vision-Language Model · Large Language Model