
CLIP

Also known as: Contrastive Language-Image Pre-Training

A vision-language model developed by OpenAI that learns to associate images with natural language descriptions through contrastive learning on large-scale image-text pairs. CLIP encodes images and text into a shared embedding space, so the similarity between an image and a piece of text can be scored directly, which enables zero-shot classification and image-text retrieval. In accessibility research, CLIP is used for object recognition, scene understanding, and aligning visual content with textual descriptions to support assistive technologies for blind and low vision users.
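
To make the similarity scoring concrete, here is a minimal sketch of zero-shot classification over embeddings. It assumes the image and the candidate text labels have already been encoded into vectors. The three-dimensional vectors, the label strings, the function names, and the `logit_scale` value of 100.0 are all illustrative assumptions, not outputs or parameters taken from a real CLIP model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, text_embeddings, logit_scale=100.0):
    """Return {label: probability} by applying a softmax to scaled similarities.

    logit_scale is a hypothetical fixed value chosen for illustration.
    """
    labels = list(text_embeddings)
    logits = [
        logit_scale * cosine_similarity(image_embedding, text_embeddings[label])
        for label in labels
    ]
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings (hypothetical): real embeddings have hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a crosswalk": [0.2, 0.1, 0.9],
}

probs = zero_shot_classify(image_embedding, text_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

No label-specific training is involved: changing the candidate labels only means encoding different text, which is what makes the classification "zero-shot".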

Category: artificial intelligence · computer vision

Related: Vision-Language Model · SigLIP · Object Detection

Sources