Visual Language Model

Also known as: VLM, Vision-Language Model

AI models that process and reason about both visual and textual information, combining computer vision with large language model capabilities. VLMs could enhance assessment descriptors by providing contextually rich, customizable descriptions of visual content. However, they typically require off-device processing, which raises privacy and security concerns for sensitive visual data. Prior work has also described large language models as designed to emulate confidence rather than to provide factual information, which limits their reliability for verification tasks.

Category: artificial intelligence · computer vision

Related: Object Recognition · Visual Assistance Technology · Off-Device Processing

Sources

https://doi.org/10.1145/3663547.3746376