Volume 5, Issue 1, 2026

Abstract


Large-scale Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-training (CLIP) have demonstrated impressive zero-shot capabilities. However, adapting them to downstream tasks remains challenging, especially under domain shifts where visual features become unreliable. Existing training-free methods, such as Tip-Adapter, rely heavily on visual similarity, which often fails in out-of-distribution (OOD) scenarios. To address this, the Decoupled Correction Adapter (DeCo-Adapter) is proposed: a robust adaptation framework that integrates a Decoupled Knowledge Stream into the visual baseline. Specifically, a novel Negative Semantic Suppression mechanism is introduced, which leverages Large Language Models (LLMs) to generate distractor descriptions and penalize affinity to them, correcting visual ambiguities without requiring any training. Extensive experiments on ImageNet-Sketch, ImageNet-V2, and ImageNet-A demonstrate that DeCo-Adapter consistently outperforms state-of-the-art methods. Notably, it achieves a top-1 accuracy of 54.11% on ImageNet-Sketch, surpassing the strong Tip-Adapter baseline by leveraging negative knowledge for error correction.
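The abstract describes Negative Semantic Suppression only at a high level. The sketch below illustrates one plausible reading of that idea: per-class scores combine an image's affinity to positive class descriptions with a penalty on its affinity to LLM-generated distractor descriptions. This is a minimal illustration, not the authors' released code; the function names, the `lam` hyperparameter, and the exact fusion rule are assumptions, and the paper's formulation (including how the Decoupled Knowledge Stream merges with the Tip-Adapter-style visual cache) may differ.

```python
# Hedged sketch of a negative-suppression scoring rule, assuming
# frozen CLIP-style embeddings. Random vectors stand in for real
# image/text features so the example runs self-contained.
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def suppressed_scores(img_feat, pos_text, neg_text, lam=0.5):
    """Score classes by positive-text affinity minus a penalty on
    affinity to LLM-generated distractor ("negative") descriptions.

    img_feat : (d,)   image embedding from a frozen VLM encoder
    pos_text : (C, d) one text embedding per class description
    neg_text : (C, d) one embedding per distractor description
    lam      : penalty strength (assumed hyperparameter)
    """
    img_feat = l2_normalize(img_feat)
    pos = l2_normalize(pos_text) @ img_feat   # (C,) positive affinity
    neg = l2_normalize(neg_text) @ img_feat   # (C,) distractor affinity
    return pos - lam * neg                    # suppressed logits

# Toy usage with placeholder features (d = 512, C = 10 classes).
rng = np.random.default_rng(0)
d, C = 512, 10
logits = suppressed_scores(rng.normal(size=d),
                           rng.normal(size=(C, d)),
                           rng.normal(size=(C, d)))
print(int(np.argmax(logits)))  # predicted class index
```

Because the scoring is a fixed linear combination of frozen-encoder similarities, no gradient updates are needed, which is consistent with the training-free claim in the abstract.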
