Targeted Attacks on Vision-Language Models with Fewer Tokens

Published: Dec 26, 2025 01:01
ArXiv

Analysis

This paper shows that adversarial attacks on Vision-Language Models (VLMs) do not need to perturb every output position. By concentrating the attack on a small subset of high-entropy tokens, the critical decision points in generation, an attacker can degrade the model's output semantically and induce harmful responses. The targeted approach uses a substantially smaller perturbation budget than global methods while achieving comparable semantic degradation, and it converts 35-49% of benign outputs into harmful ones across multiple representative VLMs. The attacks also transfer across different VLM architectures, which suggests a fundamental weakness in current VLM safety mechanisms.
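To make the idea of "high-entropy tokens" concrete, here is a minimal sketch of one way such positions could be picked from a model's per-step logits. It is an illustration, not the paper's method: the function names (`token_entropies`, `select_high_entropy_positions`), the top-k selection rule, and the toy logits are all assumptions, since this summary does not state how the authors compute or threshold entropy.

```python
import math

def token_entropies(logits_per_step):
    """Shannon entropy (in nats) of the softmax distribution at each decoding step."""
    entropies = []
    for logits in logits_per_step:
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return entropies

def select_high_entropy_positions(logits_per_step, k):
    """Return the indices of the k highest-entropy steps, in sequence order.

    Assumption: a plain top-k rule; the paper may use a different criterion.
    """
    entropies = token_entropies(logits_per_step)
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])

# Toy example: 5 decoding steps over a 3-token vocabulary (made-up values).
toy_logits = [
    [10.0, 0.0, 0.0],  # confident prediction, low entropy
    [1.0, 1.0, 1.0],   # uniform distribution, maximal entropy
    [5.0, 0.0, 0.0],   # fairly confident
    [2.0, 2.0, 0.0],   # two near-equal candidates, high entropy
    [8.0, 0.0, 0.0],   # confident
]
positions = select_high_entropy_positions(toy_logits, k=2)
print(positions)
```

In this sketch, an attack would restrict its objective to the returned positions instead of the whole sequence, which is where the smaller perturbation budget would come from.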
Reference

By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk.