Targeted Attacks on Vision-Language Models with Fewer Tokens
Published: Dec 26, 2025 01:01 · 1 min read · ArXiv
Analysis
This paper highlights a critical vulnerability in Vision-Language Models (VLMs). It demonstrates that by concentrating adversarial perturbations on a small subset of high-entropy tokens (the model's critical decision points), attackers can significantly degrade output quality and induce harmful responses. This targeted approach is more efficient than prior global methods, achieving comparable or superior semantic degradation and harmful-output rates under substantially smaller perturbation budgets. The findings also reveal a concerning degree of transferability of these attacks across different VLM architectures, suggesting a fundamental weakness in current VLM safety mechanisms.
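To make the entropy-guided selection idea concrete, the sketch below ranks decoding positions by predictive entropy and keeps the top-k most uncertain ones. The top-k rule and tensor layout are illustrative assumptions, not the paper's exact EGA criterion.

```python
# Minimal sketch: rank decoding positions by predictive entropy.
# Assumes per-position decoder logits of shape [seq_len, vocab_size];
# the top-k selection is an illustrative stand-in for the paper's criterion.
import torch
import torch.nn.functional as F

def high_entropy_positions(logits: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Return indices of the k positions where the model is least certain."""
    probs = F.softmax(logits, dim=-1)                       # [seq_len, vocab]
    entropy = -(probs * torch.log(probs + 1e-12)).sum(-1)   # [seq_len]
    return torch.topk(entropy, k=min(k, entropy.numel())).indices
```

These high-entropy positions are the "critical decision points" the attack then targets, rather than spreading the perturbation budget over the whole sequence.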
Key Takeaways
- VLMs are vulnerable to targeted adversarial attacks that focus on high-entropy tokens.
- These attacks are more efficient than global methods, requiring substantially smaller perturbation budgets.
- Across multiple representative VLMs, the attacks convert 35-49% of benign outputs into harmful ones.
- The attacks exhibit strong transferability across different VLM architectures.
- The paper proposes a new attack method (EGA) and exposes weaknesses in VLM safety mechanisms (see the sketch after this list).
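As a rough illustration of how a selective objective could be wired up, the sketch below runs an L_inf PGD-style loop whose loss is computed only at the chosen high-entropy positions. The model interface (HuggingFace-style), step sizes, and variable names are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a selective PGD-style attack: the loss is summed only over the
# selected high-entropy positions instead of the full sequence.
# `model`, `image`, `input_ids`, and `benign_ids` are placeholders for a
# concrete VLM pipeline; the usual one-token shift between logits and labels
# is omitted for brevity.
import torch
import torch.nn.functional as F

def selective_attack(model, image, input_ids, benign_ids, positions,
                     eps=8 / 255, alpha=1 / 255, steps=40):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(pixel_values=image + delta, input_ids=input_ids).logits
        # Degrade predictions only at the selected decision points.
        loss = F.cross_entropy(logits[0, positions], benign_ids[positions])
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()               # gradient ascent
            delta.clamp_(-eps, eps)                          # L_inf budget
            delta.copy_((image + delta).clamp(0, 1) - image) # valid pixel range
            delta.grad = None
    return (image + delta).detach()
```

Concentrating the loss on a handful of positions is what lets the budget stay small while still steering the output, which is the core efficiency claim summarized above.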
Reference
“By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk.”