LLMs Struggle with Multiple Code Vulnerabilities
Analysis
This paper addresses a critical gap in LLM security research by moving beyond single-vulnerability detection. It highlights the limitations of current LLMs in handling the complexity of real-world code, where multiple vulnerabilities often co-occur. The introduction of a multi-vulnerability benchmark and the evaluation of state-of-the-art LLMs provide valuable insights into their performance and failure modes, particularly the impact of vulnerability density and language-specific challenges.
Key Takeaways
- LLMs' vulnerability detection performance degrades significantly as vulnerability density increases.
- Different programming languages exhibit distinct failure modes for LLMs.
- Current LLMs struggle to accurately identify multiple vulnerabilities in complex code.
- The paper introduces a new benchmark for multi-vulnerability detection.
> "Performance drops by up to 40% in high-density settings, and Python and JavaScript show distinct failure modes, with models exhibiting severe 'under-counting'."
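To make the multi-vulnerability setting concrete, here is a hypothetical illustration (not a sample from the paper's benchmark): a single short function containing two co-occurring weaknesses, SQL injection (CWE-89) and a shell-injection-prone command string (CWE-78). A detector that reports only one of the two "under-counts" in exactly the sense the paper describes.

```python
import sqlite3

def export_user(username: str):
    """Look up a user and build a shell command to export their logs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")

    # Vulnerability 1 (CWE-89): user input is interpolated directly
    # into the SQL string instead of using a parameterized query.
    query = f"SELECT name FROM users WHERE name = '{username}'"
    rows = conn.execute(query).fetchall()

    # Vulnerability 2 (CWE-78): the same input is spliced into a shell
    # command string; running it with shell=True would allow injection.
    cmd = f"cat logs/{username}.txt"
    return rows, cmd

# Benign input behaves as expected...
rows, cmd = export_user("bob")        # rows == []

# ...but a crafted input bypasses the WHERE clause entirely.
rows, cmd = export_user("bob' OR '1'='1")  # rows == [('alice',)]
```

A model asked "how many vulnerabilities does this function contain?" must flag both sinks; reporting only the SQL injection is the under-counting failure mode quoted above.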