LLM Safety: Temporal and Linguistic Vulnerabilities
Analysis
Key Takeaways
- LLM safety does not transfer consistently across languages (English vs. Hausa).
- Temporal framing (past vs. future tense) significantly affects LLM safety performance.
- Current LLMs rely on superficial heuristics, which creates 'Safety Pockets'.
- Invariant Alignment is proposed as a necessary paradigm shift toward robust safety.
“The study found a 'Temporal Asymmetry,' where past-tense framing bypassed defenses (15.6% safe) while future-tense scenarios triggered hyper-conservative refusals (57.2% safe).”
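The percentages above are per-condition safety rates. The following minimal sketch assumes that a "% safe" figure is the share of responses judged safe within one framing condition, and that each response has already been labeled by some judge. The function names, record format, and toy data are hypothetical and do not come from the study. Applied to the reported figures, the same subtraction gives a gap of 57.2 - 15.6 = 41.6 percentage points.

```python
from collections import defaultdict


def safe_rate_by_framing(records):
    """Return the percentage of responses judged safe for each framing condition.

    Hypothetical format: each record is a (framing, judged_safe) pair, where
    judged_safe is a bool assumed to come from a human or automated judge.
    """
    totals = defaultdict(int)
    safe = defaultdict(int)
    for framing, is_safe in records:
        totals[framing] += 1
        if is_safe:
            safe[framing] += 1
    # Share of safe responses per condition, expressed as a percentage.
    return {framing: 100.0 * safe[framing] / totals[framing] for framing in totals}


def asymmetry(rates, a="future", b="past"):
    """Gap in percentage points between two framing conditions (a minus b)."""
    return rates[a] - rates[b]


# Toy labels for illustration only; these are not the study's data.
records = [
    ("past", False), ("past", False), ("past", True), ("past", False),
    ("future", True), ("future", True), ("future", False), ("future", True),
]

rates = safe_rate_by_framing(records)
print(rates)             # {'past': 25.0, 'future': 75.0}
print(asymmetry(rates))  # 50.0
```

A large positive gap signals the kind of temporal asymmetry described above: the same underlying request is judged safe far more often under one tense than the other.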