Search: adversarial attacks - ai.jp.net

safety #llm 👥 CommunityAnalyzed: Jan 11, 2026 19:00

AI Insiders Launch Data Poisoning Offensive: A Threat to LLMs

Published:Jan 11, 2026 17:05

•

1 min read

•

Hacker News

Analysis

The launch of a site dedicated to data poisoning represents a serious threat to the integrity and reliability of large language models (LLMs). This highlights the vulnerability of AI systems to adversarial attacks and the importance of robust data validation and security measures throughout the LLM lifecycle, from training to deployment.

Key Takeaways

•AI insiders are actively working to compromise LLMs through data poisoning.
•A small, targeted data set can significantly impact model performance.
•The attack targets the data used to train the models, not the model code itself.

Reference

“A small number of samples can poison LLMs of any size.”

Permalink Hacker News

safety #data poisoning 📝 BlogAnalyzed: Jan 11, 2026 18:35

Data Poisoning Attacks: A Practical Guide to Label Flipping on CIFAR-10

Published:Jan 11, 2026 15:47

•

1 min read

•

MarkTechPost

Analysis

This article highlights a critical vulnerability in deep learning models: data poisoning. Demonstrating this attack on CIFAR-10 provides a tangible understanding of how malicious actors can manipulate training data to degrade model performance or introduce biases. Understanding and mitigating such attacks is crucial for building robust and trustworthy AI systems.

Key Takeaways

•The article focuses on data poisoning attacks through label flipping.
•It uses the CIFAR-10 dataset and a ResNet-style network for demonstration.
•The tutorial aims to show how manipulating training data can affect model behavior.

Reference

“By selectively flipping a fraction of samples from...”

Permalink MarkTechPost

research #voice 🔬 ResearchAnalyzed: Jan 6, 2026 07:31

IO-RAE: A Novel Approach to Audio Privacy via Reversible Adversarial Examples

Published:Jan 6, 2026 05:00

•

1 min read

•

ArXiv Audio Speech

Analysis

This paper presents a promising technique for audio privacy, leveraging LLMs to generate adversarial examples that obfuscate speech while maintaining reversibility. The high misguidance rates reported, especially against commercial ASR systems, suggest significant potential, but further scrutiny is needed regarding the robustness of the method against adaptive attacks and the computational cost of generating and reversing the adversarial examples. The reliance on LLMs also introduces potential biases that need to be addressed.

Key Takeaways

•IO-RAE framework uses reversible adversarial examples for audio privacy.
•Cumulative Signal Attack mitigates high-frequency noise.
•Achieves high misguidance rates against ASR models, including Google's.

Reference

“This paper introduces an Information-Obfuscation Reversible Adversarial Example (IO-RAE) framework, the pioneering method designed to safeguard audio privacy using reversible adversarial examples.”

Permalink ArXiv Audio Speech

Research #llm 📝 BlogAnalyzed: Jan 3, 2026 05:48

Self-Testing Agentic AI System Implementation

Published:Jan 2, 2026 20:18

•

1 min read

•

MarkTechPost

Analysis

The article describes a coding implementation for a self-testing AI system focused on red-teaming and safety. It highlights the use of Strands Agents to evaluate a tool-using AI against adversarial attacks like prompt injection and tool misuse. The core focus is on proactive safety engineering.

Key Takeaways

•Focus on proactive safety engineering for AI systems.
•Utilizes Strands Agents for red-teaming and adversarial testing.
•Targets prompt injection and tool misuse vulnerabilities.

Reference

“In this tutorial, we build an advanced red-team evaluation harness using Strands Agents to stress-test a tool-using AI system against prompt-injection and tool-misuse attacks.”

Permalink MarkTechPost

Research Paper #Generative AI Security, Provable Security, Consensus Sampling 🔬 ResearchAnalyzed: Jan 3, 2026 06:21

Reliable Consensus Sampling for Provably Secure Generative AI

Published:Dec 31, 2025 15:33

•

1 min read

•

ArXiv

Analysis

This paper addresses the critical need for provably secure generative AI, moving beyond empirical attack-defense cycles. It identifies limitations in existing Consensus Sampling (CS) and proposes Reliable Consensus Sampling (RCS) to improve robustness, utility, and eliminate abstention. The development of a feedback algorithm to dynamically enhance safety is a key contribution.

Key Takeaways

•Proposes Reliable Consensus Sampling (RCS) as an improvement over Consensus Sampling (CS) for provably secure generative AI.
•RCS enhances robustness against adversarial attacks and improves utility compared to CS.
•RCS eliminates the need for abstention, a common limitation of CS.
•Introduces a feedback algorithm for dynamic safety enhancement of RCS.
•Provides theoretical guarantees for controllable risk thresholds with RCS.

Reference

“RCS traces acceptance probability to tolerate extreme adversarial behaviors, improving robustness. RCS also eliminates the need for abstention entirely.”

AI Insiders Launch Data Poisoning Offensive: A Threat to LLMs

Analysis

Key Takeaways

Data Poisoning Attacks: A Practical Guide to Label Flipping on CIFAR-10

Analysis

Key Takeaways

IO-RAE: A Novel Approach to Audio Privacy via Reversible Adversarial Examples

Analysis

Key Takeaways

Self-Testing Agentic AI System Implementation

Analysis

Key Takeaways

Reliable Consensus Sampling for Provably Secure Generative AI

Analysis

Key Takeaways

Adversarial Attack on Monocular Depth Estimation using Physics-in-the-Loop Optimization

Analysis

Key Takeaways

Causal Physiological Representation Learning for Robust ECG Analysis

Analysis

Key Takeaways

Defenses for RAG Against Corpus Poisoning

Analysis

Key Takeaways

Adversarial Objects for Depth Estimation Attacks via Diffusion

Analysis

Key Takeaways

RepetitionCurse: DoS Attacks on MoE LLMs

Analysis

Key Takeaways

Adversarial Attacks on Text-to-Video Models

Analysis

Key Takeaways

Universal Targeted Attack on Audio-Language Models

Analysis

Key Takeaways

DDFT: A New Test for LLM Reliability

Analysis

Key Takeaways

Adversarial Examples from Attention Layers for LLM Evaluation

Analysis

Key Takeaways

Improved Bounds for Private and Robust Language Model Alignment

Analysis

Key Takeaways

Multilingual Prompt Injection Attacks on LLM Academic Reviewing

Analysis

Key Takeaways

RobustMask: Certified Robustness for Neural Ranking

Analysis

Key Takeaways

Web Agent Persuasion Benchmark

Analysis

Key Takeaways

Dark Patterns Manipulate Web Agents

Analysis

Key Takeaways

On the Stealth of Unbounded Attacks Under Non-Negative-Kernel Feedback

Analysis

Key Takeaways

Reliable Adversarial Robustness Evaluation for Spiking Neural Networks

Analysis

Key Takeaways

Physics-Aware Attacks on EV Charging Systems

Analysis

Key Takeaways

Scaling Adversarial Training via Data Selection

Analysis

Key Takeaways

Targeted Attacks on Vision-Language Models with Fewer Tokens

Analysis

Key Takeaways

Adversarial Attacks on Android Malware Detection via LLMs

Analysis

Key Takeaways

CoTDeceptor: Adversarial Obfuscation for LLM Code Agents

Analysis

Key Takeaways

Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks

Analysis