Adversarial Attacks Against Reinforcement Learning Agents with Ian Goodfellow & Sandy Huang
Research · Reinforcement Learning
Published: Mar 15, 2018
Source: Practical AI
This Practical AI episode discusses a paper on adversarial attacks against reinforcement learning (RL) agents. Guests Ian Goodfellow and Sandy Huang explain how these attacks can compromise the performance of neural network policies in RL, much as image classifiers can be fooled by adversarial examples. The conversation covers the paper's core findings, including how small input changes, even altering a single pixel, can sharply degrade the performance of models trained to play Atari games. The discussion also touches on related areas such as hierarchical reward functions and transfer learning.
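To make the core idea concrete, below is a minimal sketch of a fast gradient sign method (FGSM) style perturbation applied to an RL policy's observation, the kind of attack the episode describes. This is not the authors' code: the `PolicyNet` architecture, the observation/action dimensions, and the `eps` value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative policy network (not the architecture from the paper).
class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

def fgsm_attack(policy: PolicyNet, obs: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """Perturb an observation so the policy's preferred action becomes less likely.

    FGSM recipe: step in the direction of the sign of the loss gradient
    w.r.t. the input, using the policy's own chosen action as the label.
    """
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)
    action = logits.argmax(dim=-1)          # action the unperturbed policy takes
    loss = F.cross_entropy(logits, action)  # increasing this pushes the policy off its choice
    loss.backward()
    adv_obs = obs + eps * obs.grad.sign()   # small, hard-to-perceive perturbation
    return adv_obs.detach()

# Usage: the perturbed observation is numerically close to the original,
# yet often changes the action the policy selects.
policy = PolicyNet(obs_dim=8, n_actions=4)
obs = torch.randn(1, 8)
adv = fgsm_attack(policy, obs, eps=0.05)
print(policy(obs).argmax(dim=-1), policy(adv).argmax(dim=-1))
```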
Key Takeaways
- Adversarial attacks can be applied to reinforcement learning agents, degrading their performance.
- Small, imperceptible changes to input data can significantly alter the output of RL models.
- The discussion covers the intersection of adversarial attacks and RL, along with related topics such as hierarchical reward functions and transfer learning.
Reference / Citation
"Sandy gives us an overview of the paper, including how changing a single pixel value can throw off performance of a model trained to play Atari games."