Who Is the Strongest Enemy?
Towards Optimal and Efficient Evasion Attacks in Deep RL


Yanchao Sun1     Ruijie Zheng1     Yongyuan Liang2     Furong Huang1    
1 University of Maryland College Park      2 Sun Yat-sen University

Pong-attack-natural.gif Pong-attack-natural.gif RoadRunner-attack-natural.gif Pong-attack-natural.gif Pong-attack-natural.gif Pong-attack-natural.gif
Hopper, PPO
clean reward: 3104
Hopper, PPO
reward under strongest
previous attack: 636
Hopper, PPO
reward under our
PA-AD attack: 171
Walker, PPO
clean reward: 4630
Walker, PPO
reward under strongest
previous attack: 1048
Walker, PPO
reward under our
PA-AD attack: 813
Pong-attack-natural.gif Pong-attack-natural.gif RoadRunner-attack-natural.gif Pong-attack-natural.gif Pong-attack-natural.gif Pong-attack-natural.gif
Pong, DQN
clean reward: 21
(agent: right paddle)
Pong, DQN
reward under strongest
previous attack: -14
Pong, DQN
reward under our
PA-AD attack: -21
Boxing, DQN
clean reward: 94
(agent: left boxer)
Boxing, DQN
reward under strongest
previous attack: 69
Boxing, DQN
reward under our
PA-AD attack: 8



Abstract


Evaluating the worst-case performance of a reinforcement learning (RL) agent under the strongest/optimal adversarial perturbations on state observations (within some constraints) is crucial for understanding the robustness of RL agents. However, finding the optimal adversary is challenging, in terms of both whether we can find the optimal attack and how efficiently we can find it. Existing works on adversarial RL either use heuristics-based methods that may not find the strongest adversary, or directly train an RL-based adversary by treating the agent as a part of the environment, which can find the optimal adversary but may become intractable in a large state space. This paper introduces a novel attacking method to find the optimal attacks through collaboration between a designed function named “actor” and an RL-based learner named “director”. The actor crafts state perturbations for a given policy perturbation direction, and the director learns to propose the best policy perturbation directions. Our proposed algorithm, PA-AD, is theoretically optimal and significantly more efficient than prior RL-based works in environments with large state spaces. Empirical results show that our proposed PA-AD universally outperforms state-of-the-art attacking methods in various Atari and MuJoCo environments. By applying PA-AD to adversarial training, we achieve state-of-the-art empirical robustness in multiple tasks under strong adversaries.


Paper


Who Is the Strongest Enemy? Towards Optimal and Efficient Evasion Attacks in Deep RL
Yanchao Sun , Ruijie Zheng , Yongyuan Liang , Furong Huang

[Paper]  [Code] 


Method


Previous Appoach
Our Approach (PA-AD) : Instead of learning an end-to-end adversary policy that maps observations directly into observation perturbation, we decouple the whole attacking process into two simpler components: policy perturbation and state perturbation, solved by a “director” and an “actor” respectively. The director learns the optimal policy perturbing direction with RL methods, while the actor crafts adversarial states at every step such that the victim policy is perturbed towards the given direction. In this way, we obtain a both optimal and efficient attacking algorithm which is extremely powerful on environments with large observation space.


Results


Empirically, we verify that our PA-AD algorithm almost always outperforms all existing attacking methods by a large margin on various OpenAI Gym environments, including Atari and MuJoCo tasks. Here is a comparison of PA-AD against existing attacking methods on seven Atari Games and four MuJoco tasks. We compute the average episode rewards plus/minus standard deviation of vanilla DQN and A2C or PPO agents under different evasion attack methods
Here, we have the comparison of PA-AD attacks against other existing attacks with different choices of epsilon on three Atari Games: Boxing, Pong, and RoadRunner respectively.

In addition to the given epsilon shown in the table above, we also have the comparison of PA-AD attacks against other attackers under different choices of epsilon on three Atari Games: Boxing, Pong, RoadRunner respectively.
             

Besides, when combining with existing robust training methods, our algorithm PA-AD could significantly improve the robustness of the trained RL policies.