Universal Trojan Signatures in Reinforcement Learning

Authors

Manoj Acharya, Weichao Zhou, Anirban Roy, Xiao Lin, Wenchao Li, Susmit Jha

Publication date

2023/10/28

Workshop paper

NeurIPS 2023 Workshop on Backdoors in Deep Learning: The Good, the Bad, and the Ugly

Description

We present a novel approach for characterizing Trojaned reinforcement learning (RL) agents. By monitoring for discrepancies in how an agent's policy evaluates state observations for choosing an action, we can reliably detect whether the policy is Trojaned. Experiments on the IARPA RL challenge benchmarks show that our approach can effectively detect Trojaned models even in transfer settings with novel RL environments and modified architectures.
