
[PDF] from arxiv.org

Towards reverse-engineering black-box neural networks

Authors

Seong Joon Oh, Bernt Schiele, Mario Fritz

Publication date

2019

Book

Explainable AI: interpreting, explaining and visualizing deep learning

Pages

121-144

Publisher

Springer International Publishing

Description

Much progress in interpretable AI is built around scenarios where the user, the one who interprets the model, has full ownership of the model to be diagnosed. The user either owns the training data and computing resources to train an interpretable model herself, or has full access to an already-trained model to be interpreted post hoc. In this chapter, we consider a less-investigated scenario: diagnosing black-box neural networks, where the user can only send queries and read off outputs. Black-box access is a common deployment mode for many public and commercial models, since internal details such as architecture, optimisation procedure, and training data can be proprietary, and exposing them can aggravate the models' vulnerability to attacks like adversarial examples. We propose a method for exposing internals of black-box models and show that the method is surprisingly effective at inferring a diverse set of internal information …

Total citations

Cited by 416

Citations per year: 2017: 2, 2018: 11, 2019: 31, 2020: 66, 2021: 89, 2022: 92, 2023: 87, 2024: 35

Scholar articles

Towards reverse-engineering black-box neural networks

SJ Oh, B Schiele, M Fritz - Explainable AI: interpreting, explaining and visualizing …, 2019

Cited by 413 · All 15 versions

Whitening black-box neural networks*

SJ Oh, M Augustin, B Schiele, M Fritz - 2017

Cited by 3 · All 4 versions