
BDPL: A Boundary Differentially Private Layer Against Machine Learning Model Extraction Attacks

Conference paper

Computer Security – ESORICS 2019 (ESORICS 2019)

Part of the book series: Lecture Notes in Computer Science, volume 11735


Abstract

Machine learning models trained on large volumes of proprietary data with intensive computational resources are valuable assets of their owners, who merchandise these models to third-party users through prediction service APIs. However, existing literature shows that model parameters are vulnerable to extraction attacks, which accumulate a large number of prediction queries and their responses to train a replica model. As a countermeasure, researchers have proposed reducing the rich API output, for example by hiding the precise confidence level of the prediction response. Nonetheless, even when the response is only one bit, an adversary can still exploit fine-tuned queries with a differential property to infer the decision boundary of the underlying model. In this paper, we propose boundary differential privacy (\(\epsilon \)-BDP) as a solution that protects against such attacks by obfuscating the prediction responses near the decision boundary. \(\epsilon \)-BDP guarantees that an adversary cannot learn the decision boundary beyond a predefined precision, no matter how many queries are issued to the prediction API. We design and prove a perturbation algorithm, called boundary randomized response, that achieves \(\epsilon \)-BDP. The effectiveness and high utility of our solution against model extraction attacks are verified by extensive experiments on both linear and non-linear models.
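
The mechanism described in the abstract, answering truthfully away from the decision boundary and perturbing only boundary-sensitive responses, can be illustrated for a linear binary classifier. The following is a minimal sketch under stated assumptions, not the paper's exact algorithm: the Euclidean point-to-hyperplane distance test, the ball radius delta, and the keep probability e^epsilon / (1 + e^epsilon) (a standard randomized-response rate) are placeholders, whereas the paper derives its own flipping probability to satisfy \(\epsilon \)-BDP.

    import numpy as np

    def bdp_layer(x, w, b, delta, epsilon, rng=None):
        """Hypothetical boundary-DP layer for a linear binary classifier sign(w.x + b)."""
        rng = rng or np.random.default_rng()
        score = float(np.dot(w, x) + b)
        label = 1 if score >= 0 else 0

        # Euclidean distance from the query x to the hyperplane w.x + b = 0.
        dist = abs(score) / float(np.linalg.norm(w))
        if dist > delta:
            return label  # the Delta-ball does not touch the boundary: answer truthfully

        # Boundary-sensitive query: keep the true label with probability
        # e^eps / (1 + e^eps) -- a placeholder randomized-response rate,
        # not the flipping probability derived in the paper.
        keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
        return label if rng.random() < keep_prob else 1 - label

    # Example with made-up parameters:
    # y = bdp_layer(np.array([0.2, -1.3]), w=np.array([1.0, 0.5]), b=0.1,
    #               delta=0.05, epsilon=1.0)

Queries whose delta-ball does not reach the boundary are answered exactly, which is what preserves utility; only responses in the boundary-sensitive zone are randomized.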


Notes

  1. In general, this notation can denote any distance metric (e.g., Manhattan distance, Euclidean distance). The implications of the distance metric for the detailed algorithms are discussed in Sect. 4.1.

  2. The white-box assumption is based on the fact that state-of-the-art models in specific application domains, such as image classification, are usually public knowledge. Nonetheless, our solution can also work against black-box attacks where such knowledge is proprietary.

  3. The case of tangency is rarely reached in real life given that the feature space is usually continuous. For simplicity, we mainly consider intersection.

  4. If \(\varDelta \) is small, the decision boundary near the ball can be treated as a hyperplane.

  5. To do this, we start by flipping one randomly chosen response out of all responses and measuring the overall extraction rate. We then repeatedly increment the number of flips by one until the overall extraction rate is very close to that of BDPL; a sketch of this calibration loop follows these notes.
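
The calibration of the uniform-flipping baseline in Note 5 can be read as a simple search loop. The sketch below is only an illustration: responses holds the true binary responses to the attacker's queries, while extraction_rate, bdpl_rate, and tol are hypothetical stand-ins for the measured extraction rate of the replica model, the extraction rate observed under BDPL, and the matching tolerance.

    import random

    def calibrate_random_flips(responses, extraction_rate, bdpl_rate, tol=0.01, seed=0):
        """Find how many uniformly random flips match BDPL's extraction rate.

        responses       -- list of true binary (0/1) responses to the attacker's queries
        extraction_rate -- callable: perturbed responses -> replica extraction rate
        bdpl_rate       -- extraction rate measured under BDPL (the target to match)
        """
        rng = random.Random(seed)
        n = len(responses)
        for k in range(1, n + 1):
            flipped = list(responses)
            for i in rng.sample(range(n), k):
                flipped[i] = 1 - flipped[i]  # flip k responses chosen uniformly at random
            if abs(extraction_rate(flipped) - bdpl_rate) <= tol:
                return k  # number of flips whose utility cost matches BDPL's
        return n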


Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61572413, U1636205, 91646203, 61532010, 91846204, and 61532016), the Research Grants Council, Hong Kong SAR, China (Grant Nos. 15238116, 15222118, and C1008-16G), and a research grant from Huawei Technologies.

Author information

Correspondence to Huadi Zheng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zheng, H., Ye, Q., Hu, H., Fang, C., Shi, J. (2019). BDPL: A Boundary Differentially Private Layer Against Machine Learning Model Extraction Attacks. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science, vol. 11735. Springer, Cham. https://doi.org/10.1007/978-3-030-29959-0_4


  • DOI: https://doi.org/10.1007/978-3-030-29959-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29958-3

  • Online ISBN: 978-3-030-29959-0

  • eBook Packages: Computer Science, Computer Science (R0)
