
PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks

Liu, Ziquan; Zhi, Zhuo; Bogunovic, Ilija; Gerner-Beuerle, Carsten; Rodrigues, Miguel; (2023) PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks. In: NeurIPS 2023 Workshop on Regulatable ML - Proceedings. NeurIPS: New Orleans, LA, USA. Green open access

Text: 3_prosac_provably_safe_certifica.pdf - Published Version
Download (563kB)

Abstract

It is widely known that state-of-the-art machine learning models, including vision and language models, can be seriously compromised by adversarial perturbations, so it is increasingly important to develop the capability to certify their performance in the presence of the most effective adversarial attacks. Our paper offers a new approach to certify the performance of machine learning models in the presence of adversarial attacks, with population-level risk guarantees. In particular, given a specific attack, we introduce the notion of an (α, ζ) machine learning model safety guarantee: this guarantee, which is supported by a testing procedure based on the availability of a calibration set, entails that one declares a machine learning model's adversarial (population) risk to be less than α (i.e. the model is safe), when the model's adversarial (population) risk is in fact higher than α (i.e. the model is unsafe), with probability less than ζ. We also propose Bayesian optimization algorithms to determine very efficiently whether or not a machine learning model is (α, ζ)-safe in the presence of an adversarial attack, along with their associated statistical guarantees. We apply our framework to a range of machine learning models, including various sizes of vision Transformer (ViT) and ResNet models, impaired by a variety of adversarial attacks such as AutoAttack, SquareAttack and the natural evolution strategy attack, in order to illustrate the merit of our approach. Of particular relevance, we show that ViTs are generally more robust to adversarial attacks than ResNets, and that ViT-large is more robust than smaller ViT models. Overall, our approach goes beyond existing certification guarantees based on empirical adversarial risk, paving the way to more effective AI regulation built on rigorous (and provable) performance guarantees.
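Read formally, the (α, ζ)-safety guarantee described in the abstract amounts to a bound on the probability of wrongly certifying an unsafe model. A minimal formalisation (the notation R_adv(h) for the adversarial population risk of model h under the given attack is assumed here for illustration, not taken verbatim from the paper):

\Pr\bigl[\,\text{declare } R_{\mathrm{adv}}(h) \le \alpha \;\bigm|\; R_{\mathrm{adv}}(h) > \alpha \,\bigr] \;\le\; \zeta

In words: the calibration-set-based testing procedure certifies the model as safe at risk level α while its true adversarial risk exceeds α with probability at most ζ.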

Type: Proceedings paper
Title: PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks
Event: Workshop on Regulatable Machine Learning at the 37th Conference on Neural Information Processing Systems (RegML @ NeurIPS 2023).
Location: New Orleans, United States
Open access status: An open access version is available from UCL Discovery
Publisher version: https://openreview.net/forum?id=8NotCTD9cQ
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Adversarial Risk Certification; AI Safety
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10181256
Downloads since deposit: 738
