Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification


Mozes, M; Bartolo, M; Stenetorp, P; Kleinberg, B; Griffin, LD; (2021) Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. (pp. 8258-8270). Association for Computational Linguistics: Online and Punta Cana, Dominican Republic. Green open access

2021.emnlp-main.651.pdf - Published Version

Abstract

Natural language processing models are generally considered vulnerable to adversarial attacks, but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this question through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial number of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TEXTFOOLER, GENETIC, BAE and SEMEMEPSO attack algorithms on the dimensions of naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that humans are no more able than the best algorithms to generate natural-reading, sentiment-preserving examples, though they are much more computationally efficient in doing so.

Type:	Proceedings paper
Title:	Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification
Event:	2021 Conference on Empirical Methods in Natural Language Processing
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://aclanthology.org/2021.emnlp-main.651/
Language:	English
Additional information:	This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Security and Crime Science
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10147826

