UCL Discovery Stage

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

Siegel, NY; Camburu, OM; Heess, N; Perez-Ortiz, M (2024) The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 530-546). Association for Computational Linguistics. Green open access.

Abstract

In order to oversee advanced AI systems, it is important to understand their underlying decision-making process. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations are faithful, i.e., truly capture the factors responsible for the model’s predictions. In this work, we introduce Correlational Explanatory Faithfulness (CEF), a metric that can be used in faithfulness tests based on input interventions. Previous metrics used in such tests take into account only binary changes in the predictions. Our metric accounts for the total shift in the model’s predicted label distribution, more accurately reflecting the explanations’ faithfulness. We then introduce the Correlational Counterfactual Test (CCT) by instantiating CEF on the Counterfactual Test (CT) from Atanasova et al. (2023). We evaluate the faithfulness of free-text explanations generated by few-shot-prompted LLMs from the Llama2 family on three NLP tasks. We find that our metric measures aspects of faithfulness which the CT misses.
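
The idea of scoring faithfulness by the full shift in the predicted label distribution, rather than by a binary label flip, can be sketched as follows. This is only an illustrative sketch, not the paper's reference implementation: the abstract does not say how the shift is measured or how the correlation is computed, so the use of total variation distance and a Pearson correlation against a binary "explanation mentions the intervention" indicator are assumptions here, and all function names are hypothetical.

```python
import math


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete label distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlational_faithfulness(interventions):
    """Correlate prediction shift with explanation mentions.

    `interventions` is a list of (probs_before, probs_after, mentioned)
    tuples: the model's label distribution before and after an input
    intervention, and whether the explanation mentions the intervention.
    The shift measure (total variation distance) is an assumption.
    """
    shifts = [total_variation_distance(b, a) for b, a, _ in interventions]
    mentions = [1.0 if m else 0.0 for _, _, m in interventions]
    return pearson(shifts, mentions)


# Hypothetical data: the explanation mentions only the intervention
# that actually moved the prediction, so the score should be high.
example = [
    ([0.9, 0.1], [0.4, 0.6], True),   # large shift, mentioned
    ([0.9, 0.1], [0.9, 0.1], False),  # no shift, not mentioned
]
score = correlational_faithfulness(example)
```

Under these assumptions, a score near 1 means explanations tend to mention exactly the interventions that move the prediction most, while a score near 0 means mentions are unrelated to impact. A binary-flip test would treat a shift from 0.9 to 0.55 on the top label the same as no shift at all, which is the information a distribution-level measure retains.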

Type: Proceedings paper
Title: The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models
Event: 62nd Annual Meeting of the Association for Computational Linguistics
Open access status: An open access version is available from UCL Discovery
DOI: 10.18653/v1/2024.acl-short.49
Publisher version: https://doi.org/10.18653/v1/2024.acl-short.49
Language: English
Additional information: © 2024 ACL. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10197494