Vega Carrasco, Mariflor;
Manolopoulou, Ioanna;
O'Sullivan, Jason;
Prior, Rosie;
Musolesi, Mirco;
(2022)
Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility.
Journal of the Royal Statistical Society: Series C (Applied Statistics)
10.1111/rssc.12546.
(In press).
Royal Stata Society Series C - 2022 - Vega Carrasco - Posterior summaries of grocery retail topic models Evaluation .pdf - Published Version
Abstract
Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) makes it possible to process grocery transactions and discover customer behaviours. Interpretations of topic models typically exploit individual samples, overlooking the uncertainty of single topics. Moreover, training LDA multiple times shows topics with large uncertainty; that is, topics (dis)appear in some but not all posterior samples, which concurs with the findings of various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as coherence and distinctiveness of single topics and credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also improves interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.
Type: | Article |
---|---|
Title: | Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1111/rssc.12546 |
Publisher version: | https://doi.org/10.1111/rssc.12546 |
Language: | English |
Additional information: | This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third-party material in this article are included in the Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10147095 |