UCL Discovery Stage

Who Judges the Judge: An Empirical Study on Online Judge Tests

Liu, Kaibo; Han, Yudong; Zhang, Jie M; Chen, Zhenpeng; Sarro, Federica; Harman, Mark; Huang, Gang; (2023) Who Judges the Judge: An Empirical Study on Online Judge Tests. In: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. (pp. 334-346). ACM (Association for Computing Machinery). Green open access

ISSTA23_OJ.pdf - Accepted Version

Abstract

Online Judge platforms play a pivotal role in education, competitive programming, recruitment, career training, and large language model training. They rely on predefined test suites to judge the correctness of submitted solutions. It is therefore important that solution judgement is reliable and free from potentially misleading false positives (i.e., incorrect solutions that are judged as correct). In this paper, we conduct an empirical study of 939 coding problems with 541,552 solutions, all of which are judged to be correct according to the test suites used by the platform, finding that 43.4% of the problems include false positive solutions (3,440 bugs are revealed in total). Nevertheless, we also find that the test suites are of high quality according to widely studied test effectiveness measurements: 88.2% of false positives have perfect (100%) line coverage, 78.9% have perfect branch coverage, and 32.5% have a perfect mutation score. Our findings indicate that more work is required to weed out false positive solutions and to further improve test suite effectiveness. We have released the detected false positive solutions and the generated test inputs to facilitate future research.
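To make the notion of a false positive concrete, here is a minimal, hypothetical sketch in Python. It is not the authors' tooling: the problem (maximum of a list), the functions `reference_solution` and `submitted_solution`, the test list `PREDEFINED_TESTS`, and the use of simple random differential testing are all assumptions chosen for illustration. The buggy submission passes the predefined suite even though that suite executes every line of it and both outcomes of its `if`, mirroring the abstract's observation that full coverage does not rule out false positives.

```python
import random
from typing import Callable, List, Optional

# Hypothetical problem: return the maximum element of a non-empty list.
def reference_solution(xs: List[int]) -> int:
    return max(xs)

# Hypothetical buggy submission: `best` starts at 0, so it returns
# max(0, max(xs)) and is wrong whenever every element is negative.
def submitted_solution(xs: List[int]) -> int:
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best

# Hypothetical predefined suite: executes every line of the buggy
# submission and both outcomes of its `if`, but has no all-negative input.
PREDEFINED_TESTS: List[List[int]] = [[1, 2, 3], [5], [0, 7, -2], [-1, 4]]

def passes_suite(solution: Callable[[List[int]], int],
                 tests: List[List[int]]) -> bool:
    """Judge a solution the way a platform would: compare against the reference on fixed tests."""
    return all(solution(t) == reference_solution(t) for t in tests)

def find_counterexample(solution: Callable[[List[int]], int],
                        trials: int = 1000,
                        seed: int = 0) -> Optional[List[int]]:
    """Differential testing: run random inputs and return the first one where outputs differ."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 5))]
        if solution(xs) != reference_solution(xs):
            return xs
    return None

if __name__ == "__main__":
    print("accepted by suite:", passes_suite(submitted_solution, PREDEFINED_TESTS))
    print("counterexample:", find_counterexample(submitted_solution))
```

Because the submission is judged correct by the fixed suite but fails on a generated input, it is a false positive in the abstract's sense. A real study would need input generators that respect each problem's constraints; uniform random lists are only a stand-in here.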

Type: Proceedings paper
Title: Who Judges the Judge: An Empirical Study on Online Judge Tests
Event: The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023)
Location: Seattle, WA, USA
Dates: 17th-21st Jul 2023
ISBN-13: 979-8-4007-0221-1
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3597926.3598060
Publisher version: https://doi.org/10.1145/3597926.3598060
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Online judge platform, software testing, test assessment
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10166066