Williams, Andrew (2024). Comparison of generative AI performance on undergraduate and postgraduate written assessments in the biomedical sciences. International Journal of Educational Technology in Higher Education, 21, Article 52. DOI: 10.1186/s41239-024-00485-y.
Abstract
The value of generative AI tools in higher education has received considerable attention. Although many regard these tools as valuable learning aids, many others are concerned about academic integrity and students' use of them to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools: ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI-generated output was then evaluated by four independent markers according to specified marking criteria, matched to the Frameworks for Higher Education Qualifications (FHEQ) of UK level descriptors. Percentage scores and ordinal grades were given for each marking criterion across the AI-generated papers, inter-rater reliability was calculated using Kendall's coefficient of concordance, and generative AI performance was ranked. Across all UG and PG levels, ChatGPT performed better than Bing or Bard in areas of scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared to UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.
Type: | Article |
---|---|
Title: | Comparison of generative AI performance on undergraduate and postgraduate written assessments in the biomedical sciences |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1186/s41239-024-00485-y |
Publisher version: | http://dx.doi.org/10.1186/s41239-024-00485-y |
Language: | English |
Additional information: | This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | Assessment, Artificial intelligence, Higher education, Academic writing, ChatGPT, Essay, Biomedical science, Medicine |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Medical Sciences > Div of Medicine |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10197256 |