Sharma, Tushar; Kechagia, Maria; Georgiou, Stefanos; Tiwari, Rohit; Vats, Indira; Moazen, Hadi; Sarro, Federica; (2023) A survey on machine learning techniques applied to source code. Journal of Systems and Software, Article 111934. 10.1016/j.jss.2023.111934. (In press).
PDF: jss-ml-survey.pdf - Accepted Version (2MB)
Abstract
The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 494 studies. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize machine learning techniques employed. We identify a comprehensive list of available datasets and tools useable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.

Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

Type: | Article |
---|---|
Title: | A survey on machine learning techniques applied to source code |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.jss.2023.111934 |
Publisher version: | http://dx.doi.org/10.1016/j.jss.2023.111934 |
Language: | English |
Additional information: | This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Machine learning for software engineering, Source code analysis, Deep learning, Datasets, Tools |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10184342 |