Computational Experiments for Generative AI Based Automatic Program Repair
Title: Computational Experiments for Generative AI Based Automatic Program Repair
DNr: Berzelius-2024-131
Project Type: LiU Berzelius
Principal Investigator: He Ye <heye@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2024-04-01 – 2024-10-01
Classification: 20206
Keywords:

Abstract

The vision of Project WASP Automatic Program Repair is to improve software quality and robustness through automation by addressing the prevalent problem of software bugs. Software bugs consume a significant portion of a project’s budget, estimated at over 50% of the total cost, so automated program repair has emerged as a promising research field for correcting defects and vulnerabilities, reducing debugging costs, and allowing programmers to focus on other tasks [O’Dell, ACM Queue 2017]. The project aims to alleviate the burden of manual bug fixing by providing debugging assistance and helping to maintain legacy software.

Our experiments have a twofold objective: to train a code model from scratch and to fine-tune an existing model on domain-specific datasets. Our research uses and extends state-of-the-art language models (such as LLaMA from Meta or NeMo from NVIDIA) to build powerful automated program repair systems that achieve both autonomy and velocity. The goal of this project is to learn from the large-scale code available in open-source repositories and to create large language models that can be deployed in real-world program repair scenarios and further adapted by the community to other related tasks.

List of references:
• He Ye and Martin Monperrus. ITER: Iterative Neural Repair for Multi-Location Patches. Accepted to the 46th International Conference on Software Engineering, 2024.
• He Ye, Zimin Chen, and Claire Le Goues. PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection. Accepted to the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) Industry Challenge Competition, 2023.
• He Ye, Matias Martinez, Xiapu Luo, Tao Zhang, and Martin Monperrus. 2023. SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Association for Computing Machinery, New York, NY, USA, Article 92, 1–13.
• He Ye, Matias Martinez, and Martin Monperrus. Neural program repair with execution-based backpropagation. In Proceedings of the 44th International Conference on Software Engineering (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 1506–1518.
• He Ye, Jian Gu, Matias Martinez, Thomas Durieux, and Martin Monperrus. Automated Classification of Overfitting Patches With Statically Extracted Code Features. IEEE Transactions on Software Engineering (TSE), vol. 48, no. 8, pp. 2920-2938.
• He Ye, Matias Martinez, and Martin Monperrus. Automated patch assessment for program repair at scale. Empirical Software Engineering (EmSE), vol. 26, no. 2.
• Zimin Chen, Vincent J. Hellendoorn, Pascal Lamblin, Petros Maniatis, Pierre-Antoine Manzagol, Daniel Tarlow, and Subhodeep Moitra. PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair. Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
• Z. Chen, S. Kommrusch, and M. Monperrus. “Neural Transfer Learning for Repairing Security Vulnerabilities in C Code,” in IEEE Transactions on Software Engineering, vol. 49, no. 1, pp. 147-165, 1 Jan. 2023, doi: 10.1109/TSE.2022.3147265.
• Z. Chen, S. Kommrusch, M. Tufano, L.-N. Pouchet, D. Poshyvanyk, and M. Monperrus. “SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair,” in IEEE Transactions on Software Engineering, vol. 47, no. 9, pp. 1943-1959, 1 Sept. 2021, doi: 10.1109/TSE.2019.2940179.
• M. Borg, E. Aasa, K. Etemadi, and M. Monperrus. “Human, What Must I Tell You?,” in IEEE Software, vol. 40, no. 3, pp. 9-14, May-June 2023, doi: 10.1109/MS.2023.3244638.