Hanif, Hazim and Maffeis, Sergio (2022) VulBERTa: simplified source code pre-training for vulnerability detection. In: 2022 International Joint Conference on Neural Networks, IJCNN 2022, 18-23 July 2022, Padua.
Full text not available from this repository.

Abstract
This paper presents VulBERTa, a deep learning approach to detecting security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Funders: | Google [Grant no. GCP19980904] |
Uncontrolled Keywords: | Vulnerability detection; Software vulnerabilities; Pre-training; Deep learning; Representation learning |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Faculty of Computer Science & Information Technology > Department of Software Engineering |
Depositing User: | Ms. Juhaida Abd Rahim |
Date Deposited: | 13 Feb 2025 04:31 |
Last Modified: | 13 Feb 2025 04:31 |
URI: | http://eprints.um.edu.my/id/eprint/40469 |