@article{NGUYEN2025112455,
title = {Human-understandable explanation for software vulnerability prediction},
journal = {Journal of Systems and Software},
pages = {112455},
year = {2025},
issn = {0164-1212},
doi = {https://doi.org/10.1016/j.jss.2025.112455},
url = {https://www.sciencedirect.com/science/article/pii/S0164121225001232},
author = {Hong Quy Nguyen and Thong Hoang and Hoa Khanh Dam and Guoxin Su and Zhenchang Xing and Qinghua Lu and Jiamou Sun},
keywords = {Vulnerability prediction, Explainable AI, Text generation, Key aspects},
abstract = {Recent advances in deep learning have significantly improved the performance of software vulnerability prediction (SVP). To enhance trustworthiness, the SVP highlights predicted lines of code (LoC) that may be vulnerable. However, providing LoC alone is often insufficient for software practitioners, as it lacks detailed information about the nature of the vulnerability. This paper introduces a novel framework that is built on SVP by offering additional explanatory information based on the suggested LoC. Similar to security reports, our framework comprehensively explains the vulnerability aspects, such as Root Cause, Impact, Attack Vector, and Vulnerability Type. The proposed framework is powered by transformer architectures. Specifically, we leverage pre-trained language models for code to fine-tune on two practical datasets: BigVul and Vulnerability Key Aspect, ensuring our framework’s applicability to real-world scenarios. Experiments using the ROUGE and BLEU scores as evaluation metrics show that our framework achieves better performance with CodeT5+, statistically outperforming a baseline study in generating key vulnerability aspects. Additionally, we conducted a small-scale user study with experienced software practitioners to assess the effectiveness of the framework. The results show that 72% of the participants found our framework helpful in accepting the SVP results, and 68% rated the additional explanations as moderately to extremely useful. Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.}
}
- Run via Docker:
./run_docker.sh
- Or create and activate the conda environment:
conda env create -f binder/environment.yml
conda activate vul-intext-reason
Install other dependencies in the OS:
- clang-format (Ubuntu clang-format version 14.0.0-1ubuntu1)
- Graphviz
sudo apt install clang-format
sudo apt install graphviz
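A quick way to confirm both dependencies are on the PATH (just a sanity check, not part of the original instructions):
# Verify the OS-level dependencies installed above
clang-format --version   # should report the clang-format version installed above
dot -V                   # Graphviz prints its version to stderr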
- To update the environment from the file, or to export the current environment back into it:
conda env update --file binder/environment.yml --prune
conda env export --from-history -f binder/environment.yml
- First, locate the library: find / -name "libstdc++.so*"
then create a symbolic link to it in the appropriate location (a sketch follows).
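A minimal sketch of creating that link; both paths below are placeholders, so substitute the location reported by find and the location the error message (or the tool) expects:
# Hypothetical paths: point the expected location at the libstdc++ copy found above
sudo ln -sf /path/to/found/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libstdc++.so.6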
- Merge BigVul and VKA:
explore_data61.ipynb
- Apply LineVul:
apply_linevul_parse_data.ipynb
Final data can be downloaded at https://drive.google.com/file/d/1ZxGaSg4L3lGq94SYgngjR_CnZtNpEvtc/view?usp=sharing ; unzip it and rename the extracted folder to .aspect_bigvul_new (a command-line sketch follows).
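One possible command-line route, assuming gdown is available for downloading from Google Drive; the output file name and the extracted folder name below are placeholders:
pip install gdown
# File ID taken from the sharing link above
gdown "https://drive.google.com/uc?id=1ZxGaSg4L3lGq94SYgngjR_CnZtNpEvtc" -O aspect_bigvul_new.zip
unzip aspect_bigvul_new.zip
mv <extracted_folder> .aspect_bigvul_new   # rename whatever the archive unpacks to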
- Run the main experiment scripts:
./run_t5p_new.sh
./run_bert_seq2seq_new.sh
./run_t5p_percentage.sh
- Run Elasticsearch:
docker run -d --name elasticsearch \
-p 0.0.0.0:9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-v /data/elasticsearchData/:/usr/share/elasticsearch/data \
docker.elastic.co/elasticsearch/elasticsearch:8.15.0
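Before running the RAG notebooks, you can check that the container is answering (a simple sanity check, not part of the original instructions):
# Elasticsearch returns a small JSON blob with cluster info once it is ready
curl http://localhost:9200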
- Run RAG(CodeT5+) or RAG(BM25): use the notebooks
rag_baseline.ipynb and rag_baseline_BM25.ipynb
- For few-shot learning:
./run_3shot.sh