Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

CTF-Dojo is a large-scale executable runtime for training LLM agents with verifiable feedback. It provides 658 fully functional CTF-style challenges, each containerized with Docker for reproducibility. The challenges are assembled automatically by CTF-Forge, an end-to-end pipeline that converts public artifacts into ready-to-run environments in minutes. Training on just 486 high-quality, execution-verified trajectories yields up to 11.6% absolute gains over strong baselines across InterCode-CTF, NYU CTF Bench, and Cybench; our best 32B model reaches 31.9% Pass@1, rivaling frontier systems. These results show that execution-grounded signals are pivotal for advancing ML agents without relying on proprietary infrastructure.

Overview

Quick Start

Clone the pwn.college CTF Archive

git clone https://github.com/pwncollege/ctf-archive.git

Run CTF-Forge on the CTF Archive to create CTF-Dojo

python ctf_forge.py
# Arguments (uncomment and set as needed):
# --template_path <dir>     Path to template copied into `ctf-archive` (default: ctf-archive-template)
# --max_tasks <N>           Limit number of tasks to process (testing)
# --filter_ctf <name>       Filter by CTF name (case-insensitive substring)
# --filter_category <tag>   Filter by category/tag (case-insensitive substring)
# --no_docker_compose       Skip generating docker-compose.yml
# --verbose                 Enable detailed logs
# --model <id>              Model ID for generation (default: deepseek-v3-0324)
# --max_retries <N>         Max retries for LLM calls (default: 10)
# --workers <N>             Parallel workers (default: 32; set 1 for sequential)
# --skip_existing           Skip tasks that already have challenge.json (default unless --overwrite)
# --overwrite               Overwrite existing files and recopy template
# --demo                    Process a single task with verbose output (forces --workers 1)
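For scripted runs, the flags listed above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_forge_command` and `run_forge` are hypothetical helper names, and the only assumption about `ctf_forge.py` is that it accepts the flags documented above.

```python
import subprocess
import sys


def build_forge_command(**options):
    """Turn keyword options into a ctf_forge.py argument list.

    True becomes a bare flag, False/None is omitted, and any other
    value becomes "--name value". Option names must match the flags
    documented above (e.g. max_tasks, workers, verbose).
    """
    cmd = [sys.executable, "ctf_forge.py"]
    for name, value in options.items():
        if value is None or value is False:
            continue
        cmd.append(f"--{name}")
        if value is not True:
            cmd.append(str(value))
    return cmd


def run_forge(**options):
    """Run ctf_forge.py from the current directory (hypothetical wrapper)."""
    return subprocess.run(build_forge_command(**options), check=True)


if __name__ == "__main__":
    # Small trial configuration: five tasks, sequential, detailed logs.
    print(" ".join(build_forge_command(max_tasks=5, workers=1, verbose=True)))
```

Starting with a small `--max_tasks` value and `--workers 1` is a cheap way to confirm the setup before processing the full archive.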

Collect writeups (external dataset)

Download writeups following the dataset structure described here: Amazon Science CTF-Dojo data collection.
Ensure you have a JSONL file of writeups (e.g., writeups.jsonl).
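Before mapping, it can help to confirm that the writeups file parses cleanly. This is a minimal sketch assuming only that the file is standard JSONL (one JSON object per line); `load_writeups` is a hypothetical helper, and no particular field names are assumed.

```python
import json
from pathlib import Path


def load_writeups(path):
    """Read a JSONL file: one JSON value per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so it can be fixed by hand.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return records
```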

Create the metadata for CTF-Dojo challenges

python generate_metadata.py
# Arguments (uncomment and set as needed):
# --folder <dir>             Base directory to search for CTF challenges (default: ctf-archive)
# --require-sha256           Only include tasks that have a SHA256 file (flag.sha256, .flag.sha256, or flag.sha256.txt)
# --skip-sha256              Skip tasks that have a SHA256 file (flag.sha256, .flag.sha256, or flag.sha256.txt)
# --skip-flagcheck           Skip tasks that have any files containing 'flagcheck' in the name
# --require-compose          Only include tasks that have compose set to true in challenge.json
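The SHA256-related filters above depend on whether a task directory contains one of the three listed file names. The sketch below illustrates that check and one way such a file could be used to verify a candidate flag. It is not the repository's implementation: the helper names are hypothetical, and it assumes the file's first token is the hex SHA-256 digest of the flag string without a trailing newline.

```python
import hashlib
from pathlib import Path

# File names taken from the --require-sha256 / --skip-sha256 descriptions.
SHA256_NAMES = ("flag.sha256", ".flag.sha256", "flag.sha256.txt")


def find_sha256_file(task_dir):
    """Return the first SHA256 file found in task_dir, or None."""
    for name in SHA256_NAMES:
        candidate = Path(task_dir) / name
        if candidate.is_file():
            return candidate
    return None


def flag_matches(flag, sha256_file):
    """Compare a candidate flag against the stored digest.

    Assumption: the first whitespace-separated token in the file is the
    hex digest of the flag, hashed without surrounding whitespace.
    """
    tokens = Path(sha256_file).read_text(encoding="utf-8").split()
    if not tokens:
        return False
    actual = hashlib.sha256(flag.strip().encode("utf-8")).hexdigest()
    return actual == tokens[0].lower()
```

If a challenge hashes its flag differently (for example, including a newline), the comparison would need to be adjusted accordingly.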

Map writeups to CTF-Dojo challenges

python find_writeups.py \
  --jsonl-file path/to/writeups.jsonl \
  --json-file ctf_archive.json \
  --output-file task_writeup_mapping.json \
  --min-threshold 0.9 \
  --workers 32 -v
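To give a feel for what a similarity threshold such as `--min-threshold 0.9` implies, here is a minimal sketch of threshold-based name matching. The similarity measure actually used by `find_writeups.py` is not documented here, so this example substitutes `difflib.SequenceMatcher` from the standard library; `similarity` and `best_match` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(task_name, writeup_titles, min_threshold=0.9):
    """Return (title, score) for the closest title, or None if below threshold."""
    best_title, best_score = None, 0.0
    for title in writeup_titles:
        score = similarity(task_name, title)
        if score > best_score:
            best_title, best_score = title, score
    if best_title is not None and best_score >= min_threshold:
        return best_title, best_score
    return None
```

A high threshold favors precision: near-identical names are paired, while loosely related titles are left unmatched rather than risk attaching the wrong writeup to a challenge.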

Collect trajectories from CTF-Dojo challenges

Run EnIGMA+ to collect trajectories from CTF-Dojo challenges.

Citation

If you use this benchmark suite in your research, please cite:

@article{zhuo2025training,
  title={Training Language Model Agents to Find Vulnerabilities with CTF-Dojo},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.18370},
  year={2025}
}

@article{zhuo2025cyber,
  title={Cyber-Zero: Training Cybersecurity Agents without Runtime},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.00910},
  year={2025}
}

License

This project is licensed under CC-BY-NC-4.0. See the LICENSE file for details.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to this project.

Support

If you need help or have questions, please check our SUPPORT.md guide or open an issue on GitHub.

Code of Conduct

This project adheres to the Contributor Covenant Code of Conduct. Please read CODE_OF_CONDUCT.md for details.
