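Include links to papers

Add an optional `arxiv` field to `tools/listing.yaml`, emit it in the generated front matter, and render it as a "Paper" link in the title metadata. Also credit @pgiav as a Cybench contributor and point the `default_agent` reference at `cybench.py`.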
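
The `prerender.py` hunk in this commit only writes an `arxiv:` line when the listing entry has that key, so entries without a paper are unaffected. A minimal sketch of that behaviour follows. `front_matter_tail` is a hypothetical, trimmed-down stand-in for the end of `create_front_matter`, not the real function, and the `sort_index` value of 1 is an arbitrary assumption.

```python
from typing import Any


def front_matter_tail(listing: dict[str, Any], sort_index: int) -> list[str]:
    """Hypothetical stand-in for the tail of create_front_matter."""
    out: list[str] = []
    out.append(f"code: {listing['path']}")
    # Only listings that declare a paper get an `arxiv:` line.
    if "arxiv" in listing:
        out.append(f"arxiv: {listing['arxiv']}")
    out.append(f"group: {listing['group']}")
    out.append(f"order: {sort_index}")
    out.append("---")
    return out


# Values taken from the Cybench entry in tools/listing.yaml.
cybench = {
    "path": "src/inspect_evals/cybench",
    "group": "Cybersecurity",
    "arxiv": "https://arxiv.org/abs/2408.08926",
}
print("\n".join(front_matter_tail(cybench, 1)))
```

The template side mirrors this: `$if(arxiv)$` in `title-metadata.html` renders the "Paper" block only when the front matter carries the key.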

jjallaire · jjallaire · commit d4dc127db93d · 2024-11-13T09:40:35.000Z
diff --git a/README.md b/README.md
@@ -107,7 +107,7 @@ Inspect supports many model providers including OpenAI, Anthropic, Google, Mistr
 
 - ### [Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models](src/inspect_evals/cybench)
   40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. 
- <sub><sup>Contributed by: [@sinman-aisi](https://github.com/sinman-aisi), [@sam-deverett-dsit](https://github.com/sam-deverett-dsit), [@kola-aisi](https://github.com/kola-aisi)</sub></sup>
+ <sub><sup>Contributed by: [@sinman-aisi](https://github.com/sinman-aisi), [@sam-deverett-dsit](https://github.com/sam-deverett-dsit), [@kola-aisi](https://github.com/kola-aisi), [@pgiav](https://github.com/pgiav)</sub></sup>
    ```
    inspect eval inspect_evals/cybench
    ```
diff --git a/docs/_scripts/prerender.py b/docs/_scripts/prerender.py
@@ -136,6 +136,8 @@ def create_front_matter(listing: dict[str, Any], sort_index: int) -> list[str]:
             readme_out.append(f"  - \"{to_author_link(author)}\"")
     readme_out.append(f"code: {listing['path']}")
     readme_out.append(f"code-url: https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/{listing['path']}")
+    if "arxiv" in listing:
+        readme_out.append(f"arxiv: {listing['arxiv']}")
     readme_out.append(f"group: {listing['group']}")
     readme_out.append(f"order: {sort_index}")
     readme_out.append("---")
diff --git a/docs/_templates/title-metadata.html b/docs/_templates/title-metadata.html
@@ -56,6 +56,15 @@
   </div>
   $endif$
 
+  $if(arxiv)$
+  <div>
+    <div class="quarto-title-meta-heading">Paper</div>
+    <div class="quarto-title-meta-contents">
+      <p class="doi"><a href="$arxiv$">$arxiv$</a></p>
+    </div>
+  </div>
+  $endif$
+
   $if(date-modified)$
   <div>
     <div class="quarto-title-meta-heading">$labels.modified$</div>
diff --git a/src/inspect_evals/cybench/README.md b/src/inspect_evals/cybench/README.md
@@ -3,7 +3,7 @@
 [Cybench](https://cybench.github.io/) is a benchmark for evaluating models' cybersecurity capabilities. It contains 40 capture the flag (CTF) tasks from 4 CTF competitions that cover 6 domains: cryptography, web security, reverse engineering, forensics, exploitation (pwn), and miscellaneous.
 
 <!-- Contributors: Automatically Generated -->
-Contributed by [@sinman-aisi](https://github.com/sinman-aisi), [@sam-deverett-dsit](https://github.com/sam-deverett-dsit), [@kola-aisi](https://github.com/kola-aisi)
+Contributed by [@sinman-aisi](https://github.com/sinman-aisi), [@sam-deverett-dsit](https://github.com/sam-deverett-dsit), [@kola-aisi](https://github.com/kola-aisi), [@pgiav](https://github.com/pgiav)
 <!-- /Contributors: Automatically Generated -->
 
 <!-- Usage: Automatically Generated -->
@@ -62,7 +62,7 @@ You can specify a certain variant to run. For example, to run the solution varia
 inspect eval inspect_evals/cybench -T variants=solution
 ```
 
-You can also create and specify an agent to use so long as it's in the form of an Inspect [solver](https://inspect.ai-safety-institute.org.uk/solvers.html). See `default_agent` in [task.py](./task.py) for an example.
+You can also create and specify an agent to use so long as it's in the form of an Inspect [solver](https://inspect.ai-safety-institute.org.uk/solvers.html). See `default_agent` in [cybench.py](https://github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/cybench/cybench.py) for an example.
 
 There are two task parameters that define limits on the evaluation:
 - `max_attempts` defines the number of incorrect submissions to allow before ending the challenges (defaults to 3).
diff --git a/tools/listing.yaml b/tools/listing.yaml
@@ -61,7 +61,8 @@
     40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. 
   path: src/inspect_evals/cybench
   group: Cybersecurity
-  contributors: ["sinman-aisi", "sam-deverett-dsit", "kola-aisi"]
+  contributors: ["sinman-aisi", "sam-deverett-dsit", "kola-aisi", "pgiav"]
+  arxiv: https://arxiv.org/abs/2408.08926
   tasks: ["cybench"]
   tags: ["Agent"]