This repository implements the main experiments of our paper, *Distilling Many-Shot In-Context Learning into a Cheat Sheet* (EMNLP 2025 Findings). We introduce cheat-sheet ICL, which distills the information in many-shot ICL into a concise textual summary (a cheat sheet) that is used as the context at inference time. On challenging reasoning tasks, this approach achieves comparable or better performance than many-shot ICL with far fewer tokens, and it matches retrieval-based ICL without requiring test-time retrieval. The code is intended solely for reproducing the experiments.
We tested the code in the following environment:
- OS: Ubuntu 22.04.5 LTS
- Python: 3.12

We used `uv` to manage Python and its packages.
```bash
git clone https://github.com/CyberAgentAILab/cheat-sheet-icl.git
cd cheat-sheet-icl

# (Optional) If this version of Python is not installed
uv python pin 3.12
uv python install

# Create a virtual environment and install dependencies
uv venv --python 3.12 --python-preference managed
uv pip install --no-cache --force-reinstall -r requirements.txt
```
The code assumes that the API keys are stored in environment variables. Fill in the `.envrc` file and load it. `GEMINI_API_KEY` is optional; it is only required when running the cheat-sheet transferability experiments.
```bash
cp .envrc_example .envrc
# Fill in the .envrc file with your API keys

# If you are using direnv
direnv allow
# Otherwise
source .envrc
```
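For illustration, a filled-in `.envrc` might look like the sketch below. `GEMINI_API_KEY` is the only variable named in this README; the Azure OpenAI variable names are assumptions, so defer to `.envrc_example` for the authoritative names.

```bash
# Hypothetical .envrc contents; the exact variable names are defined in .envrc_example.
export AZURE_OPENAI_API_KEY="<your-azure-openai-key>"                     # assumed name (GPT-4.1 is accessed via Azure OpenAI)
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"  # assumed name
export GEMINI_API_KEY="<your-gemini-key>"  # optional: only for the transferability experiments
```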
Run the following commands to download the rationale-augmented BBH data. The cheat sheets used in our experiments can also be found in `data/cheat_prompt`. The files in `data/cheat_prompt` and `data/metaprompt` are adapted from BIG-Bench-Hard.
```bash
cd data
curl -L -O https://storage.googleapis.com/ailab-public/cheat-sheet-icl/data/aug_data_bbh.zip
unzip -q aug_data_bbh.zip
rm aug_data_bbh.zip
```
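A quick sanity check after extraction; the `aug_data_bbh` directory name is an assumption taken from the archive name:

```bash
# Still inside data/: list the extracted data and the bundled prompt files
ls aug_data_bbh            # assumed directory name, from the zip file name
ls cheat_prompt metaprompt # cheat sheets and meta-prompts shipped with the repo
```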
(Optional) If you want to run rationale augmentation and cheat-sheet creation from scratch, download the BBH data and place it under `data`. The other datasets can be downloaded automatically with the Hugging Face Datasets library, but you will need to request access to GPQA with your Hugging Face account and then run `uv run huggingface-cli login`. Then run `script/preproc.sh` for BBH and `script/preproc_acad.sh` for the academic benchmarks; a sketch of these steps follows the download commands below.
```bash
cd data
curl -L -O https://github.com/suzgunmirac/BIG-Bench-Hard/archive/refs/heads/main.zip
unzip -q main.zip
rm main.zip
```
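The preprocessing steps mentioned above might then look like the following. This is a minimal sketch that assumes the scripts take no arguments; check each script before running:

```bash
# GPQA is gated: request access on the Hugging Face Hub, then authenticate once
uv run huggingface-cli login

cd ..                          # back to the repository root
bash script/preproc.sh         # preprocess BBH
bash script/preproc_acad.sh    # preprocess the academic benchmarks
```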
Below are sample commands for running the experiments. See `script` for more examples.
**Note**
- We used the Azure OpenAI API for GPT-4.1.
- We used the specific version `gpt-4.1-2025-04-14`, and the code assumes that the deployment name is the same.
  - If you want to use a different deployment name, modify `model_libs` in `src/args_utils.py`.
- The Azure OpenAI content filters were disabled upon our request.
```bash
META_MODEL="gpt-4.1-2025-04-14"
MODEL="gpt-4.1-2025-04-14"
TASK="bbh_salient_translation_error_detection"
REASON="gen"
SEED=1000

# Few-shot ICL (8-shot)
uv run src/run_main_api.py \
  --task ${TASK} \
  --meta_model ${META_MODEL} \
  --model ${MODEL} \
  --reason_type ${REASON} \
  --prompt_type shot \
  --shot 8 \
  --seed ${SEED}

# Many-shot ICL (n-shot)
uv run src/run_main_api.py \
  --task ${TASK} \
  --meta_model ${META_MODEL} \
  --model ${MODEL} \
  --reason_type ${REASON} \
  --prompt_type shot \
  --shot 0 \
  --seed ${SEED}

# Cheat-sheet ICL
uv run src/run_main_api.py \
  --task ${TASK} \
  --meta_model ${META_MODEL} \
  --model ${MODEL} \
  --reason_type ${REASON} \
  --prompt_type cheat \
  --shot 0 \
  --seed ${SEED}
```
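To sweep over seeds (or tasks), the same command can be wrapped in a shell loop. This is a minimal sketch; the extra seed values below are illustrative, not the paper's configuration:

```bash
# Run cheat-sheet ICL over several seeds (the seed values here are illustrative)
for SEED in 1000 1001 1002; do
  uv run src/run_main_api.py \
    --task ${TASK} \
    --meta_model ${META_MODEL} \
    --model ${MODEL} \
    --reason_type ${REASON} \
    --prompt_type cheat \
    --shot 0 \
    --seed ${SEED}
done
```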
If you find our code or paper useful, please cite:

```bibtex
@inproceedings{honda-etal-2025-distilling,
    title = "Distilling Many-Shot In-Context Learning into a Cheat Sheet",
    author = "Honda, Ukyo and Murakami, Soichiro and Zhang, Peinan",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    year = "2025",
}
```