
Commit e36d607

Merge pull request #25 from haluptzok/main
ICLR 2023 code and data for Language Models can teach themselves to code better
2 parents d79e83b + 6e75a25, commit e36d607

20 files changed: +2891 -0 lines

ICLR2023/README.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
This is the code and data for the paper: Language Models can teach themselves to code better
https://arxiv.org/abs/2207.14502

LICENSE

MIT License - as already specified in the ../LICENSE file of the PythonProgrammingPuzzles repo

GPU USAGE

GPU usage was large, especially for the 2.7B model, which is ~20X the size of the 125M.
Data generation takes the most GPU time and took about 2500 GPU hours for 2.7B (on V100).
Finetuning 2.7B on the 1M generated samples took about 40 GPU hours (on V100) per epoch - 10 epochs = 400 GPU hours.
Solving the 228-problem test set with 100 attempts using the finetuned 2.7B model took about 4 hours (on V100).
We mostly used V100s, but we used whatever was available, so sometimes T4s and A100s if they were free.
We tried everything at 125M first - debug there and make it work perfectly - then rolled out the 1.3B and 2.7B jobs.

DATASETS

The datasets used are in the data directory. We feel the most interesting dataset is data/Codex_PAPER_1M_iter_0.txt,
which was generated by Codex and gave the best results when finetuned on. All the datasets are part of our public release.

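A quick way to peek at a dataset file (a minimal sketch; it assumes the file is plain text, which the .txt extension suggests):

# peek.py - minimal sketch to inspect a dataset file
with open("data/Codex_PAPER_1M_iter_0.txt", encoding="utf-8") as f:
    text = f.read()
print(f"{len(text):,} characters")
print(text[:500])  # show the first 500 characters to see the example format
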
SETUP

src/requirements.txt is what we install on our cluster machines - the cluster comes with NVidia drivers and a matching pytorch.
./requirements.txt is what I personally have installed on my local machine and have tested that this runs - but it has lots of stuff you don't need.
So try src/requirements.txt first - and if that doesn't work, ./requirements.txt has the versions of everything installed on my machine.
Getting a deepspeed 0.6.1 install that matched pytorch and the NVidia driver was tricky for me on some machines; torch 1.10 and 1.11 both work.

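A quick environment sanity check (a minimal sketch, assuming a CUDA build of pytorch and a deepspeed install as above):

# check_env.py - verify that torch, CUDA, and deepspeed line up before launching jobs
import torch
import deepspeed

print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("deepspeed", deepspeed.__version__)
if torch.cuda.is_available():
    print("gpu 0:", torch.cuda.get_device_name(0))
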
GENERATING/FINETUNING -> run "cd src; ./babysit.sh GPU_INDEX_TO_USE" -> GPU_INDEX_TO_USE=0 typically

src/babysit.sh is the script that generates data and finetunes on that data in a loop, finetuning the GPT-Neo 125M/1.3B/2.7B models.
In src/babysit.sh, TEST_LOCAL=1 runs locally on the machine's GPUs, which is great for fast testing; TEST_LOCAL=0 launches on the cluster, which is slow but has lots of GPUs.
Realistically you have to train on a cluster - data generation takes a long time, so having lots of machines all generating data is the feasible approach.
But given enough time, this will run locally on 1 GPU: about 1 year for 2.7B, or 2 weeks for 125M.
We found that generating 75K samples after deduping worked for iteration 0 - finetune on that data.
Then, using that finetuned model in iteration 1, data generation happens more quickly - the finetuned model solves many more problems.
Repeating that process works well (see the loop sketch at the end of this section).
On 125M we compared training only on 125M-generated data from iter_0 vs iter_1 vs iter_2, generating 600K samples for each iteration.
Finetuning on iter_2 data was best on the test set: 26.9/228 solved, vs 26.1/228 for iter_1 and 22.2/228 for iter_0.
With 1M samples of 125M-generated data drawn across all the iterations 0, 1, 2 we got 26.75/228.
We understand why it's faster to generate iter_2 data with a finetuned model - it solves more problems.
But why are the generated puzzles & solutions better for training the model on?
We will explore that more in the future - and try iterating much farther than 3 iterations - although our preliminary experiments on 125M show it tops out at 3 iterations.

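A minimal sketch of the loop babysit.sh implements, in Python pseudocode - generate_puzzles_and_solutions, verify, and finetune are hypothetical stand-ins for gen.py, the puzzle verifier, and the finetuning scripts, not the repo's actual API:

# self_training_loop.py - the shape of the generate/verify/finetune loop
def self_training_loop(model, n_iterations=3, n_samples=75_000):
    for it in range(n_iterations):
        # 1. Sample new puzzle+solution source pairs from the current model.
        candidates = generate_puzzles_and_solutions(model, n_samples)
        # 2. Keep only verified pairs: the solution must actually solve the puzzle.
        verified = [(f, g) for (f, g) in candidates if verify(f, g)]
        # 3. Dedupe (we finetuned on ~75K samples after deduping for iteration 0).
        deduped = list(dict.fromkeys(verified))
        # 4. Finetune on the verified, deduped data; the finetuned model solves
        #    more problems, so the next iteration generates data faster.
        model = finetune(model, deduped)
    return model
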
FINETUNING ONLY -> run "cd src; ./fine_tune1.sh GPU_INDEX_TO_USE" -> GPU_INDEX_TO_USE=0 typically

# ./fine_tune1.sh GPU MODEL_TO_TRAIN EXPERIMENT_NAME_DIRECTORY TRAIN_DATA EPOCHS
This allows repeated finetuning on a specific dataset.
Use this to do a temperature grid search, or to try different parameter variations on a specific dataset (see the sketch below).

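For example, a temperature grid search over one finetuned checkpoint could be scripted like this (a sketch reusing solve.py's flags from the commands below; the model path is a placeholder, not a path this repo ships):

# temp_grid.py - sweep sampling temperature for one finetuned model
import subprocess

for temp in [0.2, 0.4, 0.6, 0.8, 1.0]:
    subprocess.run([
        "python", "solve.py",
        "-prefix=../data/train_prefix.txt",
        "-attempts=100",
        "-model_path=../models/ft1_Codex_PAPER_1M_iter_0",  # placeholder checkpoint
        "-gpu=0",
        f"-fixed_temp={temp}",
        f"-out=../data/grid_temp_{temp}",
        "-puzzles=../data/test_228.json",
        "-seed=2022",
        "-batch_size=64",
    ], check=True)
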
Detailed instructions for reproducing experiments:

# Generating Codex data
python gen.py -n=32 -max_tokens=4096 -model_path=openai/code-davinci-002 -model_path_solve=openai/code-cushman-001 -out=../data/codex/iter_0 -seed=2022

# Measuring Codex accuracy via API calls
./solve2.sh
python solve.py -prefix=../data/train_prefix.txt -attempts=1 -model_path=openai/code-cushman-001 -gpu=0 -fixed_temp=0.8 -out=../data/codex -puzzles=../data/test_228.json -seed=2022 -batch_size=64

# Producing verified Codex_PAPER_1M_iter_0.txt from the puzzle/solution old-style data generated by Codex
python preprocess.py -path=../data/codex/old_verified -f_name=Codex_PAPER_1M_iter_0.txt -max_sols_per_puzzle=8 -old_style_json=True -max_examples=1000000 -include_failures=False -seed=2022
cp ../data/codex/old_verified/Codex_PAPER_1M_iter_0.txt ../data/Codex_PAPER_1M_iter_0.txt

# Producing unverified Codex_unverified_PAPER_1M_iter_0.txt from the puzzle/solution old-style data generated by Codex
python preprocess.py -path=../data/codex/old_unverified -f_name=Codex_unverified_PAPER_1M_iter_0.txt -max_sols_per_puzzle=8 -old_style_json=True -max_examples=1000000 -include_failures=True -seed=2022
cp ../data/codex/old_unverified/Codex_unverified_PAPER_1M_iter_0.txt ../data/Codex_unverified_PAPER_1M_iter_0.txt

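"Verified" means the generated solution actually solves the generated puzzle when executed: f(g()) returns True. A minimal sketch of that check, assuming f and g are Python source strings defining functions f and g (the real pipeline also sandboxes execution and enforces timeouts):

# verify_sketch.py - a pair is verified iff running the solution on the puzzle returns True
def verify(f_src: str, g_src: str) -> bool:
    env = {}
    try:
        exec(f_src, env)  # defines the puzzle f(answer, ...) -> bool
        exec(g_src, env)  # defines the solver g() -> answer
        return env["f"](env["g"]()) is True
    except Exception:
        return False
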
# Producing 125M_PAPER_25K_iter_0.txt from the puzzle/solution new-style data
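# Note: from here on, preprocess.py is called with positional args, which appear to map,
# in order, to the named flags used above:
# preprocess.py PATH F_NAME MAX_SOLS_PER_PUZZLE OLD_STYLE_JSON MAX_EXAMPLES INCLUDE_FAILURES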
python preprocess.py ../data/125M_PAPER/iter_0 125M_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/125M_PAPER/iter_0/125M_PAPER_25K_iter_0.txt ../data/125M_PAPER_25K_iter_0.txt

# Producing 125M_PAPER_1M_iter_1.txt from the puzzle/solution new-style data
python preprocess.py ../data/125M_PAPER/iter_1 125M_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/125M_PAPER/iter_1/125M_PAPER_1M_iter_1.txt ../data/125M_PAPER_1M_iter_1.txt

# Producing 125M_PAPER_1M_iter_2.txt from the puzzle/solution new-style data
python preprocess.py ../data/125M_PAPER/iter_2 125M_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/125M_PAPER/iter_2/125M_PAPER_1M_iter_2.txt ../data/125M_PAPER_1M_iter_2.txt

# Producing 13B_PAPER_25K_iter_0.txt from the puzzle/solution new-style data
python preprocess.py ../data/13B_PAPER/iter_0 13B_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/13B_PAPER/iter_0/13B_PAPER_25K_iter_0.txt ../data/13B_PAPER_25K_iter_0.txt

# Producing 13B_PAPER_1M_iter_1.txt from the puzzle/solution new-style data
python preprocess.py ../data/13B_PAPER/iter_1 13B_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/13B_PAPER/iter_1/13B_PAPER_1M_iter_1.txt ../data/13B_PAPER_1M_iter_1.txt

# Producing 13B_PAPER_1M_iter_2.txt from the puzzle/solution new-style data
python preprocess.py ../data/13B_PAPER/iter_2 13B_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/13B_PAPER/iter_2/13B_PAPER_1M_iter_2.txt ../data/13B_PAPER_1M_iter_2.txt

# Producing 27B_PAPER_25K_iter_0.txt from the puzzle/solution new-style data
python preprocess.py ../data/27B_PAPER/iter_0 27B_PAPER_25K_iter_0.txt 8 False 25000 False -seed=2022
cp ../data/27B_PAPER/iter_0/27B_PAPER_25K_iter_0.txt ../data/27B_PAPER_25K_iter_0.txt

# Producing 27B_PAPER_1M_iter_1.txt from the puzzle/solution new-style data
python preprocess.py ../data/27B_PAPER/iter_1 27B_PAPER_1M_iter_1.txt 8 False 1000000 False -seed=2022
cp ../data/27B_PAPER/iter_1/27B_PAPER_1M_iter_1.txt ../data/27B_PAPER_1M_iter_1.txt

# Producing 27B_PAPER_1M_iter_2.txt from the puzzle/solution new-style data
python preprocess.py ../data/27B_PAPER/iter_2 27B_PAPER_1M_iter_2.txt 8 False 1000000 False -seed=2022
cp ../data/27B_PAPER/iter_2/27B_PAPER_1M_iter_2.txt ../data/27B_PAPER_1M_iter_2.txt

# Data files produced by babysit.sh - generating data from gpt-neo-* and Codex
# At the time these experiments were run, Codex wasn't finetunable, so only iteration 0 Codex data was available
Codex_PAPER_1M_iter_0.txt
125M_PAPER_25K_iter_0.txt
13B_PAPER_25K_iter_0.txt
27B_PAPER_25K_iter_0.txt
125M_PAPER_1M_iter_1.txt
13B_PAPER_1M_iter_1.txt
27B_PAPER_1M_iter_1.txt
125M_PAPER_1M_iter_2.txt
13B_PAPER_1M_iter_2.txt
27B_PAPER_1M_iter_2.txt

# Figure 5 - 3 diagrams - showing the 3 GPT-Neo models trained on verified Codex data vs unverified Codex data vs baseline
# 5a GPT-Neo 125M
./fine_tune1.sh 0 125M ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 125M ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 125M 10 228
# 5b GPT-Neo 1.3B (named 13B in scripts and file names)
./fine_tune1.sh 0 13B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 13B ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 13B 10 228 5
# 5c GPT-Neo 2.7B (named 27B in scripts and file names)
./fine_tune1.sh 0 27B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt
./fine_tune1.sh 0 27B ft1_Codex_unverified_PAPER_1M_iter_0 Codex_unverified_PAPER_1M_iter_0.txt
./solve1.sh 0 27B 10 228 5

# Figure 6 - 3 diagrams - showing test_228 Pass@k for the 3 GPT-Neo models trained on data from 4 generators (Codex and the 3 GPT-Neo models), plus baseline
# 6a - GPT-Neo 125M trained on 4 different datasets, plus baseline
# ./fine_tune1.sh 0 125M ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5a)
./fine_tune1.sh 0 125M ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 125M ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 125M ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

# 6b - GPT-Neo 1.3B trained on 4 different datasets, plus baseline
# ./fine_tune1.sh 0 13B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5b)
./fine_tune1.sh 0 13B ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 13B ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 13B ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

# 6c - GPT-Neo 2.7B trained on 4 different datasets, plus baseline
# ./fine_tune1.sh 0 27B ft1_Codex_PAPER_1M_iter_0 Codex_PAPER_1M_iter_0.txt (dupe of 5c)
./fine_tune1.sh 0 27B ft1_125M_PAPER_1M_iter_2 125M_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 27B ft1_13B_PAPER_1M_iter_2 13B_PAPER_1M_iter_2.txt
./fine_tune1.sh 0 27B ft1_27B_PAPER_1M_iter_2 27B_PAPER_1M_iter_2.txt

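Pass@k curves like these are typically computed with the unbiased Pass@k estimator from Chen et al. (2021); a minimal sketch, not necessarily this repo's exact evaluation code:

# pass_at_k.py - unbiased pass@k estimator (Chen et al., 2021)
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = attempts per problem, c = correct attempts, k = attempt budget."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. with 100 attempts per problem, as used on the test_228 set:
# pass_at_k(n=100, c=5, k=10)
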
# Launch on torch2020 - edit solve.yaml for the correct model and epoch parameters
./tst_human_eval_base.sh 0 125M 1024
./tst_human_eval_ft1.sh 0 125M 1024
./tst_human_eval_ft5.sh 0 125M 1024
./tst_human_eval_ft10.sh 0 125M 1024
5 binary data files changed (27.2 MB, 25.6 MB, 27 MB, 39.5 MB, 47.9 MB) - binary files not shown

ICLR2023/requirements.txt

Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
adal==1.2.7
aiohttp==3.8.1
aiosignal==1.2.0
amlt==8.0.9
applicationinsights==0.11.10
asn1crypto==0.24.0
astor==0.8.1
async-timeout==4.0.1
attrs==17.4.0
Automat==0.6.0
azure-common==1.1.27
azure-core==1.17.0
azure-data-tables==12.0.0b6
azure-graphrbac==0.61.1
azure-identity==1.4.1
azure-mgmt-authorization==0.61.0
azure-mgmt-containerregistry==2.8.0
azure-mgmt-keyvault==2.2.0
azure-mgmt-resource==13.0.0
azure-mgmt-storage==11.2.0
azure-storage-blob==2.1.0
azure-storage-common==2.1.0
azure-storage-file==2.1.0
azureml-automl-core==1.26.0
azureml-contrib-k8s==0.1.16
azureml-contrib-pipeline-steps==1.26.0
azureml-core==1.26.0
azureml-dataprep==2.13.2
azureml-dataprep-native==32.0.0
azureml-dataprep-rslex==1.11.2
azureml-dataset-runtime==1.26.0
azureml-k8s-mt==1.0.4
azureml-pipeline-core==1.26.0
azureml-pipeline-steps==1.26.0
azureml-telemetry==1.26.0
azureml-train-automl-client==1.26.0
azureml-train-core==1.26.0
azureml-train-restclients-hyperdrive==1.26.0
backcall==0.2.0
backports.tempfile==1.0
backports.weakref==1.0.post1
beautifulsoup4==4.9.3
bitstring==3.1.9
black==21.8b0
blinker==1.4
blis==0.7.4
blobxfer==1.10.0
cachetools==4.2.2
catalogue==2.0.6
certifi==2018.1.18
cffi==1.14.6
chardet==3.0.4
charset-normalizer==2.0.7
click==7.1.2
click-completion @ git+https://github.com/temporaer/click-completion.git@41b21868cac0781d25b37da624bae2fd1f36be88
click-option-group==0.5.3
click-plugins==1.1.1
cloud-init==20.2
cloudpickle==1.6.0
colorama==0.3.7
colorlog==6.4.1
command-not-found==0.3
configobj==5.0.6
configparser==5.0.2
constantly==15.1.0
contextlib2==21.6.0
cryptography==3.4.8
cycler==0.10.0
cymem==2.0.5
datasets==1.15.1
debugpy==1.4.3
decorator==5.0.9
deepspeed==0.5.1
dill==0.3.4
distro==1.6.0
distro-info===0.18ubuntu0.18.04.1
docker==5.0.1
docker-pycreds==0.4.0
dotnetcore2==2.1.21
ecdsa==0.17.0
entrypoints==0.3
et-xmlfile==1.1.0
fail2ban==0.10.2
fastai==2.5.2
fastcore==1.3.26
fastdownload==0.0.5
fastprogress==1.0.0
filelock==3.0.12
Flask==2.0.1
Flask-Cors==3.0.10
Flask-Executor==0.9.4
Flask-FontAwesome==0.1.5
frozenlist==1.2.0
fsspec==2021.11.0
gitdb==4.0.7
GitPython==3.1.18
httplib2==0.9.2
huggingface-hub==0.1.2
humanize==3.11.0
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
ipdb==0.13.9
ipykernel==6.4.1
ipython==7.27.0
ipython-genutils==0.2.0
isodate==0.6.0
itsdangerous==2.0.1
jedi==0.18.0
Jinja2==3.0.1
jmespath==0.10.0
joblib==1.0.1
jsonpatch==1.16
jsonpickle==2.0.0
jsonpointer==1.10
jsonschema==2.6.0
jupyter-client==7.0.5
jupyter-core==4.8.1
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.3.2
language-selector==0.1
libtmux==0.10.1
Mako==1.1.5
MarkupSafe==2.0.1
marshmallow==3.10.0
matplotlib==3.4.3
matplotlib-inline==0.1.3
mlb-core==0.0.4
msal==1.14.0
msal-extensions==0.2.2
msrest==0.6.19
msrestazure==0.6.4
multidict==5.2.0
multiprocess==0.70.12.2
murmurhash==1.0.5
mypy-extensions==0.4.3
ndg-httpsclient==0.5.1
nest-asyncio==1.5.1
netifaces==0.10.4
ninja==1.10.2
ntlm-auth==1.5.0
numpy==1.21.2
oauthlib==3.1.1
openai==0.13.0
openpyxl==3.0.9
orderedset==2.0.3
packaging==21.0
PAM==0.4.2
pandas==1.3.2
pandas-stubs==1.2.0.45
parso==0.8.2
passpy==1.0.2
pathspec==0.9.0
pathtools==0.1.2
pathy==0.6.0
Pebble==4.6.3
petname==2.6
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.2
platformdirs==2.3.0
portalocker==1.7.1
preshed==3.0.5
promise==2.3
prompt-toolkit==3.0.20
protobuf==3.17.3
psb2==1.0.0
psutil==5.8.0
ptyprocess==0.7.0
pyarrow==1.0.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.20
pycrypto==2.6.1
pydantic==1.8.2
Pygments==2.10.0
PyGObject==3.26.1
PyJWT==1.5.3
pyOpenSSL==17.5.0
pyparsing==2.4.7
pyperclip==1.8.2
pyserial==3.4
python-apt==1.6.5+ubuntu0.3
python-dateutil==2.8.2
python-debian==0.1.32
python-gnupg==0.4.7
pytz==2021.1
pyxdg==0.25
PyYAML==5.4.1
pyzmq==22.3.0
regex==2021.8.28
requests==2.25.1
requests-ntlm==1.1.0
requests-oauthlib==1.3.0
requests-unixsocket==0.1.5
ruamel.yaml==0.17.16
ruamel.yaml.clib==0.2.6
sacremoses==0.0.45
scikit-learn==0.24.2
scipy==1.7.1
SecretStorage==2.3.1
sentry-sdk==1.3.1
service-identity==16.0.0
shellingham==1.4.0
shortuuid==1.0.1
six==1.16.0
sklearn==0.0
smart-open==5.2.1
smmap==4.0.0
soupsieve==2.2.1
spacy==3.1.2
spacy-legacy==3.0.8
srsly==2.4.1
ssh-import-id==5.7
sshpubkeys==3.3.1
strictfire==0.4.1
subprocess32==3.5.4
systemd-python==234
tabulate==0.8.9
tensorboardX==1.8
termcolor==1.1.0
thinc==8.0.10
threadpoolctl==2.2.0
tokenizers==0.10.3
toml==0.10.2
tomli==1.2.1
torch==1.9.0
torchvision==0.10.0
tornado==6.1
tqdm==4.62.2
traitlets==5.1.0
transformers==4.10.0
Twisted==17.9.0
typer==0.3.2
typing-extensions==3.10.0.2
ufw==0.36
unattended-upgrades==0.1
urllib3==1.26.6
virtualenv==15.1.0
WALinuxAgent==2.2.45
wasabi==0.8.2
wcwidth==0.2.5
websocket-client==1.2.1
Werkzeug==2.0.1
xdg==5.1.1
xxhash==2.0.2
yarl==1.7.2
zope.interface==4.3.2
