4 changes: 3 additions & 1 deletion in leaderboard/README.md

@@ -21,6 +21,7 @@ Follow the setup instructions in the evaluation harness [README](https://github.

Create two folders `generations_$model` and `metrics_$model` where you will save the generated code and the metrics respectively for your model `$model`.
```bash
model=YOUR_MODEL
cd bigcode-evaluation-harness
mkdir generations_$model
mkdir metrics_$model
```

@@ -58,6 +59,7 @@ for lang in "${langs[@]}"; do

```diff
     --trust_remote_code \
     --use_auth_token \
     --generation_only \
+    --save_generations \
     --save_generations_path $generations_path
   echo "Task $task done"
 done
```

@@ -111,7 +113,7 @@ for lang in "${langs[@]}"; do

```diff
     task=multiple-$lang
   fi

-  gen_suffix=generations_$task\_$model.json
+  gen_suffix=generations_$task\_$model\_$task.json
```
**Collaborator:**

We don't need to have the same path format as here:

```python
save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json"
```

because during evaluation we call `--load_generations_path`, which can be anything. So let's maybe keep the original path, so that the task doesn't appear twice?
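
The suffixing behavior of the quoted harness line can be sketched standalone (a minimal reproduction with hypothetical file names, not the harness's actual module):

```python
import os

def saved_generations_path(save_generations_path: str, task: str) -> str:
    """Mirror the quoted harness line: drop the extension, append _{task}.json."""
    return f"{os.path.splitext(save_generations_path)[0]}_{task}.json"

# If the README already bakes the task into the path, the harness
# still appends the task once more when writing the file:
print(saved_generations_path("generations_humaneval_mymodel.json", "humaneval"))
# generations_humaneval_mymodel_humaneval.json
```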

**Author:**

> because during evaluation we call `--load_generations_path`, which can be anything.

Right; however, the current README steps for evaluation pass the `$gen_suffix` variable to the `--load_generations_path` argument:

```bash
gen_suffix=generations_$task\_$model.json
metric_suffix=metrics_$task\_$model.json
echo "Evaluation of $model on $task benchmark, data in $generations_path/$gen_suffix"
sudo docker run -v $(pwd)/$generations_path/$gen_suffix:/app/$gen_suffix:ro -v $(pwd)/$metrics_path:/app/$metrics_path -it evaluation-harness-multiple python3 main.py \
    --model $org/$model \
    --tasks $task \
    --load_generations_path /app/$gen_suffix \
```

Since `$gen_suffix` is missing the `_$task` suffix, running evaluations results in the following error:

*(error screenshot)*

After adding the `_$task` suffix to `$gen_suffix`, evaluations run successfully.

I was able to run evaluations for Artigenz-Coder-DS-6.7B after these changes.
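
The filename mismatch described above can be reproduced in a shell (hypothetical model and task values, for illustration only):

```shell
model=Artigenz-Coder-DS-6.7B
task=humaneval

# Path the original README builds and passes to --load_generations_path:
readme_path=generations_${task}_${model}.json

# File the harness actually writes, since it appends _${task} before .json:
saved_path=generations_${task}_${model}_${task}.json

echo "$readme_path"
echo "$saved_path"
```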

**Collaborator:**

It shouldn't throw an error if you used `save_generations_path=generations_$task\_$model.json` during generation.

```bash
metric_suffix=metrics_$task\_$model.json
echo "Evaluation of $model on $task benchmark, data in $generations_path/$gen_suffix"
```