More info on the RUBRIC evaluation metric from the Autograder Workbench resource: https://github.com/TREMA-UNH/rubric-grading-workbench
Research papers on the topic: https://www.cs.unh.edu/~dietz/publications/index.html
More info about the TREC RAG track: https://trec-rag.github.io/annoucements/2024-track-guidelines/
RUBRIC Evaluation Result Data for TREC RAG 24 is provided on https://trec-car.cs.unh.edu/rubric-trec-rag24/
Set up the Nix environment, including the RUBRIC code from the Autograder Workbench (also see the section Nix Trouble Shooting below):
nix develop
or, as a workaround that enables Nix's new features (nix-command and flakes):
NIXPKGS_ALLOW_UNFREE=1 nix --extra-experimental-features 'nix-command flakes' develop --impure
Get the TREC RAG data: https://trec-rag.github.io
Place the data in the following locations:
- ./retrieval/ : submitted system runs for the retrieval task
- topics.rag24.test.txt : queries
- msmarco_v2.1_doc_segmented.tar : RAG corpus (untar and index with duckdb)
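Before running the conversion steps, it can help to confirm that the inputs are where the commands expect them. The following is a minimal sketch, not part of the RUBRIC code: the helper name `missing_inputs` is hypothetical, and it assumes all three entries sit directly under one base directory.

```python
from pathlib import Path

# Entry names are taken from the list above; placing them all under one
# base directory is an assumption of this sketch.
EXPECTED = {
    "retrieval": "submitted system runs for the retrieval task",
    "topics.rag24.test.txt": "queries",
    "msmarco_v2.1_doc_segmented.tar": "RAG corpus archive",
}


def missing_inputs(base_dir="."):
    """Return the expected entries that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    # Report anything that still needs to be downloaded or moved.
    for name in missing_inputs():
        print(f"missing: {name} ({EXPECTED[name]})")
```

An empty result means all three inputs were found; it says nothing about whether the corpus has already been untarred or indexed.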
- Index TREC RAG corpus with duckdb
python -O -m rubric_trec_rag.build_rag_segment_index --out msmarco.duckdb ./msmarco_v2.1_doc_segmented/msmarco_v2.1_doc_segmented_*.json.gz
- Convert retrieval-run data to input files for RUBRIC
python -O -m rubric_trec_rag.rag_retrieval_inputs --query-path topics.rag24.test.txt --query-out queries-rag24.json -p rubric-rag-retrieval.jsonl.gz --rag-corpus-db msmarco.duckdb retrieval/*
- Convert generation-run data (example: elect-fifth.gz) to input files for RUBRIC
python -O -m rubric_trec_rag.convert_trec_rag_output convert data/trec/elect-fifth.gz -o data/trec/elect-fifth-rubric.gz /*
- Run question bank generation from RUBRIC
bash rubric-generation-rag24.sh
- Run LLM-grading on ungraded input data (here: example for retrieval task)
python -O -m exam_pp.exam_grading data/rubric-rag-retrieval.jsonl.gz -o data/questions-rate--rubric-rag24-retrieval.jsonl.gz --model-name google/flan-t5-large --model-pipeline text2text --prompt-class QuestionSelfRatedUnanswerablePromptWithChoices --question-path data/rag24-questions.jsonl.gz --question-type question-bank
To obtain explanations for FLAN-T5 grades, use the prompt class QuestionCompleteConciseUnanswerablePrompt.
To use keyphrase-style nuggets instead of questions, use the NuggetSelfRatedPrompt.
We strongly recommend against obtaining grades with llama3 out of the box. However, if you insist, here is how: change --model-name to a llama3 Hugging Face model (e.g., meta-llama/Meta-Llama-3-8B-Instruct) and --model-pipeline to llama.
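The variants described above differ from the retrieval example only in the values passed to --model-name, --model-pipeline, and --prompt-class. The sketch below assembles the command line for each variant; the helper `grading_command` is hypothetical (it is not part of exam_pp), and it only uses flags and paths that appear in the example command, with --question-path as the intended spelling of the question-bank flag. Reusing the same output path for every variant is an assumption, so pass a different `output` to keep results apart.

```python
import shlex


def grading_command(
    model_name="google/flan-t5-large",
    model_pipeline="text2text",
    prompt_class="QuestionSelfRatedUnanswerablePromptWithChoices",
    output="data/questions-rate--rubric-rag24-retrieval.jsonl.gz",
):
    """Build the exam_pp.exam_grading argument list for the retrieval example."""
    return [
        "python", "-O", "-m", "exam_pp.exam_grading",
        "data/rubric-rag-retrieval.jsonl.gz",
        "-o", output,
        "--model-name", model_name,
        "--model-pipeline", model_pipeline,
        "--prompt-class", prompt_class,
        "--question-path", "data/rag24-questions.jsonl.gz",
        "--question-type", "question-bank",
    ]


# FLAN-T5 with explanations: only the prompt class changes.
print(shlex.join(grading_command(
    prompt_class="QuestionCompleteConciseUnanswerablePrompt")))

# llama3 (not recommended): model name and pipeline change.
print(shlex.join(grading_command(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_pipeline="llama")))
```

The printed lines can be pasted into the nix shell, or the list can be handed to `subprocess.run` directly.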
- Start the nix environment with nix develop (if not already done)
- Download the rubric grade files from https://trec-car.cs.unh.edu/rubric-trec-rag24/ and place them in the directory ./data/
- Start the notebook environment with jupyter notebook
- Open one of the rubric-rag-results-*.ipynb notebooks in the browser
- Run all cells in the notebook
- At the end of the notebook, one tar.gz archive is produced with qrels, runs, run-measure-query-value exports, a leaderboard in TSV, and plots
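To get a quick overview of what a notebook produced, the archive can be inspected without unpacking it. This is a minimal sketch using Python's standard tarfile module; the helper name `summarize_archive` is hypothetical, and the archive's file name and internal layout are not specified here, so the sketch only counts files by extension.

```python
import tarfile
from collections import Counter


def summarize_archive(path):
    """Count the regular files in a tar.gz archive, grouped by extension."""
    counts = Counter()
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():
                name = member.name.rsplit("/", 1)[-1]
                ext = name.rsplit(".", 1)[-1] if "." in name else "(none)"
                counts[ext] += 1
    return counts
```

Calling `summarize_archive` on the path of the produced archive returns a `Counter` mapping each extension to the number of files that carry it.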
Result Data is provided on https://trec-car.cs.unh.edu/rubric-trec-rag24/
If you get an error message about unfree packages or experimental commands, run one of these longer commands instead:
nix --extra-experimental-features 'nix-command flakes' develop
NIXPKGS_ALLOW_UNFREE=1 nix --extra-experimental-features 'nix-command flakes' develop --impure
We recommend the use of Cachix to avoid re-compiling basic dependencies. For that just respond "yes" when asked the following:
do you want to allow configuration setting 'substituters' to be set to 'https://dspy-nix.cachix.org' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? y
If you get error messages indicating that you are not a "trusted user", such as the following
warning: ignoring untrusted substituter 'https://dspy-nix.cachix.org', you are not a trusted user.
then ask your administrator to edit the nix config file (/etc/nix/nix.conf) and add your username or group to the trusted user list as follows: trusted-users = root $username @$group
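Whether a user is already covered by the trusted-users setting can be checked by reading the config text. The sketch below is a simplification under stated assumptions: the helpers `trusted_entries` and `is_trusted` are hypothetical, every trusted-users line is treated as contributing entries, and include directives, extra-trusted-users, and wildcard entries are not handled.

```python
def trusted_entries(conf_text):
    """Collect the values of all 'trusted-users = ...' lines in nix.conf text."""
    entries = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "trusted-users":
            entries.extend(value.split())
    return entries


def is_trusted(conf_text, user, groups=()):
    """True if the user, or one of the given groups as @group, is listed."""
    entries = trusted_entries(conf_text)
    return user in entries or any("@" + group in entries for group in groups)


# Example with the line format shown above (user and group names are made up).
example = "trusted-users = root alice @wheel\n"
print(is_trusted(example, "alice"))                  # listed by name
print(is_trusted(example, "bob", groups=["wheel"]))  # listed via group
```

On a real system, the text would come from reading /etc/nix/nix.conf.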
RUBRIC@ TREC RAG 24 © 2024 by Laura Dietz is licensed under Creative Commons Attribution-ShareAlike 4.0 International