Commit a4b500a
Pass on readme + citation (#716)
* remove WIP
* re-structure the README + re-center it on usage rather prompts creation
* remove link to api documentation
* pass on CONTRIBUITING
* api doc
* grammarly
* `get_fixed_answer_choices_list` is redundant
* fix unused import
* adress comments
* citation
1 parent 5ed274d commit a4b500a

File tree

7 files changed: +286 -408 lines changed

API_DOCUMENTATION.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Manipulating prompts

PromptSource implements 4 classes to store, manipulate and use prompts and their metadata: `Template`, `Metadata`, `DatasetTemplates` and `TemplateCollection`. All of them are implemented in [`templates.py`](promptsource/templates.py).

## Classes `Template` and `Metadata`

`Template` is a class that wraps a prompt and its associated metadata, and implements the helper functions to use the prompt.

Instances of `Template` have the following main methods that will come in handy (a usage sketch follows the list):
* `apply(example, truncate=True, highlight_variables=False)`: Create a prompted example by applying the template to the given example.
  - `example` (Dict): the dataset example to create a prompt for
  - `truncate` (Bool, defaults to `True`): if `True`, example fields will be truncated to `TEXT_VAR_LENGTH` chars
  - `highlight_variables` (Bool, defaults to `False`): highlight the added variables (internal use for the app rendering)
* `get_id()`: Get the UUID of the prompt.
* `get_name()`: Get the name of the prompt.
* `get_reference()`: Get any additional information about the prompt (such as a bibliographic reference).
* `get_answer_choices_list(example)`: If applicable, return a list of answer choices for a given example.
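For concreteness, here is a minimal sketch of applying a prompt to a dataset example. It assumes the `ag_news` dataset and a prompt named `classify_question_first` (one of the `ag_news` prompt names at the time of writing; any locally available dataset/prompt pair works the same way):

```python
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

# Load one example from the dataset the prompt was written for
dataset = load_dataset("ag_news", split="train")
example = dataset[0]

# Fetch the prompts for this dataset and pick one by name
ag_news_prompts = DatasetTemplates("ag_news")
template = ag_news_prompts["classify_question_first"]

# apply() returns the prompted input and, when the target is non-empty, the target
result = template.apply(example)
print("INPUT:", result[0])
print("TARGET:", result[1])
```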
Each `Template` also has a `metadata` attribute, an instance of the class `Metadata` that encapsulates the following 3 attributes (see the sketch below):
* `original_task`: If `True`, this prompt asks a model to perform the original task designed for this dataset.
* `choices_in_prompt`: If `True`, the answer choices are included in the templates such that models see those choices in the input. Only applicable to classification tasks.
* `metrics`: List of strings denoting the metrics to use for evaluation.
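As a sketch, these attributes can be read directly off a prompt (the printed values are illustrative, not guaranteed for any particular prompt):

```python
from promptsource.templates import DatasetTemplates

# Grab any prompt for a dataset and inspect the metadata attached to it
prompts = DatasetTemplates("ag_news")
template = prompts[prompts.all_template_names[0]]

meta = template.metadata
print(meta.original_task)      # e.g. True
print(meta.choices_in_prompt)  # e.g. True when the choices are spelled out in the input
print(meta.metrics)            # e.g. ["Accuracy"]
```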
## Class `DatasetTemplates`

`DatasetTemplates` is a class that wraps all the prompts (each of which is an instance of `Template`) for a specific dataset/subset and implements all the helper functions necessary to read/write to the YAML file in which the prompts are saved.

You will likely mainly be interested in getting the existing prompts and their names for a given dataset. You can do that with the following instantiation:
```python
>>> template_key = f"{dataset_name}/{subset_name}" if subset_name is not None else dataset_name
>>> prompts = DatasetTemplates(template_key)
>>> len(prompts)  # Returns the number of prompts for the given dataset
>>> prompts.all_template_names  # Returns a sorted list of all template names for this dataset
```
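Building on that, a short sketch of walking over every prompt registered for a dataset (lookup by prompt name, as in the `apply` example above, is assumed here; verify against `templates.py` if in doubt):

```python
from promptsource.templates import DatasetTemplates

# Enumerate every prompt registered for a dataset and show its UUID
prompts = DatasetTemplates("ag_news")
for name in prompts.all_template_names:
    template = prompts[name]  # look a prompt up by its name
    print(f"{name}: {template.get_id()}")
```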
## Class `TemplateCollection`

`TemplateCollection` is a class that encapsulates all the prompts available under PromptSource by wrapping the `DatasetTemplates` class. It initializes the `DatasetTemplates` for all existing template folders, gives access to each `DatasetTemplates`, and provides aggregated counts over all `DatasetTemplates`.

The main methods are (see the sketch after this list):
* `get_dataset(dataset_name, subset_name)`: Return the `DatasetTemplates` object corresponding to the dataset name.
  - `dataset_name` (Str): name of the dataset to get
  - `subset_name` (Str, defaults to `None`): name of the subset
* `get_templates_count()`: Return the prompt counts over all datasets. NB: we don't break down datasets into subsets for the count, i.e. subset counts are included in the dataset count.
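As a sketch of the aggregated view (the exact return shape of `get_templates_count()` is an assumption based on the NB above; verify against `templates.py`):

```python
from promptsource.templates import TemplateCollection

collection = TemplateCollection()

# Aggregated prompt counts; subset counts are folded into their parent dataset
counts = collection.get_templates_count()
print(counts.get("squad"))  # assumed mapping: dataset name -> number of prompts

# DatasetTemplates object for one dataset (no subset here)
squad_prompts = collection.get_dataset("squad", None)
print(squad_prompts.all_template_names)
```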

CITATION.cff

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
cff-version: "0.2.0"
date-released: 2022-02
message: "If you use this software, please cite it using these metadata."
title: "PromptSource"
url: "https://github.com/bigscience-workshop/promptsource"
authors:
  - family-names: Bach
    given-names: "Stephen H."
  - family-names: Sanh
    given-names: Victor
  - family-names: Yong
    given-names: Zheng-Xin
  - family-names: Webson
    given-names: Albert
  - family-names: Raffel
    given-names: Colin
  - family-names: Nayak
    given-names: "Nihal V."
  - family-names: Sharma
    given-names: Abheesht
  - family-names: Kim
    given-names: Taewoon
  - family-names: Bari
    given-names: "M Saiful"
  - family-names: Fevry
    given-names: Thibault
  - family-names: Alyafeai
    given-names: Zaid
  - family-names: Dey
    given-names: Manan
  - family-names: Santilli
    given-names: Andrea
  - family-names: Sun
    given-names: Zhiqing
  - family-names: Ben-David
    given-names: Srulik
  - family-names: Xu
    given-names: Canwen
  - family-names: Chhablani
    given-names: Gunjan
  - family-names: Wang
    given-names: Han
  - family-names: Fries
    given-names: "Jason Alan"
  - family-names: Al-shaibani
    given-names: "Maged S."
  - family-names: Sharma
    given-names: Shanya
  - family-names: Thakker
    given-names: Urmish
  - family-names: Almubarak
    given-names: Khalid
  - family-names: Tang
    given-names: Xiangru
  - family-names: Tian-Jian
    given-names: Mike
  - family-names: Rush
    given-names: "Alexander M."
preferred-citation:
  type: article
  authors:
    - family-names: Bach
      given-names: "Stephen H."
    - family-names: Sanh
      given-names: Victor
    - family-names: Yong
      given-names: Zheng-Xin
    - family-names: Webson
      given-names: Albert
    - family-names: Raffel
      given-names: Colin
    - family-names: Nayak
      given-names: "Nihal V."
    - family-names: Sharma
      given-names: Abheesht
    - family-names: Kim
      given-names: Taewoon
    - family-names: Bari
      given-names: "M Saiful"
    - family-names: Fevry
      given-names: Thibault
    - family-names: Alyafeai
      given-names: Zaid
    - family-names: Dey
      given-names: Manan
    - family-names: Santilli
      given-names: Andrea
    - family-names: Sun
      given-names: Zhiqing
    - family-names: Ben-David
      given-names: Srulik
    - family-names: Xu
      given-names: Canwen
    - family-names: Chhablani
      given-names: Gunjan
    - family-names: Wang
      given-names: Han
    - family-names: Fries
      given-names: "Jason Alan"
    - family-names: Al-shaibani
      given-names: "Maged S."
    - family-names: Sharma
      given-names: Shanya
    - family-names: Thakker
      given-names: Urmish
    - family-names: Almubarak
      given-names: Khalid
    - family-names: Tang
      given-names: Xiangru
    - family-names: Tian-Jian
      given-names: Mike
    - family-names: Rush
      given-names: "Alexander M."
  title: "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts"
  year: 2022
  publisher: "arXiv"
  url: "https://arxiv.org/abs/2202.01279"
  address: "Online"

CONTRIBUTING.md

Lines changed: 32 additions & 28 deletions
@@ -1,10 +1,10 @@
 # Contributing
 
-One of the best ways to contribute is by writing prompts!
+The best way to contribute to growing P3 is by writing prompts for new datasets!
 
 ### What are Prompts?
 
-A prompt consists of a template(input template and target template, along with collection of associated metadata. A template is a piece of code written in a templating language called
+A prompt consists of a template (an input template and a target template), along with a collection of associated metadata. A template is a piece of code written in a templating language called
 [Jinja](https://jinja.palletsprojects.com/en/3.0.x/). A template defines
 a function that maps an example from a dataset in the
 [Hugging Face datasets library](https://huggingface.co/datasets) to two strings of
@@ -17,7 +17,7 @@ prompt.
 
 1. **Set up the app.** Fork the app and set up using the
 [README](https://github.com/bigscience-workshop/promptsource/blob/main/README.md).
-1. **Examine the dataset.** Select or type the dataset into the dropdown in the app.
+1. **Examine the dataset.** In the "Sourcing" mode, select or type the dataset into the dropdown.
 If the dataset has subsets (subsets are not the same as splits), you can select
 which one to work on. Note that prompts are subset-specific. You can find
 out background information on the dataset by reading the information in the
@@ -29,15 +29,17 @@ You can always update the name later. If you want to cancel the prompt, select
 1. **Write the prompt**. In the box labeled "Template," enter a Jinja expression.
 See the [getting started guide](#getting-started-using-jinja-to-write-prompts)
 and [cookbook](#jinja-cookbook) for details on how to write templates.
+1. **Fill in metadata**. Fill in the metadata for the current prompt: reference, original task, choices in templates, and answer choices.
+See [Metadata](#metadata) for more details about these fields.
 1. **Save the prompt**. Hit the "Save" button. The output of the prompt
 applied to the current example will appear in the right sidebar.
 1. **Verify the prompt**. Check that you didn't miss any case by scrolling
 through a handful of examples of the prompted dataset using the
 "Prompted dataset viewer" mode.
-1. **Write between 5 and 10 prompts**. Repeat the steps 4 to 8 to create between 5
+1. **Write between 5 and 10 prompts**. Repeat the steps 4 to 9 to create between 5
 and 10 (more if you want!) prompts per dataset/subset. Feel free to introduce
 a mix of formats, some that follow the templates listed in the [best practices](#best-practices)
-and some that are more diverse in the format and the formulation.
+and some that are more diverse in the format and the formulation.
 1. **Duplicate the prompts(s).** If the dataset you have chosen bear the same
 format as other datasets (for instance, `MNLI` and `SNLI` have identical formats),
 you can simply duplicate the prompts you have written to these additional datasets.
@@ -108,8 +110,9 @@ it has the answer. Can you tell me the answer?
 {{answers["text"][0]}}'
 ```
 
-## Options
-In addition to the template itself, you can fill out several other fields in the app.
+## Metadata
+In addition to the template itself, you need to fill out several other fields.
+These metadata facilitate finding and using the prompts.
 * **Prompt Reference.** If your template was inspired by a paper, note the
 reference in the "Prompt Reference" section. You can also add a description of
 what your template does.
@@ -166,8 +169,7 @@ introduce some diversity by prompting a given dataset into multiple tasks and pr
 description in the "Template Reference" text box. An example is given
 in the already prompted `movie_rationales`.
 * **Filtering prompts.** If a prompt is applied to an example and produces an
-empty string, that prompt/example pair will be skipped. (Either the entire target
-is whitespace or the text on either side of the separator `|||` is whitespace.
+empty string, that prompt/example pair will be skipped.
 You can therefore create prompts that only apply to a subset of the examples by
 wrapping them in Jinja if statements. For example, in the `TREC` dataset, there
 are fine-grained categories that are only applicable to certain coarse-grained categories.
@@ -180,6 +182,17 @@ Is this question asking for a {{"definition"}}, a {{"description"}}, a {{"manner
 {{ {0: "Manner", 7: "Defintion", 9: "Reason", 12: "Description"}[label_fine] }}
 {% endif %}
 ```
+For datasets that have splits with no labels (for instance, a test split without ground-truth labels), you can wrap the conditional statement on the target side.
+For instance, for `super_glue/boolq`, the following prompt would return an empty target on the test split, but not an empty prompted example:
+```jinja2
+{{ passage }}
+Question: {{ question }}
+Answer:
+|||
+{% if label != -1 %}
+{{ answer_choices[label] }}
+{% endif %}
+```
 * **Conditional generation format.** Always specify the target and separate it from the prompt
 by indicating the vertical bars `|||`. The target will be generated by a generative model
 conditioned on the input you wrote. You can always transform an "infix" prompt format
@@ -226,15 +239,15 @@ First, {{ ctx_a.lower() }} Then, {{ ctx_b.lower() }}...
 
 Complete the above description with a chosen ending:
 
-Ending 1: {{ endings[0] }}
+(a) {{ answer_choices[0] }}
 
-Ending 2: {{ endings[1] }}
+(b) {{ answer_choices[1] }}
 
-Ending 3: {{ endings[2] }}
+(c) {{ answer_choices[2] }}
 
-Ending 4: {{ endings[3] }}
+(d) {{ answer_choices[3] }}
 
-||| {{ {"0": "Ending 1", "1": "Ending 2", "2": "Ending 3", "3": "Ending 4"}[label] }}
+||| {{ answer_choices[label | int()] }}
 ```
 Notice how it uses functions to consistently capitalize the information and provides lots
 of context (referring explicitly to "description" and "chosen ending.")
@@ -251,26 +264,17 @@ Which one is the most appropriate answer/completion for the paragraph that follo
 {%- endfor %}
 ```
 Like above, it uses functions to present the choices in a readable way. Also, it
-uses a for loop with conditions to handle the more intricate dataset schema.
+uses a for loop with conditions to handle the more intricate dataset schema.
 
 Here's one for `paws`:
 ```jinja2
-{% if label == 0 or label == 1 %}
 Sentence 1: {{sentence1}}
 Sentence 2: {{sentence2}}
 Question: Does Sentence 1 paraphrase Sentence 2? Yes or No?
-{% endif %}
-|||
-{% if label == 0 %}
-No
-{% elif label == 1 %}
-Yes
-{% endif %}
-
+|||
+{{answer_choices[label]}}
 ```
-This template has to do a few things, even though it's a yes no question. First,
-the label might be unknown, so the pieces are wrapped in if statements.
-Second, notice that the choices `Yes or No` are not escaped. Yes/no, true/false
+Notice that the choices `Yes or No` are not escaped. Yes/no, true/false
 are choices that do not need to be escaped (unlike categories).
 
 ## Uploading Prompts
@@ -307,7 +311,7 @@ do_something_else
 ```jinja
 {% for a, b in zip(list_A, list_B) %}
 do_something_with_a_and_b
-{% endfor %}
+{% endfor %}
 ```
 