The recipes in this repository have since moved to Prodigy and are being maintained there. They will soon even get an upgrade with the advent of spacy-llm support, which features better prompts and multiple LLM providers. That is why we've opted to archive this repo, so that we may focus on maintaining these recipes as part of spaCy and Prodigy directly.
You can learn more by checking out the large language models section on the docs.
This repository contains example code on how to combine zero- and few-shot learning with a small annotation effort to obtain a high-quality dataset with maximum efficiency. Specifically, we use large language models available from OpenAI to provide us with an initial set of predictions, then spin up a Prodigy instance on our local machine to go through these predictions and curate them. This allows us to obtain a gold-standard dataset pretty quickly, and train a smaller, supervised model that fits our exact needs and use-case.
*(Demo video: `openai_prodigy.mp4`)*
Make sure to install Prodigy as well as a few additional Python dependencies:

```
python -m pip install prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy
python -m pip install -r requirements.txt
```

with `XXXX-XXXX-XXXX-XXXX` being your personal Prodigy license key.
Then, create a new API key from openai.com or fetch an existing one. Record the secret key as well as the organization key and make sure these are available as environment variables. For instance, set them in a `.env` file in the root directory:

```
OPENAI_ORG = "org-..."
OPENAI_KEY = "sk-..."
```
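If you want to double-check that the keys are picked up correctly, here's a minimal sketch using python-dotenv (the recipes handle loading themselves, so this is purely an optional sanity check):

```python
# Optional sanity check -- assumes python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory
assert os.getenv("OPENAI_ORG", "").startswith("org-")
assert os.getenv("OPENAI_KEY", "").startswith("sk-")
```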
This recipe marks entity predictions obtained from a large language model and allows you to flag them as correct or to curate them manually. This lets you quickly gather a gold-standard dataset through zero-shot or few-shot learning. It's very much like using the standard `ner.correct` recipe in Prodigy, but we're using GPT-3 as a backend model to make predictions.
```
python -m prodigy ner.openai.correct dataset filepath labels [--options] -F ./recipes/openai_ner.py
```
| Argument | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `str` | Prodigy dataset to save annotations to. | |
| `filepath` | `Path` | Path to `.jsonl` data to annotate. The data should at least contain a `"text"` field. | |
| `labels` | `str` | Comma-separated list defining the NER labels the model should predict. | |
| `--lang`, `-l` | `str` | Language of the input data, used to obtain a relevant tokenizer. | `"en"` |
| `--segment`, `-S` | `bool` | Flag to set when examples should be split into sentences. By default, the full input article is shown. | `False` |
| `--model`, `-m` | `str` | GPT-3 model to use for initial predictions. | `"text-davinci-003"` |
| `--prompt-path`, `-p` | `Path` | Path to the `.jinja2` prompt template. | `./templates/ner_prompt.jinja2` |
| `--examples-path`, `-e` | `Path` | Path to examples to help define the task. The file can be a `.yml`, `.yaml` or `.json`. If set to `None`, zero-shot learning is applied. | `None` |
| `--max-examples`, `-n` | `int` | Max number of examples to include in the prompt to OpenAI. If set to 0, zero-shot learning is always applied, even when examples are available. | `2` |
| `--batch-size`, `-b` | `int` | Batch size of queries to send to the OpenAI API. | `10` |
| `--verbose`, `-v` | `bool` | Flag to print extra information to the terminal. | `False` |
Let's say we want to recognize dishes, ingredients and cooking equipment from some text we obtained from a cooking subreddit. We'll send the text to GPT-3, hosted by OpenAI, and provide an annotation prompt to explain to the language model the type of predictions we want. Something like:
```
From the text below, extract the following entities in the following format:
dish: <comma delimited list of strings>
ingredient: <comma delimited list of strings>
equipment: <comma delimited list of strings>

Text:
...
```
We define this prompt in a `.jinja2` file, which also describes how to append examples for few-shot learning. You can create your own template and provide it to the recipe with the `--prompt-path` or `-p` option. Additionally, with `--examples-path` or `-e` you can set the file path of a `.y(a)ml` or `.json` file that contains additional examples:
```
python -m prodigy ner.openai.correct my_ner_data ./data/reddit_r_cooking_sample.jsonl "dish,ingredient,equipment" -p ./templates/ner_prompt.jinja2 -e ./examples/ner.yaml -n 2 -F ./recipes/openai_ner.py
```
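The expected schema of the examples file depends on the prompt template you use, so treat the following as a rough, hypothetical sketch rather than the canonical format:

```yaml
# Hypothetical few-shot example -- adapt the field names to whatever
# your .jinja2 template actually renders.
- text: "Whisk the eggs in a copper bowl before folding them into the batter."
  entities:
    dish: "batter"
    ingredient: "eggs"
    equipment: "copper bowl"
```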
After receiving the results from the OpenAI API, the Prodigy recipe converts the predictions into an annotation task that can be rendered with Prodigy. The task even shows the original prompt as well as the raw answer we obtained from the language model.
Here, we see that the model is able to correctly recognize dishes, ingredients and cooking equipment right from the start!
The recipe also offers a `--verbose` or `-v` option that prints the exact prompt and response to the terminal as traffic is received. Note that because the requests to the API are batched, you might have to scroll back a bit to find the current prompt.
At some point, you might notice a mistake in the predictions of the OpenAI language model. For instance, we noticed an error in the recognition of cooking equipment in this example:
If you see this kind of systematic error, you can steer the predictions in the right direction by correcting the example and then selecting the small "flag" icon in the top right of the Prodigy UI:
Once you hit accept on the Prodigy interface, the flagged example will be automatically picked up and added to the examples that are sent to the OpenAI API as part of the prompt.
Note
Because Prodigy batches these requests, the prompt will be updated with a slight delay, after the next batch of prompts is sent to OpenAI. You can experiment with making the batch size (`--batch-size` or `-b`) smaller to have the change take effect sooner, but this might negatively impact the speed of the annotation workflow.
The `ner.openai.correct` recipe fetches examples from OpenAI while annotating, but we've also included a recipe that can fetch a large batch of examples upfront.
```
python -m prodigy ner.openai.fetch input_data.jsonl predictions.jsonl "dish,ingredient,equipment" -F ./recipes/openai_ner.py
```
This will create a `predictions.jsonl` file that can be loaded with the `ner.manual` recipe.
Note that the OpenAI API might return "429 Too Many Requests" errors when requesting too much data at once. In that case, it's best to ensure you only request around 100 examples at a time.
After you've curated a set of predictions, you can export the results with `db-out`:

```
python -m prodigy db-out my_ner_data > ner_data.jsonl
```
The format of the exported annotations contains all the data you need to train a smaller model downstream. Each example in the dataset contains the original text, the tokens, span annotations denoting the entities, etc.
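As a rough illustration, a single curated example might look like this (abbreviated and hypothetical; the actual export contains full token information and additional metadata):

```
{"text": "Sear the steak in a cast-iron skillet.", "tokens": [{"text": "Sear", "start": 0, "end": 4, "id": 0}, ...], "spans": [{"start": 9, "end": 14, "label": "ingredient"}, {"start": 20, "end": 37, "label": "equipment"}], "answer": "accept"}
```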
You can also export the data to spaCy's binary format, using `data-to-spacy`. This format lets you load the annotations as spaCy `Doc` objects, which can be convenient for further conversion. The `data-to-spacy` command also makes it easy to train an NER model with spaCy. First you export the data, specifying the evaluation split as 20% of the total:

```
python -m prodigy data-to-spacy ./data/annotations/ --ner my_ner_data -es 0.2
```
Then you can train a model with spaCy or Prodigy:
```
python -m spacy train ./data/annotations/config.cfg --paths.train ./data/annotations/train.spacy --paths.dev ./data/annotations/dev.spacy -o ner-model
```
This will save a model to the `ner-model/` directory.
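You can sanity-check the result by loading the best checkpoint that `spacy train` writes into that directory:

```python
import spacy

# spacy train saves model-best/ and model-last/ inside the output directory
nlp = spacy.load("ner-model/model-best")
doc = nlp("Sear the steak in a cast-iron skillet.")
print([(ent.text, ent.label_) for ent in doc.ents])
```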
We've also included an experimental script to load in the `.spacy` binary format and train a model with the HuggingFace `transformers` library. You can use the same data you just exported and run the script like this:
```
# First you need to install the HuggingFace library and requirements
pip install -r requirements_train.txt
python ./scripts/train_hf_ner.py ./data/annotations/train.spacy ./data/annotations/dev.spacy -o hf-ner-model
```
The resulting model will be saved to the `hf-ner-model/` directory.
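Assuming the script saves the model in the standard HuggingFace `save_pretrained()` layout (an assumption worth verifying against the script), you could try the result with the `pipeline` API:

```python
from transformers import pipeline

# Assumption: hf-ner-model/ follows the standard save_pretrained() layout.
ner = pipeline("token-classification", model="hf-ner-model", aggregation_strategy="simple")
print(ner("Sear the steak in a cast-iron skillet."))
```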
This recipe enables us to classify texts faster with the help of a large language model. It also provides a "reason" to explain why a particular label was chosen.
```
python -m prodigy textcat.openai.correct dataset filepath labels [--options] -F ./recipes/openai_textcat.py
```
| Argument | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `str` | Prodigy dataset to save annotations to. | |
| `filepath` | `Path` | Path to `.jsonl` data to annotate. The data should at least contain a `"text"` field. | |
| `labels` | `str` | Comma-separated list defining the text categorization labels the model should predict. | |
| `--lang`, `-l` | `str` | Language of the input data, used to obtain a relevant tokenizer. | `"en"` |
| `--segment`, `-S` | `bool` | Flag to set when examples should be split into sentences. By default, the full input article is shown. | `False` |
| `--model`, `-m` | `str` | GPT-3 model to use for initial predictions. | `"text-davinci-003"` |
| `--prompt-path`, `-p` | `Path` | Path to the `.jinja2` prompt template. | `./templates/textcat_prompt.jinja2` |
| `--examples-path`, `-e` | `Path` | Path to examples to help define the task. The file can be a `.yml`, `.yaml` or `.json`. If set to `None`, zero-shot learning is applied. | `None` |
| `--max-examples`, `-n` | `int` | Max number of examples to include in the prompt to OpenAI. If set to 0, zero-shot learning is always applied, even when examples are available. | `2` |
| `--batch-size`, `-b` | `int` | Batch size of queries to send to the OpenAI API. | `10` |
| `--exclusive-classes`, `-E` | `bool` | Flag to make the classification task exclusive. | `False` |
| `--verbose`, `-v` | `bool` | Flag to print extra information to the terminal. | `False` |
The `textcat` recipes can be used for binary, multiclass, and multilabel text categorization. You can set this by passing the appropriate number of labels to the `--labels` parameter; for example, passing a single label turns it into binary classification, and so on. We will talk about each one in the following sections.
Suppose we want to know if a particular Reddit comment talks about a food recipe. We'll send the text to GPT-3 and provide a prompt that specifies the predictions we want.
```
From the text below, determine whether or not it contains a recipe. If it is a
recipe, answer "accept." If it is not a recipe, answer "reject."

Your answer should only be in the following format:
answer: <string>
reason: <string>

Text:
```
For binary classification, we want GPT-3 to return "accept" if a given text is a food recipe and "reject" otherwise. GPT-3's suggestion is then displayed prominently in the UI. We can press the ACCEPT (check mark) button to include the text as a positive example or press the REJECT (cross mark) button if it is a negative example.
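Given the response format above, a typical completion from the model might look like this (illustrative):

```
answer: accept
reason: The text lists ingredients and describes concrete preparation steps.
```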
```
python -m prodigy textcat.openai.correct my_binary_textcat_data data/reddit_r_cooking_sample.jsonl --labels recipe -F recipes/openai_textcat.py
```
Now, suppose we want to classify Reddit comments as recipes, feedback, or questions. We can write the following prompt:
```
Classify the text below to any of the following labels: recipe, feedback, question.
The task is exclusive, so only choose one label from what I provided.

Your answer should only be in the following format:
answer: <string>
reason: <string>

Text:
```
Then we can use this recipe to handle multilabel and multiclass cases by passing the three labels to the `--labels` parameter. We should also set the `--exclusive-classes` flag to render a single-choice UI:
```
python -m prodigy textcat.openai.correct my_multi_textcat_data data/reddit_r_cooking_sample.jsonl \
    --labels recipe,feedback,question \
    --exclusive-classes \
    -F recipes/openai_textcat.py
```
We write these prompts as a `.jinja2` template that can also take in examples for few-shot learning. You can create your own template and provide it to the recipe with the `--prompt-path` or `-p` option. Additionally, with `--examples-path` or `-e` you can set the file path of a `.y(a)ml` or `.json` file that contains additional examples. You can also add context to these examples, as we observed this to improve the output:
```
python -m prodigy textcat.openai.correct my_binary_textcat_data \
    ./data/reddit_r_cooking_sample.jsonl \
    --labels recipe \
    --prompt-path ./templates/textcat_prompt.jinja2 \
    --examples-path ./examples/textcat_binary.yaml -n 2 \
    -F ./recipes/openai_textcat.py
```
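As with the NER recipe, the schema of the examples file depends on your prompt template; a hypothetical file matching the answer/reason format above might look like:

```yaml
# Hypothetical schema -- adapt the field names to your template.
- text: "Mix the flour and water, then knead for ten minutes."
  answer: "accept"
  reason: "Describes concrete preparation steps, i.e. a recipe."
- text: "What's the best way to store fresh basil?"
  answer: "reject"
  reason: "A question about storage, not a recipe."
```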
Similar to the NER recipe, this recipe also converts the predictions into an annotation task that can be rendered with Prodigy. For binary classification, we use the `classification` interface with custom HTML elements, while for multilabel or multiclass text categorization, we use the `choice` annotation interface.
Notice that we include the original prompt and the OpenAI response in the UI.
Lastly, you can use the `--verbose` or `-v` flag to show the exact prompt and response on the terminal. Note that because the requests to the API are batched, you might have to scroll back a bit to find the current prompt.
Similar to the NER recipes, you can also steer the predictions in the right direction by correcting the example and then selecting the small "flag" icon in the top right of the Prodigy UI:
Once you hit the accept button on the Prodigy interface, the flagged example will be picked up and added to the few-shot examples sent to the OpenAI API as part of the prompt.
Note
Because Prodigy batches these requests, the prompt will be updated with a slight delay, after the next batch of prompts is sent to OpenAI. You can experiment with making the batch size (`--batch-size` or `-b`) smaller to have the change take effect sooner, but this might negatively impact the speed of the annotation workflow.
The `textcat.openai.fetch` recipe allows us to fetch a large batch of examples upfront. This is helpful when you are working with highly imbalanced data and are only interested in rare examples.
```
python -m prodigy textcat.openai.fetch input_data.jsonl predictions.jsonl --labels Recipe -F ./recipes/openai_textcat.py
```
This will create a `predictions.jsonl` file that can be loaded with the `textcat.manual` recipe.
Note that the OpenAI API might return "429 Too Many Requests" errors when requesting too much data at once. In that case, it's best to ensure you only request around 100 examples at a time and to take a look at the API's rate limits.
The `textcat.openai.fetch` recipe is suitable for working with datasets that have a severe class imbalance. Usually, you'd want to find examples of the rare class rather than annotating a random sample. From there, you'd upsample them to train a decent model, and so on.
This is where large language models like OpenAI's might help. Using the Reddit r/cooking dataset, we prompted OpenAI to look for comments that resemble a food recipe. Instead of annotating 10,000 examples, we ran `textcat.openai.fetch` and obtained 145 positive examples. Out of those 145 examples, 114 turned out to be true positives (79% precision). We then checked 1,000 negative examples and found 12 false negatives (98% recall).
Ideally, once we've fully annotated the dataset, we can train a supervised model that is better suited for production than relying on zero-shot predictions: its running cost is lower and it's easier to manage.
After you've curated a set of predictions, you can export the results with `db-out`:

```
python -m prodigy db-out my_textcat_data > textcat_data.jsonl
```
The format of the exported annotations contains all the data you need to train a smaller model downstream. Each example in the dataset contains the original text, the tokens, the annotated label(s), etc.
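For instance, a single curated binary example might look roughly like this (abbreviated and hypothetical; the exact fields differ between the `classification` and `choice` interfaces):

```
{"text": "Mix the flour and water, then knead for ten minutes.", "label": "recipe", "answer": "accept"}
```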
You can also export the data to spaCy's binary format, using `data-to-spacy`. This format lets you load the annotations as spaCy `Doc` objects, which can be convenient for further conversion. The `data-to-spacy` command also makes it easy to train a text categorization model with spaCy. First you export the data, specifying the evaluation split as 20% of the total:
```
# For binary textcat
python -m prodigy data-to-spacy ./data/annotations/ --textcat my_textcat_data -es 0.2

# For multilabel textcat
python -m prodigy data-to-spacy ./data/annotations/ --textcat-multilabel my_textcat_data -es 0.2
```
Then you can train a model with spaCy or Prodigy:
```
python -m spacy train ./data/annotations/config.cfg --paths.train ./data/annotations/train.spacy --paths.dev ./data/annotations/dev.spacy -o textcat-model
```
This will save a model to the `textcat-model/` directory.
This recipe generates terms and phrases obtained from a large language model. These terms can be curated and turned into patterns files, which can help with downstream annotation tasks.
```
python -m prodigy terms.openai.fetch query filepath [--options] -F ./recipes/openai_terms.py
```
| Argument | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `str` | Query to send to OpenAI. | |
| `output_path` | `Path` | Path to save the output. | |
| `--seeds`, `-s` | `str` | One or more comma-separated seed phrases. | `""` |
| `--n`, `-n` | `int` | Minimum number of items to generate. | `100` |
| `--model`, `-m` | `str` | GPT-3 model to use for completion. | `"text-davinci-003"` |
| `--prompt-path`, `-p` | `Path` | Path to the jinja2 prompt template. | `templates/terms_prompt.jinja2` |
| `--verbose`, `-v` | `bool` | Print extra information to the terminal. | `False` |
| `--resume`, `-r` | `bool` | Resume by loading in text examples from the output file. | `False` |
| `--progress`, `-pb` | `bool` | Print the progress of the recipe. | `False` |
| `--temperature`, `-t` | `float` | OpenAI `temperature` param. | `1.0` |
| `--top-p`, `--tp` | `float` | OpenAI `top_p` param. | `1.0` |
| `--best-of`, `-bo` | `int` | OpenAI `best_of` param. | `10` |
| `--n-batch`, `-nb` | `int` | OpenAI batch size param. | `10` |
| `--max-tokens`, `-mt` | `int` | Max tokens to generate per call. | `100` |
Suppose you're interested in detecting skateboard tricks in text. You might then want to start with a term list of known tricks, using a query like the following:
```
# Base behavior, fetch at least 100 terms/phrases
python -m prodigy terms.openai.fetch "skateboard tricks" tricks.jsonl --n 100 --prompt-path templates/terms_prompt.jinja2 -F recipes/openai_terms.py
```
This will generate a prompt to OpenAI that asks it to try and generate at least 100 examples of "skateboard tricks". There's an upper limit to the number of tokens that OpenAI can generate per call, but this recipe will keep collecting terms until it reaches the number specified.
You can make the query more elaborate if you want to be more precise, but alternatively you can also add some seed terms via `--seeds`. These act as starting examples to help steer OpenAI in the right direction.
```
# Base behavior but with seeds
python -m prodigy terms.openai.fetch "skateboard tricks" tricks.jsonl --n 100 --seeds "kickflip,ollie" --prompt-path templates/terms_prompt.jinja2 -F recipes/openai_terms.py
```
Collecting many examples can take a while, so it can be helpful to show progress via `--progress` as requests are sent.
```
# Adding progress output as we wait for 500 examples
python -m prodigy terms.openai.fetch "skateboard tricks" tricks.jsonl --n 500 --progress --seeds "kickflip,ollie" --prompt-path templates/terms_prompt.jinja2 -F recipes/openai_terms.py
```
After collecting a few examples, you might want to generate more. You can choose to continue from a previous output file. This will effectively re-use those examples as seeds for the prompt to OpenAI.
```
# Use the `--resume` flag to re-use previous examples
python -m prodigy terms.openai.fetch "skateboard tricks" tricks.jsonl --n 50 --resume --prompt-path templates/terms_prompt.jinja2 -F recipes/openai_terms.py
```
When the recipe is done, you'll have a `tricks.jsonl` file with contents that look like this:
{"text":"pop shove it","meta":{"openai_query":"skateboard tricks"}}
{"text":"switch flip","meta":{"openai_query":"skateboard tricks"}}
{"text":"nose slides","meta":{"openai_query":"skateboard tricks"}}
{"text":"lazerflip","meta":{"openai_query":"skateboard tricks"}}
{"text":"lipslide","meta":{"openai_query":"skateboard tricks"}}
...
You now have a `tricks.jsonl` file on disk that contains skateboard tricks, but you cannot assume that all of them will be accurate. The next step is to review the terms, and you can use the `textcat.manual` recipe that comes with Prodigy for that.
```
# The tricks.jsonl was fetched from OpenAI beforehand
python -m prodigy textcat.manual skateboard-tricks-list tricks.jsonl --label skateboard-tricks
```
This generates an interface that looks like this:
You can manually accept or reject each example, and when you're done annotating you can export the annotated text into a patterns file via the `terms.to-patterns` recipe.
```
# Generate a `patterns.jsonl` file.
python -m prodigy terms.to-patterns skateboard-tricks-list patterns.jsonl --label skateboard-tricks --spacy-model blank:en
```
When the recipe is done, you'll have a `patterns.jsonl` file with contents that look like this:
{"label":"skateboard-tricks","pattern":[{"lower":"pop"},{"lower":"shove"},{"lower":"it"}]}
{"label":"skateboard-tricks","pattern":[{"lower":"switch"},{"lower":"flip"}]}
{"label":"skateboard-tricks","pattern":[{"lower":"nose"},{"lower":"slides"}]}
{"label":"skateboard-tricks","pattern":[{"lower":"lazerflip"}]}
{"label":"skateboard-tricks","pattern":[{"lower":"lipslide"}]}
...
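One way to use the patterns file outside of Prodigy is to load it into a spaCy `EntityRuler`; a minimal sketch:

```python
import spacy

# Load the generated patterns into a blank English pipeline
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.from_disk("patterns.jsonl")

doc = nlp("I finally landed a lazerflip at the skatepark")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('lazerflip', 'skateboard-tricks')]
```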
OpenAI has a hard limit on the prompt size: you cannot have a prompt larger than 4079 tokens. Unfortunately, that means there is a limit to the size of the term lists you can generate. The recipe will report an error when this happens, but it's good to be aware of this limitation.
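If you want to estimate whether a rendered prompt will fit before sending it, a rough sketch with `tiktoken` (assuming its tokenizer matches the model's) could look like this:

```python
import tiktoken

# text-davinci-003 shares its context window between prompt and completion,
# so the usable prompt budget sits a little below the full window.
enc = tiktoken.encoding_for_model("text-davinci-003")
prompt = "..."  # your rendered .jinja2 prompt
print(len(enc.encode(prompt)), "tokens")
```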
The goal of this recipe is to allow someone to quickly compare the quality of outputs from two prompts in a quantifiable and blind way.
```
python -m prodigy ab.openai.prompts dataset inputs_path display_template_path prompt1_template_path prompt2_template_path [--options] -F ./recipes/openai_ab.py
```
| Argument | Type | Description | Default |
| --- | --- | --- | --- |
| `dataset` | `str` | Prodigy dataset to save answers into. | |
| `inputs_path` | `Path` | Path to jsonl inputs. | |
| `display_template_path` | `Path` | Template for summarizing the arguments. | |
| `prompt1_template_path` | `Path` | Path to the first jinja2 prompt template. | |
| `prompt2_template_path` | `Path` | Path to the second jinja2 prompt template. | |
| `--model`, `-m` | `str` | GPT-3 model to use for completion. | `"text-davinci-003"` |
| `--batch-size`, `-b` | `int` | Batch size to send to the OpenAI API. | `10` |
| `--verbose`, `-v` | `bool` | Print extra information to the terminal. | `False` |
| `--no-random`, `-NR` | `bool` | Don't randomize which annotation is shown as correct. | `False` |
| `--repeat`, `-r` | `int` | How often to send the same prompt to OpenAI. | `1` |
As an example, let's try to generate humorous haikus. To do that, we first need to construct two jinja2 files, each representing a prompt to send to OpenAI.
The first prompt (`templates/ab/prompt1.jinja2`):

```
Write a haiku about {{topic}}.
```

The second prompt (`templates/ab/prompt2.jinja2`):

```
Write an incredibly hilarious haiku about {{topic}}. So funny!
```
You can provide variables for these prompts by constructing a `.jsonl` file with the required parameters. In this case, we need to make sure that `{{topic}}` is accounted for. Here's an example `.jsonl` file that could work:
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "kittens"}}
{"id": 0, "prompt_args": {"topic": "the python programming language"}}
{"id": 0, "prompt_args": {"topic": "maths"}}
Note

All the arguments under `prompt_args` will be passed to render the jinja templates. The `id` is mandatory and can be used to identify groups in later analysis.
We're nearly ready to evaluate, but this recipe requires one final jinja2 template. This one won't be used to generate a prompt, but it will generate a useful title that reminds the annotator of the current task. Here's an example of such a template.
```
A haiku about {{topic}}.
```
When you put all of these templates together, you can start annotating. The command below starts the annotation interface and also uses the `--repeat 5` option. This will ensure that each topic is used to generate a prompt at least 5 times.
```
python -m prodigy ab.openai.prompts haiku data/ab_example.jsonl templates/ab/input.jinja2 templates/ab/prompt1.jinja2 templates/ab/prompt2.jinja2 --repeat 5 -F recipes/openai_ab.py
```
This is what the annotation interface looks like:
When you look at this interface, you'll notice that the title template is rendered and that you're able to pick from two options. Both options are responses from OpenAI that were generated by the two prompt templates. You can also see the `prompt_args` rendered in the lower right corner of the choice menu.
From here you can annotate your favorite examples and gather data that might help you decide on which prompt is best.
Once you're done annotating you'll be presented with an overview of the results.
```
=========================== ✨ Evaluation results ===========================
✔ You preferred prompt1.jinja2

prompt1.jinja2   11
prompt2.jinja2    5
```
But you can also fetch the raw annotations from the database for further analysis.
```
python -m prodigy db-out haiku
```
There are lots of interesting follow-up experiments to this, and lots of ways to adapt the basic idea to different tasks or datasets. We're also interested in trying out different prompts. It's unclear how much the format in which the annotations are requested might change the model's predictions, or whether there's a shorter prompt that might perform just as well. We also want to run some end-to-end experiments.