-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjudication Recipe #106
base: develop
Are you sure you want to change the base?
Adjudication Recipe #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's obvious that you've figured out a lot more things about how prodigy works, and the refactors to rely more on built-in prodigy functionality looks good. My suggestions pretty much all relate to adding more comments to clarify what's going on; I do wonder if the annotation merging is worth unit testing, but not sure how hard that is.
"view_id": "blocks", | ||
"config": config, | ||
} | ||
|
||
if fetch_media: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth a comment here to explain why this step needs to be done
"ner_manual_highlight_chars": True, | ||
"global_css_dir": CURRENT_DIR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this already in the common config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I wasn't originally copying the full common config.
for version in versions: | ||
session_id = version[SESSION_ID_ATTR] | ||
# Assume: session name does not contain - | ||
session_name = session_id.rsplit("-", maxsplit=1)[1] | ||
if session_id not in session_counts: | ||
session_counts[session_id] = 1 | ||
else: | ||
session_name += f"-{session_counts[session_id]}" | ||
session_counts[session_id] += 1 | ||
sessions.append(session_name) | ||
if "spans" not in version: | ||
# Not sure when an annotated example would be missing a spans field | ||
continue | ||
for span in version["spans"]: | ||
new_span = span.copy() | ||
span_label = span["label"] | ||
new_span["label"] = f"{session_name}: {span_label}" | ||
merged_spans.append(new_span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort of know what this is doing because you showed us what the interface looks like, but it's hard to follow from just the code. Would help to add some more comments here, and I wonder if this bit is worth a unit test.
What is a version
? How does it relate to sessions and annotation spans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that helps answer how much clarity the type annotations provide. versions
is a list of annotated tasks (List[TaskType]
) to be merged. These tasks are grouped by input hash (this grouping occurs in get_review_stream
. So, version
is an annotated example (TaskType
).
This code is making a single "merged" example for review and does it in the following way:
- Copy the contents from one of the version
- Then for each version:
a. Determine the session name by removing the dataset prefix and adding a numerical suffix if a session has multiple annotated examples (hopefully only a headache for round 1)
b. Copy itsspans
(but with modified label fields) tomerged_spans
list. - Set "merged" example
spans
tomerged_spans
- Add "sessions" field containing list of session names to "merged" example
I've been considering making a separate, stand-alone function for merging annotated examples, that would make it easier to unit test. That said, I think I need to think more about whether it's worth further splitting out the logic for merging "spans" fields |
Co-authored-by: Rebecca Sutton Koeser <[email protected]>
Primary change: adding adjudication recipe for text span annotations.
Secondary change: modified existing annotation recipes to use Prodigy API and option to fetch media.