Adjudication Recipe #106

laurejt · 2024-10-23T15:46:11Z

Primary change: adding adjudication recipe for text span annotations.

Secondary change: modified existing annotation recipes to use the Prodigy API and added an option to fetch media.

rlskoeser

It's obvious that you've figured out a lot more about how Prodigy works, and the refactors to rely more on built-in Prodigy functionality look good. My suggestions pretty much all relate to adding more comments to clarify what's going on; I do wonder if the annotation merging is worth unit testing, but not sure how hard that is.

rlskoeser · 2024-10-23T17:51:52Z

src/corppa/poetry_detection/annotation/recipe.py

        "view_id": "blocks",
        "config": config,
    }

+    if fetch_media:


probably worth a comment here to explain why this step needs to be done

rlskoeser · 2024-10-23T17:54:59Z

src/corppa/poetry_detection/annotation/recipe.py

            "ner_manual_highlight_chars": True,
+            "global_css_dir": CURRENT_DIR,


isn't this already in the common config?

Good catch, I wasn't originally copying the full common config.

rlskoeser · 2024-10-23T17:58:35Z

src/corppa/poetry_detection/annotation/recipe.py

+        for version in versions:
+            session_id = version[SESSION_ID_ATTR]
+            # Assume: session name does not contain -
+            session_name = session_id.rsplit("-", maxsplit=1)[1]
+            if session_id not in session_counts:
+                session_counts[session_id] = 1
+            else:
+                session_name += f"-{session_counts[session_id]}"
+                session_counts[session_id] += 1
+            sessions.append(session_name)
+            if "spans" not in version:
+                # Not sure when an annotated example would be missing a spans field
+                continue
+            for span in version["spans"]:
+                new_span = span.copy()
+                span_label = span["label"]
+                new_span["label"] = f"{session_name}: {span_label}"
+                merged_spans.append(new_span)


I sort of know what this is doing because you showed us what the interface looks like, but it's hard to follow from just the code. Would help to add some more comments here, and I wonder if this bit is worth a unit test.
What is a version? How does it relate to sessions and annotation spans?

Well, that helps answer how much clarity the type annotations provide. versions is a list of annotated tasks (List[TaskType]) to be merged. These tasks are grouped by input hash (this grouping occurs in get_review_stream). So, version is a single annotated example (TaskType).

This code builds a single "merged" example for review, as follows:

1. Copy the contents from one of the versions.
2. Then, for each version:
   a. Determine the session name by removing the dataset prefix and adding a numerical suffix if a session has multiple annotated examples (hopefully only a headache for round 1).
   b. Copy its spans (but with modified label fields) to the merged_spans list.
3. Set the "merged" example's spans to merged_spans.
4. Add a "sessions" field containing the list of session names to the "merged" example.
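For readers following along without the interface in front of them, the steps above can be sketched as a stand-alone function. This is only an illustration under stated assumptions, not the code in recipe.py: the name merge_versions is hypothetical, SESSION_ID_ATTR is assumed to be "_session_id", and the sketch arbitrarily copies the first version as the base.

```python
import copy
from typing import Any

# Assumption: annotated tasks carry their session under this key.
SESSION_ID_ATTR = "_session_id"


def merge_versions(versions: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge annotated versions of one input into a single review example.

    Hypothetical helper mirroring the steps described in the thread.
    """
    # Step 1: start from a copy of one version (here, arbitrarily, the first).
    merged = copy.deepcopy(versions[0])
    session_counts: dict[str, int] = {}
    sessions: list[str] = []
    merged_spans: list[dict[str, Any]] = []

    for version in versions:
        session_id = version[SESSION_ID_ATTR]
        # Step 2a: drop the dataset prefix. Assumes the session name itself
        # contains no "-"; repeated sessions get a numeric suffix.
        session_name = session_id.rsplit("-", maxsplit=1)[1]
        seen = session_counts.get(session_id, 0)
        if seen:
            session_name += f"-{seen}"
        session_counts[session_id] = seen + 1
        sessions.append(session_name)

        # Step 2b: copy spans, prefixing each label with the session name.
        # A version without a "spans" field contributes no spans.
        for span in version.get("spans", []):
            merged_spans.append(
                {**span, "label": f"{session_name}: {span['label']}"}
            )

    # Steps 3 and 4: attach the merged spans and the session names.
    merged["spans"] = merged_spans
    merged["sessions"] = sessions
    return merged
```

Because the function takes and returns plain dictionaries, it can be exercised without starting Prodigy, which is what would make it straightforward to unit test.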

laurejt · 2024-10-23T18:45:10Z

I do wonder if the annotation merging is worth unit testing, but not sure how hard that is.

I've been considering making a separate, stand-alone function for merging annotated examples; that would make it easier to unit test. That said, I need to think more about whether it's worth further splitting out the logic for merging "spans" fields.
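As a rough idea of what splitting out the "spans" logic could look like, here is a small sketch with a pytest-style test. The helper name prefix_span_labels and the "POETRY" label are hypothetical, chosen only for illustration; nothing here is taken from the actual recipe.

```python
from typing import Any


def prefix_span_labels(
    spans: list[dict[str, Any]], session_name: str
) -> list[dict[str, Any]]:
    """Return copies of the spans with the session name prepended to each label."""
    return [
        {**span, "label": f"{session_name}: {span['label']}"} for span in spans
    ]


def test_prefix_span_labels() -> None:
    spans = [{"start": 0, "end": 5, "label": "POETRY"}]
    result = prefix_span_labels(spans, "alice")
    assert result == [{"start": 0, "end": 5, "label": "alice: POETRY"}]
    # The input spans must be left untouched.
    assert spans[0]["label"] == "POETRY"
    # An example with no spans yields an empty list.
    assert prefix_span_labels([], "alice") == []
```

Whether a helper this small is worth its own function is a judgment call; the main benefit is that the label rewriting can be tested separately from the session-name bookkeeping.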

laurejt added 7 commits October 22, 2024 10:41

Added basic recipe logging

c1b5519

Modify recipes to use Prodigy's get_label utility

1c12cd9

Updated top-level documentation

6d19b2f

Fixed label(s) typos

a961c1e

Update to use Stream component's get_stream

6c5d938

Update recipes to use stream.apply & fetch_media

27a569d

Added review recipe

eafd313

laurejt requested a review from rlskoeser October 23, 2024 15:46

laurejt self-assigned this Oct 23, 2024

rlskoeser approved these changes Oct 23, 2024

rlskoeser reviewed Oct 23, 2024

laurejt and others added 2 commits October 23, 2024 16:23

Update src/corppa/poetry_detection/annotation/recipe.py

93db63f

Co-authored-by: Rebecca Sutton Koeser <[email protected]>

Rename remove_images to remove_image_data

a0780c0
