8 replicate demetr results for bleu #10
base: main
Conversation
Added a few comments, mostly around some ideas for refactoring that might make things a bit easier to extend to different datasets/metrics, though I realised later that a common metrics interface is not as straightforward as I thought, so I'm open to different ideas / not spending time going down that route.

The actions are failing with linting and mypy errors. Setting up pre-commit or running ruff locally should fix the linting ones (mostly it just doesn't like a blank newline at the start of functions), and adding `ignore_missing_imports = true` to the mypy section in pyproject.toml should get mypy to quieten down (e.g. see here).
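A minimal sketch of that change, assuming the `[tool.mypy]` section already exists (or is added) in pyproject.toml:

```toml
[tool.mypy]
# Silence errors about third-party packages that ship without type stubs
ignore_missing_imports = true
```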
src/m4st/metrics.py
Outdated

```python
def __init__(self) -> None:
    self.blaser_ref = load_blaser_model("blaser_2_0_ref").eval()
    self.blaser_qe = load_blaser_model("blaser_2_0_qe").eval()
```
For BLASER and COMET, would it make more sense for the ref and qe versions to be different classes? We could then also have a common `Metrics` parent class that's always expected to have a function `get_score(self, reference, prediction, source)` or similar, with some arguments being optional/ignored depending on the metric. That might make some things a bit easier to extend later (though it doesn't look like we'll be using many more metrics).
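A rough sketch of what that interface could look like; the class and method names here are illustrative, and the SacreBLEU subclass is just an example of a metric that ignores `source`:

```python
from abc import ABC, abstractmethod

from sacrebleu.metrics import BLEU


class Metric(ABC):
    """Common interface: score a (reference, prediction, source) triple."""

    @abstractmethod
    def get_score(self, reference: str, prediction: str, source: str) -> float: ...


class SacreBLEUScore(Metric):
    def __init__(self) -> None:
        # effective_order avoids zero scores on short sentence-level inputs
        self.bleu = BLEU(effective_order=True)

    def get_score(self, reference: str, prediction: str, source: str) -> float:
        # source is ignored: BLEU only compares prediction against reference
        return self.bleu.sentence_score(prediction, [reference]).score
```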
src/m4st/process_demetr.py
Outdated

```python
if "BLASER_ref" in metrics_to_use or "BLASER_qe" in metrics_to_use:
    self.blaser = BLASERScore()
if "COMET_ref" in metrics_to_use or "COMET_qe" in metrics_to_use:
    self.comet = COMETScore()
```
Also re having a common `Metrics` class: `ProcessDEMETR` could then just have a list of metric instances, and the loop in `process_demetr_category` could be more like:

```python
for metric in self.metrics:
    metric.get_score(ref_txt, mt_txt, src_txt)
```

without the need for the big `if` block.

I hadn't considered BLASER also needing a lang code though, so `get_score` would need to support kwargs somehow too, and it's messier than I thought (incidentally, it's interesting that BLASER requires this but COMET doesn't).
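One hedged option for the lang code problem, with illustrative names: let `get_score` accept `**kwargs` in the base class, so a metric that needs extra inputs can pull them out while the others simply never look at them.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    @abstractmethod
    def get_score(
        self, reference: str, prediction: str, source: str, **kwargs: str
    ) -> float: ...


class BLASERRefScore(Metric):
    def get_score(
        self, reference: str, prediction: str, source: str, **kwargs: str
    ) -> float:
        # BLASER is the only metric that needs the source language, so it
        # reads it from kwargs; other metrics ignore kwargs entirely.
        source_lang_code = kwargs["source_lang_code"]
        ...  # embed the texts and run the BLASER model


# The loop stays uniform, always passing everything a metric might need:
# for metric in self.metrics:
#     metric.get_score(ref_txt, mt_txt, src_txt, source_lang_code=lang_code)
```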
```python
import os


def load_json(json_path: os.PathLike | str) -> list:
```
Return type could also be `dict`.
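For example, assuming the function is a thin `json.load` wrapper, the annotation could be broadened to:

```python
import json
import os


def load_json(json_path: os.PathLike | str) -> list | dict:
    with open(json_path) as f:
        return json.load(f)
```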
```python
if ds_cat in cats_to_process or not cats_to_process:
    print(f"Processing input file {ds}")
    reverse_acc = ds_cat == 35
```
A comment would be helpful here to explain why category 35 needs to be reversed.
pyproject.toml

```diff
@@ -10,7 +10,7 @@ authors = [
 ]
 description = "Evaluation of Metrics for Speech Translation (M4ST)"
 readme = "README.md"
-requires-python = ">=3.10"
+requires-python = "==3.11.*"
```
Is one of the dependencies forcing this or can it be more flexible?
src/m4st/metrics.py
Outdated

```python
src_embs = self.text_embedder.predict([source], source_lang=source_lang_code)
ref_embs = self.text_embedder.predict([reference], source_lang="eng_Latn")
mt_embs = self.text_embedder.predict([prediction], source_lang="eng_Latn")
```
Assuming the prediction and reference are English is probably fine for what we're doing, but I'd be slightly paranoid about forgetting to change the language here if we do try anything in a different language. We could consider making the reference and MT languages arguments/class attributes that have to be set when creating an instance or calling the score, so it's more explicit that we're assuming English when we use it. Otherwise, just a comment/note somewhere that makes the English assumption clear.
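A sketch of the explicit-language option, with illustrative names (the text embedder is passed in here only to keep the sketch self-contained):

```python
class BLASERRefScore:
    def __init__(self, text_embedder, ref_lang_code: str = "eng_Latn") -> None:
        self.text_embedder = text_embedder
        # English by default, but now visible and overridable at construction
        self.ref_lang_code = ref_lang_code

    def get_score(
        self, reference: str, prediction: str, source: str, source_lang_code: str
    ) -> float:
        src_embs = self.text_embedder.predict([source], source_lang=source_lang_code)
        ref_embs = self.text_embedder.predict([reference], source_lang=self.ref_lang_code)
        mt_embs = self.text_embedder.predict([prediction], source_lang=self.ref_lang_code)
        ...  # feed the embeddings to the BLASER model and return its score
```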
On hold while I also implement #13
Closes #8
Runs BLEU, SacreBLEU, BLASER 2.0, and COMET over the DEMETR dataset.