From 66fc5f18e2dc6b8d6a9a1f6d5944f5e17307ba43 Mon Sep 17 00:00:00 2001
From: Ruangrin L <88072261+idalr@users.noreply.github.com>
Date: Sun, 28 Jan 2024 13:48:02 +0100
Subject: [PATCH] added commands and collapsed all histograms
---
.../Relation_argument_outer_token_distance.md | 72 +++++++++++++++----
1 file changed, 60 insertions(+), 12 deletions(-)
diff --git a/AM_statistics/Relation_argument_outer_token_distance.md b/AM_statistics/Relation_argument_outer_token_distance.md
index 82e246bd..9fae1b04 100644
--- a/AM_statistics/Relation_argument_outer_token_distance.md
+++ b/AM_statistics/Relation_argument_outer_token_distance.md
@@ -9,6 +9,54 @@ The distance is measured from the first token of the first argumentative unit to
We collect the following statistics: the number of documents in the split (*no. doc*), the number of argumentative units (*len*), the mean token distance (*mean*), the standard deviation of the distance (*std*), the minimum outer distance (*min*), and the maximum outer distance (*max*).
We also present histograms in collapsible sections, showing the distribution of these relation distances (distance on the x-axis, unit counts on the y-axis).
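The following is a minimal sketch of how the outer token distance and the summary statistics above could be computed. It assumes that each relation argument is a token span with an inclusive start index and an exclusive end index, and that *std* is the population standard deviation. The `Span` class and the `outer_token_distance` and `summarize` functions are hypothetical illustrations. They are not the implementation behind the `count_relation_argument_distances` metric.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Span:
    """Hypothetical token span of an argumentative unit."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive, assumed)


def outer_token_distance(head: Span, tail: Span) -> int:
    # Measure from the first token of whichever argument comes first
    # to the last token of whichever argument comes last.
    return max(head.end, tail.end) - min(head.start, tail.start)


def summarize(distances: list[int]) -> dict[str, float]:
    # Population standard deviation (pstdev) is an assumption;
    # the actual metric may use a different estimator.
    return {
        "len": len(distances),
        "mean": mean(distances),
        "std": pstdev(distances),
        "min": min(distances),
        "max": max(distances),
    }


relations = [
    (Span(0, 5), Span(10, 20)),   # head before tail
    (Span(30, 40), Span(2, 8)),   # tail before head
]
stats = summarize([outer_token_distance(h, t) for h, t in relations])
```

Because the outer distance takes the minimum start and the maximum end, it does not depend on whether the head precedes the tail.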
+### Usage
+
+To collect the statistics for a dataset manually, run the corresponding command below.
+
+
+Command lines
+
+AAE2
+
+```
+python src/evaluate_documents.py dataset=aae2_prepared metric=count_relation_argument_distances
+```
+
+AbsTRCT
+
+```
+python src/evaluate_documents.py dataset=abstrct_prepared metric=count_relation_argument_distances
+```
+
+ArgMicro
+
+```
+python src/evaluate_documents.py dataset=argmicro_prepared metric=count_relation_argument_distances
+```
+
+CDCP
+
+```
+python src/evaluate_documents.py dataset=cdcp_prepared metric=count_relation_argument_distances
+```
+
+SciArg
+
+```
+python src/evaluate_documents.py dataset=sciarg_prepared metric=count_relation_argument_distances ++metric.tokenize_kwargs.strict_span_conversion=false
+```
+
+SciDTB_Argmin
+
+```
+python src/evaluate_documents.py dataset=scidtb_argmin_prepared metric=count_relation_argument_distances
+```
+
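To run all of the commands above in one go, a small driver script could build and execute them one after another. This is a hypothetical convenience sketch. It reuses only the dataset names, the metric name, and the SciArg-specific override shown above, and it assumes it is started from the repository root. The names `DATASETS`, `EXTRA_ARGS`, `build_command`, and `run_all` are illustrative and not part of the template.

```python
import subprocess
import sys

METRIC = "count_relation_argument_distances"

# Dataset config names, copied from the commands above.
DATASETS = [
    "aae2_prepared",
    "abstrct_prepared",
    "argmicro_prepared",
    "cdcp_prepared",
    "sciarg_prepared",
    "scidtb_argmin_prepared",
]

# Per-dataset extra overrides (only SciArg needs one in the commands above).
EXTRA_ARGS = {
    "sciarg_prepared": ["++metric.tokenize_kwargs.strict_span_conversion=false"],
}


def build_command(dataset: str) -> list[str]:
    # Use the current interpreter instead of relying on "python" being on PATH.
    return [
        sys.executable,
        "src/evaluate_documents.py",
        f"dataset={dataset}",
        f"metric={METRIC}",
        *EXTRA_ARGS.get(dataset, []),
    ]


def run_all() -> None:
    # Stop at the first failing dataset (check=True raises CalledProcessError).
    for dataset in DATASETS:
        subprocess.run(build_command(dataset), check=True)
```

Calling `run_all()` from the repository root executes the six commands sequentially.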
+**Remark**:
+The script `evaluate_documents.py` is taken from the [PyTorch-IE-Hydra-Template](https://github.com/ArneBinder/pytorch-ie-hydra-template-1).
+
+
+
## AAE2
| statistics | train | test |
@@ -20,13 +68,13 @@ We also present histograms in the collasible, showing the distribution of these
| min | 9 | 10 |
| max | 514 | 442 |
-
+
Histogram (split: train, 322 documents)
![rtd_aae2_train.png](img%2Frelation_token_distance%2Frtd_aae2_train.png)
-
+
Histogram (split: test, 80 documents)
![rtd_aae2_test.png](img%2Frelation_token_distance%2Frtd_aae2_test.png)
@@ -46,31 +94,31 @@ Relation argument (outer) token distances (split: neoplasm_train, 350 documents)
| min | 17 | 24 | 22 | 26 | 23 |
| max | 511 | 625 | 459 | 488 | 459 |
-
+
Histogram (split: neoplasm_train, 350 documents)
![rtd_abs-neo_train.png](img%2Frelation_token_distance%2Frtd_abs-neo_train.png)
-
+
Histogram (split: neoplasm_dev, 50 documents)
![rtd_abs-neo_dev.png](img%2Frelation_token_distance%2Frtd_abs-neo_dev.png)
-
+
Histogram (split: neoplasm_test, 100 documents)
![rtd_abs-neo_test.png](img%2Frelation_token_distance%2Frtd_abs-neo_test.png)
-
+
Histogram (split: glaucoma_test, 100 documents)
![rtd_abs-glu_test.png](img%2Frelation_token_distance%2Frtd_abs-glu_test.png)
-
+
Histogram (split: mixed_test, 100 documents)
![rtd_abs-mix_test.png](img%2Frelation_token_distance%2Frtd_abs-mix_test.png)
@@ -88,7 +136,7 @@ Relation argument (outer) token distances (split: neoplasm_train, 350 documents)
| min | 14 |
| max | 127 |
-
+
Histogram (split: train, 112 documents)
![rtd_argmicro.png](img%2Frelation_token_distance%2Frtd_argmicro.png)
@@ -106,13 +154,13 @@ Relation argument (outer) token distances (split: neoplasm_train, 350 documents)
| min | 8 | 8 |
| max | 240 | 212 |
-
+
Histogram (split: train, 581 documents)
![rtd_cdcp_train.png](img%2Frelation_token_distance%2Frtd_cdcp_train.png)
-
+
Histogram (split: test, 150 documents)
![rtd_cdcp_test.png](img%2Frelation_token_distance%2Frtd_cdcp_test.png)
@@ -130,7 +178,7 @@ Relation argument (outer) token distances (split: neoplasm_train, 350 documents)
| min | 3 |
| max | 2864 |
-
+
Histogram (split: train, 40 documents)
![rtd_sciarg.png](img%2Frelation_token_distance%2Frtd_sciarg.png)
@@ -148,7 +196,7 @@ Relation argument (outer) token distances (split: neoplasm_train, 350 documents)
| min | 21 |
| max | 277 |
-
+
Histogram (split: train, 60 documents)
![rtd_scidtb-argmin.png](img%2Frelation_token_distance%2Frtd_scidtb-argmin.png)