Track original triple IDs in KGDataset.from_triples #37

AlCatt91 · 2024-01-03T19:18:35Z

When creating datasets with a random train/valid/test split via KGDataset.from_triples or KGDataset.from_dataframe, it can be useful for downstream tasks (e.g. comparing predictions with edge-level graph statistics) to keep track of how the original triple IDs, i.e. the row indices in the (n_triple, 3) array/df, are shuffled and split.
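To illustrate the idea, here is a minimal NumPy sketch of tracking original row indices through a shuffle and split. The helper `split_with_ids`, its parameters, and the part names "train"/"valid"/"test" are hypothetical and chosen for illustration only; this is not the besskge implementation.

```python
import numpy as np


def split_with_ids(data, split=(0.7, 0.15, 0.15), seed=0):
    """Shuffle and split an (n_triple, 3) array, recording original row indices.

    Hypothetical helper for illustration; not the besskge implementation.
    """
    n_triple = data.shape[0]
    rng = np.random.default_rng(seed)
    # perm[i] is the original row index of the i-th triple after shuffling
    perm = rng.permutation(n_triple)
    n_train = int(n_triple * split[0])
    n_valid = int(n_triple * split[1])
    # Cut the shuffled IDs into train / valid / test (test takes the remainder)
    id_parts = np.split(perm, [n_train, n_train + n_valid])
    triple_ids = dict(zip(("train", "valid", "test"), id_parts))
    triples = {part: data[ids] for part, ids in triple_ids.items()}
    return triples, triple_ids


# Toy (n_triple, 3) array of (head, relation, tail) IDs
data = np.arange(30).reshape(10, 3)
triples, triple_ids = split_with_ids(data)

# Any per-triple statistic computed on the original array can be aligned
# with a split by indexing with the tracked IDs
edge_stat = data.sum(axis=1)  # stand-in for a real edge statistic
test_stat = edge_stat[triple_ids["test"]]
```

With the IDs stored alongside the splits, `data[triple_ids[part]]` reproduces `triples[part]` row by row, so per-triple quantities computed before splitting can be joined back to predictions afterwards.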

danjust

I agree that this is useful to have. Just for consistency I'd prefer having original_triple_ids in every case

danjust · 2024-01-08T09:29:39Z

besskge/dataset.py

            n_entity=data[:, [0, 2]].max() + 1,
            n_relation_type=data[:, 1].max() + 1,
            entity_dict=entity_dict,
            relation_dict=relation_dict,
            type_offsets=type_offsets,
            triples=triples,
        )
+        ds.original_triple_ids = triple_ids  # type: ignore


It might be more consistent for downstream analytics if original_triple_ids were always a member of the dataset. I'd add a dict with values arange(...) if the dataset is not created through from_triples.

AlCatt91 · 2024-01-08

Agree, good point, thanks! I should have added this to every constructor.
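A rough sketch of the fallback suggested above, for datasets that arrive already split. It assumes original_triple_ids is a dict keyed by part name and that IDs restart from 0 within each part; both are assumptions, and `identity_triple_ids` is a hypothetical helper, not a besskge function.

```python
import numpy as np


def identity_triple_ids(triples):
    """Return {part: arange(n_part)} for a dict of (n_part, 3) arrays.

    Hypothetical helper: with no shuffling, each triple keeps its own
    position as its ID.
    """
    return {part: np.arange(t.shape[0]) for part, t in triples.items()}


# Toy pre-split dataset with 5 training and 2 test triples
presplit = {
    "train": np.zeros((5, 3), dtype=np.int32),
    "test": np.zeros((2, 3), dtype=np.int32),
}
ids = identity_triple_ids(presplit)
```

Having the attribute present in every case means downstream code can index with original_triple_ids without first checking how the dataset was built.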

danjust

Thanks Alberto! LGTM

AlCatt91 added 2 commits January 3, 2024 19:13

track original triple IDs when shuffling/splitting in KGDataset.from_…

ae821e4

…triples

lint fix

6514372

AlCatt91 requested a review from danjust January 3, 2024 20:06

danjust reviewed Jan 8, 2024

add original_triples_id to every constructor

74ffbb6

danjust approved these changes Jan 8, 2024

AlCatt91 merged commit 4e794d1 into main Jan 8, 2024
1 check passed

AlCatt91 deleted the track_triple_id_split branch January 12, 2024 15:04
