Hi @ugm2, thank you for sharing this insight. Yes, for the integration of external datasets from BEIR we use the file format. Storing a preprocessed version of these files online as Haystack `Document` objects isn't really feasible, and it would also make it more complex to add further datasets to BEIR. However, I understand that this unfortunately creates overhead in your use case. If you are interested, two pointers I can give are a test case with a classification node in the indexing pipeline: and an exemplary YAML file containing multiple different indexing pipelines: https://github.com/deepset-ai/haystack/blob/797c20c966fe46308f646e02d662ca87155a9d4a/test/samples/pipeline/test.haystack-pipeline.yml
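For readers who haven't used YAML-defined pipelines: a minimal sketch of what such an indexing pipeline definition can look like. The component names and parameters here are illustrative assumptions, not copied from the linked file:

```yaml
# Hypothetical Haystack v1 pipeline config; names/params are illustrative.
version: ignore
components:
  - name: DocumentStore
    type: InMemoryDocumentStore
  - name: TextConverter        # turns raw files into Document objects
    type: TextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 200
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore

pipelines:
  - name: indexing
    nodes:
      - name: TextConverter
        inputs: [File]         # files enter the pipeline here
      - name: Preprocessor
        inputs: [TextConverter]
      - name: DocumentStore
        inputs: [Preprocessor]
```

The `File` input at the head of the pipeline is exactly why a converter node is needed when the dataset arrives as files on disk.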
Hi Community! 😊
I was trying to use the `eval_beir()` functionality to start comparing different pipelines and found that the documents from the chosen datasets are stored as files and then passed to `indexing_pipeline.run()`:
`haystack/haystack/pipelines/base.py`, lines 2252 to 2262 at commit `60f678e`
This is a problem, I think, because normally (at least in my case) I don't use files as input but Haystack `Document` objects.
It's a bit of an overhead having to manually attach a `TextConverter` node every time you want to evaluate a pipeline, IMO.
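The overhead described here is essentially a Document-to-file round trip: in-memory content has to be written out to text files just so a file-based indexing pipeline can read it back in. A stdlib-only sketch of that detour (the `docs` list, `dump_docs_to_files` helper, and file layout are illustrative assumptions, not Haystack code):

```python
import tempfile
from pathlib import Path

# Illustrative stand-ins for Haystack Document objects: content that is
# already in memory, but that eval_beir() would expect on disk as files.
docs = [
    {"id": "d1", "content": "BEIR is a benchmark for zero-shot IR."},
    {"id": "d2", "content": "Haystack pipelines index documents."},
]

def dump_docs_to_files(docs, out_dir):
    """Write each document's content to <id>.txt so that a file-based
    indexing pipeline (with a TextConverter node) can re-read it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for doc in docs:
        path = out_dir / f"{doc['id']}.txt"
        path.write_text(doc["content"], encoding="utf-8")
        paths.append(path)
    return paths

tmp_dir = tempfile.mkdtemp()
file_paths = dump_docs_to_files(docs, tmp_dir)
# These paths would then be fed to indexing_pipeline.run(file_paths=...),
# which converts the files straight back into Document objects.
print(len(file_paths))  # 2
```

The round trip is pure overhead when the content already exists as `Document` objects, which is the point being raised above.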