[QoL] Remove farm-haystack from system #113

Open
JasonLo opened this issue Jan 31, 2024 · 2 comments

Comments

@JasonLo
Collaborator

JasonLo commented Jan 31, 2024

farm-haystack is required for data preprocessing and ingestion, but the package is poorly maintained. For example:

  • Still on pydantic v1
  • Frequent breaking changes between releases
  • Tons of deprecation warnings

We may want to replace it with a better-maintained package.
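
To check the pydantic point locally, here is a minimal sketch that lists what an installed distribution declares as its requirements. It uses only the standard library's `importlib.metadata`; the helper name `declared_requirements` is made up for illustration, and the output depends entirely on which farm-haystack version is installed in the environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(dist_name, keyword):
    """Return the requirement strings of `dist_name` that mention `keyword`.

    Hypothetical helper for illustration; returns an empty list when the
    distribution is not installed or declares no requirements.
    """
    try:
        reqs = requires(dist_name) or []  # requires() may return None
    except PackageNotFoundError:
        return []
    return [r for r in reqs if keyword.lower() in r.lower()]


if __name__ == "__main__":
    # Prints whatever pydantic constraint the installed version declares.
    print(declared_requirements("farm-haystack", "pydantic"))
```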

@iross
Collaborator

iross commented Jan 31, 2024

It looks like they have a haystack v2 beta available that would presumably address most or all of these issues. Looking over the docs, it's not clear if it's a straight swap or if there would be more changes involved.

Are there other comparable pipelining toolsets that could be an appropriate alternative?

@JasonLo
Collaborator Author

JasonLo commented Jan 31, 2024

Off the top of my head: nltk, gensim, and spaCy? Not sure which one is best. Let's take a look together and decide. This may also be somewhat related to the encoding problem in Elastic...
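
For comparing candidates, it may help to pin down what the preprocessing step has to do. The sketch below assumes (this is not confirmed anywhere in the thread) that it mostly amounts to splitting text into overlapping word windows before ingest. The function name and default values are hypothetical, and it uses plain Python so that no particular library is implied.

```python
def split_by_words(text, split_length=200, overlap=20):
    """Split `text` into chunks of up to `split_length` words.

    Consecutive chunks share `overlap` words. Hypothetical stand-in for a
    library splitter; whitespace tokenization is a deliberate simplification.
    """
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= overlap < split_length:
        raise ValueError("overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once this window has reached the end of the text.
        if start + split_length >= len(words):
            break
    return chunks
```

A replacement based on nltk or spaCy would presumably swap the whitespace split for proper sentence or token boundaries while keeping the same windowing logic.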
