index binary documents such as PDF, MS Office. #164
-
Hello! You cannot push binary files to MeiliSearch at the moment. You have to extract the text from your file and push the content into MeiliSearch! @gmourier, for the new feature request 😇
-
Hello @ShubjeetPal 👋 Thanks for your feedback. I've converted your initial issue into a product discussion so that other users can vote on and discuss this feature proposal directly here. A possible workaround is to extract the text and index it in MeiliSearch, but there is a limit on the number of words that can be searched within a single attribute. Long texts would therefore have to be split across several attributes, which is probably not ideal.
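To make the splitting workaround concrete, here is a minimal sketch of how extracted text could be spread over several attributes of one document. The helper names (`split_words`, `to_document`), the `content_N` attribute naming, and the `max_words=500` default are all assumptions for illustration; the actual per-attribute word limit depends on your MeiliSearch version and is not stated in this thread.

```python
def split_words(text, max_words):
    """Split text into chunks of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def to_document(doc_id, title, text, max_words=500):
    """Build one document whose text is spread over content_0, content_1, ...

    max_words=500 is a placeholder, not a documented MeiliSearch limit.
    """
    doc = {"id": doc_id, "title": title}
    for n, chunk in enumerate(split_words(text, max_words)):
        doc[f"content_{n}"] = chunk
    return doc
```

A phrase that straddles two chunks will not match as a phrase, which is one reason this workaround is not ideal.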
-
@ShubjeetPal The best open-source tool I've found for extracting text from binary documents is Apache Tika: https://tika.apache.org/ It supports text extraction from many binary formats such as PDF and Word: https://tika.apache.org/2.0.0/formats.html I use the Then for every binary file I want to extract text from, I do:
It spits out plain text, which I then feed into MeiliSearch.
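The exact commands from this comment did not survive in the thread, so the following is only a rough sketch of the same extract-then-index idea, not the commenter's setup. It assumes a Tika server listening on `http://localhost:9998` (its `PUT /tika` endpoint returns plain text when asked for `text/plain`) and a MeiliSearch instance on `http://localhost:7700`. The index name `docs`, the function names, and the document fields are hypothetical, and the `Authorization: Bearer` header applies to recent MeiliSearch versions only.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

TIKA_URL = "http://localhost:9998/tika"   # assumed local Tika server
MEILI_URL = "http://localhost:7700"       # assumed local MeiliSearch instance


def extract_text(path, tika_url=TIKA_URL):
    """Send a binary file to Tika and return the extracted plain text."""
    req = urllib.request.Request(
        tika_url,
        data=Path(path).read_bytes(),
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_document(path, text):
    """Build a document with a stable id derived from the file path."""
    doc_id = hashlib.sha1(str(path).encode("utf-8")).hexdigest()
    return {"id": doc_id, "filename": Path(path).name, "content": text.strip()}


def push_documents(docs, index="docs", meili_url=MEILI_URL, api_key=None):
    """POST documents to the index (hypothetical index name 'docs')."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        f"{meili_url}/indexes/{index}/documents",
        data=json.dumps(docs).encode("utf-8"),
        method="POST",
        headers=headers,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With both services running, indexing one file would look like `push_documents([build_document(p, extract_text(p))])` for a path `p`.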
-
This is still my biggest issue. I have lots of PDF, MS Word, etc. documents that I need to be able to search and index. I understand how I could convert them to plain text documents, but that leaves me with two problems:
-
How can I use MeiliSearch to index binary documents such as PDF, OpenOffice, and MS Office files in a React static web application where the CMS is Strapi?
Any suggestions or plugins would be helpful.