Add proper documentation #152
The examples for the […] If you want to read the rendered content and not the raw data, you may want to look at […] If you still can't figure out what to do, join https://type.zulipchat.com/#narrow/stream/209232-pdf |
It is also highly recommended to look at the PDF specification. |
Do you know of any higher-level wrapper, or is one maybe planned for this project itself? I am especially interested in extracting text including its position. |
The example from pdf_render above extracts text and its position. |
That already looks very promising after a few tests. It does not detect some of the segments as a single string and instead emits separate characters, but I think that can be fixed. For now I will probably stay with pdf.js, but in the future I might transition to pdf-rs, as I already use Rust+WASM in some other places. If everything works well, it would result in a pure Rust alternative to tabula-java ;) |
Yes, there are no attempts to combine separate draw calls. |
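For anyone who wants to post-process the output themselves, here is a minimal sketch of one way to combine adjacent pieces of text into strings. It is not part of pdf-rs or pdf_render: the `Span` type, the `merge_spans` function, and both tolerance parameters are hypothetical names made up for illustration. It assumes the spans arrive in reading order and that pieces of the same line share a baseline within a small tolerance.

```rust
/// A positioned piece of text, roughly what one draw call might yield.
/// Hypothetical type for illustration; not an API of pdf-rs or pdf_render.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub text: String,
    pub x: f32,     // left edge of the span
    pub y: f32,     // baseline of the span
    pub width: f32, // horizontal extent of the span
}

/// Merge consecutive spans that sit on (almost) the same baseline and
/// (almost) touch horizontally. Assumes `spans` is already in reading order.
///
/// `y_tol` is the largest baseline difference still treated as the same line.
/// `max_gap` is the largest horizontal distance still treated as adjacent.
pub fn merge_spans(spans: Vec<Span>, y_tol: f32, max_gap: f32) -> Vec<Span> {
    let mut out: Vec<Span> = Vec::new();
    for s in spans {
        if let Some(last) = out.last_mut() {
            // Distance from the right edge of the previous span to this one.
            let gap = s.x - (last.x + last.width);
            let same_line = (s.y - last.y).abs() <= y_tol;
            if same_line && gap.abs() <= max_gap {
                // Extend the previous span instead of starting a new one.
                last.text.push_str(&s.text);
                last.width = (s.x + s.width) - last.x;
                continue;
            }
        }
        out.push(s);
    }
    out
}
```

With a small `max_gap`, word gaps stay unmerged, so words come out as separate spans. Choosing the tolerances per font size is where the approximations start, and rotated or justified text will need more than this.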
Do you plan to eventually open-source it? |
I can't, and I don't think it makes sense. This is such an impossible problem that there can only be approximations to a solution, and there will be a never-ending stream of bugs. |
Hi,
I'm interested in parsing a PDF file to generate some stats based on its content. Your crate seems great, but I don't know how to use it. Could you add some basic examples to the readme.md?
Thank you!