Is your feature request related to a problem? Please describe.
We usually query a document word by word. However, `docnet` only supports character-oriented queries. Character-oriented queries are really useful, since users can build word-oriented queries on top of them. Still, I believe it would be better if this common requirement were implemented in the `docnet` package itself.
Describe the solution you'd like
Thus, there is a need for a `GetWords` function that returns a list of words. Each word model would carry its bounding box and its text, just like the entries returned by `GetCharacters`.
The main reason `GetWords` is not supported is that PDF documents have no concept of words. We expose all the character information one needs to do business-logic-specific clustering. I am reluctant to add any sort of clustering to the core library itself, because there will always be edge cases, whether due to document formatting, text direction, or something else.
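Word grouping can be done on top of the per-character data (each character's text plus its bounding box). The following is a minimal, language-agnostic sketch in Python; it is not docnet's API. `Char`, `Word`, `group_words`, and `max_gap` are hypothetical names, boxes are assumed to be in page coordinates with y growing downward, and characters are assumed to arrive in left-to-right reading order. It deliberately ignores vertical and right-to-left text, which are exactly the kinds of edge cases that make a one-size-fits-all `GetWords` hard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    # Hypothetical stand-in for one character entry: its text plus its
    # bounding box (page coordinates, y assumed to grow downward).
    text: str
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class Word:
    # A word's text and the union of its characters' boxes.
    text: str
    left: float
    top: float
    right: float
    bottom: float


def group_words(chars: List[Char], max_gap: float = 2.0) -> List[Word]:
    """Cluster characters (assumed left-to-right reading order) into words.

    A word ends at a whitespace character, when the horizontal distance to the
    next character exceeds max_gap in either direction, or when the next
    character does not overlap the current word vertically (another line).
    """
    words: List[Word] = []
    current: List[Char] = []

    def flush() -> None:
        # Emit the pending characters as one word with a merged bounding box.
        if current:
            words.append(Word(
                text="".join(c.text for c in current),
                left=min(c.left for c in current),
                top=min(c.top for c in current),
                right=max(c.right for c in current),
                bottom=max(c.bottom for c in current),
            ))
            current.clear()

    for ch in chars:
        if ch.text.isspace():
            flush()  # whitespace separates words and is not kept
            continue
        if current:
            gap = ch.left - current[-1].right
            top = min(c.top for c in current)
            bottom = max(c.bottom for c in current)
            same_line = ch.top < bottom and ch.bottom > top
            # A large jump right, any large jump back left, or a line change
            # starts a new word.
            if gap > max_gap or gap < -max_gap or not same_line:
                flush()
        current.append(ch)
    flush()
    return words
```

The gap threshold is the business-specific part: a sensible `max_gap` depends on font size and layout, which is why this sketch leaves it to the caller rather than hard-coding it.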