feat(fetch): add fetching of raw text, pagination and keeping links in the markdown #130
Description
Add:
Server Details
Motivation and Context
The current version cannot handle pages that aren't HTML, so you can't fetch, e.g., markdown files or raw source code files. This PR adds raw-text fetching as a fallback for such pages, plus an argument that forces the raw behavior (e.g. if the model needs access to the raw HTML).
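A minimal sketch of how such a fallback decision could look, assuming the server has the response's content type and body at hand. The function name `should_return_raw` and the `force_raw` parameter are hypothetical and are not taken from the PR's code:

```python
def should_return_raw(content_type: str, body: str, force_raw: bool = False) -> bool:
    """Decide whether to return the body untouched instead of simplifying it as HTML.

    Hypothetical helper: the names and the sniffing heuristic are assumptions.
    """
    if force_raw:
        # The caller explicitly asked for the unprocessed content (e.g. raw HTML).
        return True
    # Sniff the start of the body, since servers sometimes omit or mislabel the header.
    looks_like_html = "<html" in body[:100].lower()
    declared_html = "text/html" in content_type.lower()
    # Anything that is neither declared as HTML nor looks like HTML is passed through raw.
    return not (looks_like_html or declared_html)


# Example: a markdown file is returned raw, an HTML page is not.
print(should_return_raw("text/markdown", "# Title"))
print(should_return_raw("text/html; charset=utf-8", "<!doctype html><html>"))
```

Checking both the header and the body is a design choice in this sketch, so that a page served without a content type is still treated as HTML when it looks like HTML.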
Raw reads can quickly fill up the conversation, so this PR sets a maximum length (which the model can override) and allows specifying a start offset. When tested by asking Claude questions about a long documentation page, it paginated through the document until it found the answer and didn't read any further.
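The maximum length and start offset can be illustrated with a small slicing helper. This is only a sketch under assumptions: the names `paginate`, `start_index` and `max_length`, and the default of 5000 characters, are hypothetical and may not match the PR:

```python
from typing import Optional, Tuple


def paginate(text: str, start_index: int = 0, max_length: int = 5000) -> Tuple[str, Optional[int]]:
    """Return one chunk of `text` and the offset of the next chunk (None when done)."""
    if start_index < 0 or max_length <= 0:
        raise ValueError("start_index must be >= 0 and max_length must be > 0")
    chunk = text[start_index:start_index + max_length]
    next_index = start_index + len(chunk)
    if next_index >= len(text):
        # Nothing left to read, so there is no next offset to request.
        return chunk, None
    return chunk, next_index


# Example: read a document chunk by chunk, stopping once the answer is found.
document = "intro " * 10 + "ANSWER" + " appendix" * 10
offset: Optional[int] = 0
while offset is not None:
    chunk, offset = paginate(document, start_index=offset, max_length=40)
    if "ANSWER" in chunk:
        break
```

Returning the next offset with each chunk lets the caller decide whether to continue, which matches the observed behavior of stopping once the answer is found. A term that straddles a chunk boundary would be missed by this simple loop.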
The library readabilipy, which is used to simplify the HTML before converting it to markdown, is based on the more mature JavaScript implementation. With `use_readability` enabled, it uses the Node.js implementation if node is installed and falls back to the Python implementation otherwise. The main benefit is that links are preserved in the markdown, so if an answer isn't found on the page but the page links to where it can be found, the model can make a request for the new URL.
How Has This Been Tested?
Tested in Claude Desktop, including cases that:
Breaking Changes
No, there are no changes to the config.
Types of changes
Checklist