feat(fetch): add fetching of raw text, pagination and keeping links in the markdown #130

jackadamson · 2024-11-29T15:01:55Z

Description

Add:

Fetching of raw (non-markdownified) text
Pagination of long page responses
Use of the JavaScript Readability implementation when Node.js is present

Server Details

Server: fetch
Changes to: tools

Motivation and Context

The current version cannot handle pages that aren't HTML, so you can't fetch e.g. markdown files or raw source code files. This PR adds raw text as a fallback for such pages, plus an argument that forces the behavior (e.g. if the model needs access to the raw HTML).
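A minimal sketch of the fallback decision described above. This is not the code in the PR: the function names, the `force_raw` parameter, and the 100-character sniffing window are all assumptions chosen for illustration.

```python
def looks_like_html(content_type: str, body: str) -> bool:
    """Heuristic guess at whether a response is HTML (illustrative only)."""
    if "text/html" in content_type.lower():
        return True
    # Some servers omit or mislabel the header, so peek at the start of the body.
    return "<html" in body[:100].lower()


def choose_output(content_type: str, body: str, force_raw: bool = False) -> str:
    """Return "raw" when the body should be passed through untouched, else "markdown".

    `force_raw` is a hypothetical name for the argument that lets the model
    request the raw text even for HTML pages.
    """
    if force_raw or not looks_like_html(content_type, body):
        return "raw"
    return "markdown"
```

With this sketch, a markdown file served as `text/markdown` falls through to the raw path on its own, while an HTML page is only returned raw when the caller asks for it.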

Raw reads can quickly fill up the conversation, so responses are capped at a maximum length (which the model can override), and the model can specify a start offset. When I tested this by asking Claude questions about a long documentation page, it paginated through the document until it found the answer and didn't read any further.
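The pagination idea can be sketched as a slice plus a continuation offset. The names `read_page`, `start_index`, and `max_length`, and the default of 5000 characters, are assumptions for illustration and may not match the PR's actual signature.

```python
from typing import Optional, Tuple


def read_page(
    content: str, start_index: int = 0, max_length: int = 5000
) -> Tuple[str, Optional[int]]:
    """Return one chunk of `content` and the offset of the next chunk.

    The second element is None once the end of the content has been reached,
    which tells the caller there is nothing further to request.
    """
    if start_index < 0:
        raise ValueError("start_index must be non-negative")
    if max_length <= 0:
        raise ValueError("max_length must be positive")

    chunk = content[start_index : start_index + max_length]
    next_index = start_index + len(chunk)
    if next_index >= len(content):
        return chunk, None
    return chunk, next_index
```

A caller can stop as soon as a chunk contains what it needs, or keep passing the returned offset back in as `start_index` until it receives `None`.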

The readabilipy library, which is used to simplify the HTML before converting it to markdown, is based on the more mature JavaScript implementation. With use_readability enabled, it uses the Node.js implementation if node is installed and otherwise falls back to the Python implementation. The main benefit is that links are preserved in the markdown, so if an answer isn't found on the page but the page links to where it can be found, the model can request the new URL.
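The fallback behavior can be illustrated with a small sketch. This is not readabilipy's code; the library makes this decision internally. The function name `pick_backend`, the backend labels, and the injectable `which` parameter are hypothetical and exist only to show the decision.

```python
import shutil
from typing import Callable, Optional


def pick_backend(
    use_readability: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Choose which simplifier to use (illustrative labels, not library values).

    `which` defaults to shutil.which, which returns the path to an executable
    found on PATH, or None if there is no such executable.
    """
    if use_readability and which("node") is not None:
        return "readability-js"
    # Either the caller did not ask for Readability or node is not installed.
    return "python"
```

Passing `which` as a parameter keeps the decision testable on machines with or without Node.js installed.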

How Has This Been Tested?

Tested in Claude Desktop, including the following cases:

Raw fetching:
- Claude still typically uses the non-raw fetch for most cases to get the markdown
- Providing a URL for a non-HTML page implicitly provides the raw text
- If asking Claude about page metadata, it correctly uses the raw argument to get the raw HTML
Readability link viewing:
- Works as before if Node.js is not installed
- If Node.js is installed, links are kept in the markdown
- If you provide Claude a page with a question and the page recommends a different page for answering the question, it will request it instead
Pagination:
- If Claude is asked to answer a question based on documentation, it will continue to fetch more of the document (if it's too long) until it has sufficient information, unless the page provides a link to where to look, in which case Claude will swap to the new page

Breaking Changes

No, there are no changes to the config.

Types of changes

New MCP Server
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Protocol Documentation
My server follows MCP security best practices
I have updated the server's README accordingly
I have tested this with an LLM client
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have documented all environment variables and configuration options

jackadamson added 8 commits November 28, 2024 18:44

update fetch server to use readability JS if node is installed

467330d

add handling of non-html pages

37622d3

improve error message to model on fetch failure

960321f

add pagination of fetches so models can avoid reading a full page if …

e8dcd29

…it's got the information it needs

add argument to fetch raw html

b6710da

format with black

5552af1

update README to reflect new capabilities

c820086

add doc strings for readabilty and constrain types

ea42a21

dsp-ant approved these changes Nov 29, 2024

jackadamson merged commit e0234c7 into main Nov 29, 2024
21 of 23 checks passed

jackadamson deleted the jadamson/fetch-use-readabilityjs branch November 29, 2024 15:09

jackadamson mentioned this pull request Nov 29, 2024

fix(fetch): fix type checking issue from previous change #131

Merged
