Right now, I can use http-crawler to tell me about links that return non-2xx status codes. That could happen for two reasons:
The page should exist, and it’s broken (in which case I should fix it)
The page doesn’t exist, and there’s a page with an incorrect link (in which case I should change it)
In the latter case, it’s hard to find the source of the broken link from http-crawler’s current output. It would be useful if it could tell me how it found a given link, so I can check the page that’s providing the link.
(Edited, I rushed the first draft of this issue.)
Do you have any thoughts about what a good API for this might be?
I’m not sure.
In my fork (commit 0004f24), I decided to just dump a JSON representation of the entire "how seen" tree to disk. That works in a pinch, but it’s not very elegant.
Alternatively, you could subclass Response and add an extra field, how_seen. That’s not ideal either: we may discover another link to a page after we’ve already checked whether it’s live, and there’s no way to go back and update the field on a Response that has already been yielded.
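A third option might be to keep the “how seen” information outside the Response objects entirely, in a mapping from each URL to the set of pages that linked to it. Late-discovered links then just add entries to the map, so nothing already yielded ever needs updating. This is only a sketch of that idea, not http-crawler’s actual API; `record_link` and `sources_of` are hypothetical helper names:

```python
from collections import defaultdict

# Hypothetical sketch: track link sources outside the Response objects.
# `parents` maps each discovered URL to every page that linked to it,
# so a link found late simply adds an entry -- no need to mutate a
# Response that has already been yielded.
parents = defaultdict(set)

def record_link(source_url, target_url):
    """Record that `source_url` contains a link to `target_url`."""
    parents[target_url].add(source_url)

def sources_of(url):
    """Return every page known to link to `url`, sorted for stable output."""
    return sorted(parents[url])

# Simulated discovery order: the second source turns up *after*
# /missing has already been fetched and found broken.
record_link('/index', '/missing')
record_link('/about', '/missing')

print(sources_of('/missing'))  # ['/about', '/index']
```

The crawler would call something like `record_link` each time it extracts a link, and the final report could then show `sources_of(url)` next to each broken URL.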