Right now, I can use http-crawler to tell me about links that return non-2xx status codes. That could happen for two reasons:
The page should exist, and it’s broken (in which case I should fix it)
The page doesn’t exist, and there’s a page with an incorrect link (in which case I should change it)
In the latter case, it’s hard to find the source of the broken link from http-crawler’s current output. It would be useful if it could tell me how it found a given link, so I can check the page that’s providing the link.
(Edited, I rushed the first draft of this issue.)
Do you have any thoughts about what a good API for this might be?
I’m not sure.
In my fork (commit 0004f24), I decided to just dump a JSON representation of the entire "how seen" tree to disk. That works in a pinch, but it’s not very elegant.
Alternatively, you could subclass Response and add an extra field, how_seen. That’s not ideal either: we may discover another link to a page after we’ve already checked whether it’s live, and there’s no way to go back and update the field on a Response that has already been yielded.
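A third option might be to keep the “how seen” information outside the Response objects entirely, in a mapping from each URL to the set of pages that linked to it. Late-discovered links then just add entries to the map, so nothing already yielded ever needs updating. This is only a sketch of that idea, not http-crawler’s actual API; `record_link` and `sources_of` are hypothetical helper names:

```python
from collections import defaultdict

# Hypothetical sketch: track link sources outside the Response objects.
# `parents` maps each discovered URL to every page that linked to it,
# so a link found late simply adds an entry -- no need to mutate a
# Response that has already been yielded.
parents = defaultdict(set)

def record_link(source_url, target_url):
    """Record that `source_url` contains a link to `target_url`."""
    parents[target_url].add(source_url)

def sources_of(url):
    """Return every page known to link to `url`, sorted for stable output."""
    return sorted(parents[url])

# Simulated discovery order: the second source turns up *after*
# /missing has already been fetched and found broken.
record_link('/index', '/missing')
record_link('/about', '/missing')

print(sources_of('/missing'))  # ['/about', '/index']
```

The crawler would call something like `record_link` each time it extracts a link, and the final report could then show `sources_of(url)` next to each broken URL.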