Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change display of redirects in dashboard #562

Open
pabs3 opened this issue Sep 7, 2023 · 11 comments
Open

change display of redirects in dashboard #562

pabs3 opened this issue Sep 7, 2023 · 11 comments
Labels
backend dashboard Client-side dashboard only; dashboard server things should use the backend label instead. enhancement pipeline

Comments

@pabs3
Copy link
Contributor

pabs3 commented Sep 7, 2023

Currently when a redirect happens it looks like this in the dashboards:

302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
200 OK https://bugzilla.redhat.com/attachment.cgi?id=461634

Note that the second URL displayed there is not the URL redirected to, but the original URL.

I propose to display the URL being redirected instead, changing the display to one of these:

302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634 → https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
 ↳     https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634 ⬎
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
302   OK https://bugzilla.redhat.com/attachment.cgi?id=461634
↳ 200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
↳ 200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634

They all show the URL redirected to in different ways. The options are to show the new URL twice/once, how/whether to use an arrow to indicate it and how to deal with the extra implied space usage.

As I understand it, the right URLs are recorded in the WARCs, but the backend isn't passing the right information to the dashboard, so that will have to be fixed first.

CC @ShadowJonathan since they expressed interest in helping.

@ShadowJonathan
Copy link

ShadowJonathan commented Sep 7, 2023

Personally i like the

302 OK
↳ 200 OK

approach the most, as it doesnt interfere the existing URLs, but it could be extensible to show a "redirect chain" all at once, by just putting new lines down.

It also doesnt require soft-wrapping, or overflow the horizontal lines.

@JustAnotherArchivist
Copy link
Contributor

But it breaks with concurrency > 1. Or rather, you wouldn't be able to tell which URL redirected where anymore.

@JustAnotherArchivist JustAnotherArchivist added enhancement backend pipeline dashboard Client-side dashboard only; dashboard server things should use the backend label instead. labels Sep 7, 2023
@pabs3
Copy link
Contributor Author

pabs3 commented Sep 7, 2023 via email

@JustAnotherArchivist
Copy link
Contributor

Redirect targets are always processed 'immediately'. (Also, there is no pool or random order, just a queue, although links extracted from an individual page get added to the end of the queue in a random order.)

What I mean is that if you have concurrency 2 and get output of the type

301 https://example.org/foo
301 https://example.org/bar
200 https://example.org/baz
200 https://example.org/boop

you can't know which of /foo and /bar corresponds to /baz. This is impossible with concurrency 1 of course because the redirect is always followed by the redirect target there.

Changing the display based on concurrency could work in principle and would be neat, but it might be a bit messy to implement. I could also see it being unreliable when the concurrency is changed. The way settings changes are applied and communicated isn't exactly intuitive. For example, if a job is 'paused' (slowed down to a very low request rate) and you decrease the delay, this gets reflected immediately in !status output and on a dashboard refresh. But it doesn't actually take effect until the job processes the next URL, notices the settings change, and updates its own copy of the settings. The control node can't know what the effective concurrency is in the meantime; it only knows what it should eventually be.

I don't think omitting redirects entirely on concurrency > 1 is a good idea. Seeing each individual request/response is important for spotting issues with jobs. Also, redirect chains might never complete if the redirect target results in an error or is ignored due to --no-parent, for example.

@TheTechRobo
Copy link

Wouldn't it be possible for the websocket that is sent to the dashboard to include both lines?

@JustAnotherArchivist
Copy link
Contributor

You mean both the redirect response and the redirect target response? Only if you hold back reporting the redirect until completing the chain. So the dashboard wouldn't be showing what's happening in real time, plus there'd be the issues I mentioned in the last paragraph above.

@pabs3
Copy link
Contributor Author

pabs3 commented Sep 7, 2023

A comment from the #archiveteam-dev IRC channel:

<qwertyasdfuiopghjkl> to me, option 1 (302 https://some.example/redirect → https://some.example/url for the redirecting url and later 200 https://some.example/url for the redirect target) makes the most sense, as the arrows on options 3, 4 and 5 would often point to unrelated urls on concurrencies higher than 1, 6 doesn't give any info on what the url redirected to if the redirect target is taking a while to load or concurrency is > 1, and option 2 is a two-line version of option 1 (and would be my second choice after the more compact option 1). Options 1 and 2 are the only ones that are unambiguous in what redirects where, and show the redirect target even if it hasn't loaded yet or (once that gets implemented) if it has been ignored.

@pabs3
Copy link
Contributor Author

pabs3 commented Sep 7, 2023 via email

@pabs3
Copy link
Contributor Author

pabs3 commented Sep 8, 2023

A comment from the #archiveteam-dev IRC channel:

<qwertyasdfuiopghjkl> As a cosmetic suggestion, the → https://some.example/url part should be in a different color than the part before it, so it will be easy to quickly tell it apart from the actual redirect url.

@TheTechRobo
Copy link

You mean both the redirect response and the redirect target response?

No, I meant the redirect response and the destination.

In option 2:

302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
 ↳     https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634

I propose that the messages are grouped like so:

[Message 1]
302 OK https://bugzilla.redhat.com/attachment.cgi?id=461634
 ↳     https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634
[Message 2]
200 OK https://bugzilla-attachments.redhat.com/attachment.cgi?id=461634

That wouldnt cause any concurrency issues.

@JustAnotherArchivist
Copy link
Contributor

Ah, yes, of course. My comment was only about the versions that don't include the redirect target on/after the 30x response but rather only the 30x and then the 200, i.e. options 3 to 6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend dashboard Client-side dashboard only; dashboard server things should use the backend label instead. enhancement pipeline
Projects
None yet
Development

No branches or pull requests

4 participants