Using navigationStart as a baseline may expose cross-origin timing information #160
Comments
Also @bdekoz @mikewest @arturjanc
If "current origin redirect chain" means the origin boundary is not crossed therein, that option seems reasonable to me. The current model does indeed seem problematic. w3c/resource-timing#220 is also related.
Yes, it means something like "after all the redirects that are not same-origin as the document's final origin are complete"
This does need fixing. I oppose the first option. The third option seems most like what we do with other cross-origin timing exposure.
I think I agree with @annevk and @achristensen07. This does seem like something we ought to change, and the third option seems like the most robust (and consistent) way of doing so. I think the second is justifiable from a security standpoint as well, but I'm not sure the complexity it introduces is worthwhile.
Note that (3) includes (2) inside it, in the cases where TAO headers are not there. With option 3 I want to be careful when we overload the meaning of TAO (see concerns here). Though since TAO was ignored inside redirects so far, it may not be a problem, as it's a new usage of an existing header. Perhaps a good way to go about it would be that the TAO header would have to specify the same origin as the
Not having the above in a cross-origin redirect would push the navigation start time (and
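For illustration only, a rough sketch of what such an opt-in could look like on the redirector's side under option (3), written as a Node.js/Express handler. The host names are made up, and the exact semantics (which origin the TAO header would need to name, and whether TAO is even the right header) are exactly what's being debated here:

```js
const express = require('express');
const app = express();

// Hypothetical cross-origin redirector (e.g. an ad-click handler) that opts in
// to exposing its redirect timing to the destination document by sending
// Timing-Allow-Origin on the redirect response itself.
app.get('/click', (req, res) => {
  res.set('Timing-Allow-Origin', 'https://destination.example');
  res.redirect(302, 'https://destination.example/landing');
});

app.listen(3000);
```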
As I noted in w3c/resource-timing#220, accounting for TAO here is in essence a new model and therefore also a source of complexity. For Resource Timing we have a document A that fetches B which redirects to C. B and C need to consent. Here we have a document A that is navigated to B which redirects to C. B needs to consent? I think it's acceptable, but it's quite a bit different.
I have to say that the second and third options are not mutually exclusive - we could set navigation time at the first same-origin redirect and extend it backwards, pending an opt-in from the cross-origin redirects. Note that in any case, we'd need some point in time for navigation start even if there's no opt-in. At worst, we can have that be the eventual request for the document, but setting it to the first request in the last same-origin portion of the redirect chain doesn't seem overly complex.
I agree that TAO may not be the opt-in we want here.
Opting for a new opt-in? 😬
Yoav pointed out that in cases where the redirection domain is trying to help the source and destination track the user, and the user agent has blocked the query parameters they had been using to do this, this timing information might help them continue to transfer at least a partial identifier. I don't think the navigational-tracking threat model (privacycg/nav-tracking-mitigations#12) is developed enough to be much help in making decisions here, but heads up that the Privacy CG might come back to this later.
Hmm, this makes it more interesting - it means that if we accept this as a threat to be mitigated, the user agent should have a say in this for the purpose of tracking prevention, and not just the two domains, which means something along the lines of option (2) would be the (only?) way to go (the "navigation" starts from the last same-origin chain). Note that if we go with option (2), some value will be lost for RUM. Sites that load "slowly" will only know what happened from the point the redirect chain arrived at their domain, and they would have no insight into delays caused by 3rd party redirects.
Indeed. As long as query parameters are allowed to be passed with the navigation URLs, acting against this doesn't matter much. So for now, I don't think we should take that into account. But if and when we start mitigating query parameters as an information-passing channel, we'd need to also mitigate the redirection timing channel, either by not exposing it entirely, or by having browsers lie about those times in smarter ways (e.g. for known trackers, when they're highly variable, etc.).
There are user agents doing experiments around query parameters (and some might have shipped?) so we might as well account for it now.
I agree this is an area we should remain vigilant on to see how it develops, but I don't believe we have consensus on the threat model and what the solutions to this threat would look like. So it seems premature to e.g. eliminate an opt-in option before that settles.
Maybe, I have to say that since we have some tentative plans in this area, I'm actually hesitant now to support an opt-in model here.
^^ @miketaylr It's true that we could go with option (2) and expand it later with an opt-in, once things in that area settle. It may also be interesting to think about the incentive model here - if this would make redirectors unaccountable for their performance, that'd not be a great outcome. An opt-in model may not be effective in driving such accountability.
Feels to me from this discussion that this might be the way forward as a first step, while contemplating the opt-in. It will be interesting to see if something else comes up at TPAC.
I'm not sure a redirecting URL is accountable for timing to its destination... Maybe it's accountable to the domain that started the navigation, e.g. where the banner was? Maybe the interested party in this information is neither domain, but rather the user (and the user agent), and user agents should be encouraged to show some UI indication during a cross-origin navigation redirect ("You are now redirected via Outbrain" or such), to show the user that the delay comes from an ad broker etc. and not from the originating domain / destination domain, rather than counting on the origin/destination URLs to do something about it? </thoughts>
Just to clarify, sounds like this applies to https://w3c.github.io/hr-time/#dfn-time-origin as well? If it does then this would be a pretty big change impacting any high resolution timestamps received by developers.
Indeed! We'll definitely have to be careful about rolling this out.
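For context on the blast radius, a small sketch of how the time origin anchors every high-resolution timestamp a page sees today (standard APIs, nothing specific to this proposal):

```js
// performance.timeOrigin is the baseline for all DOMHighResTimeStamps.
const approxWallClock = performance.timeOrigin + performance.now();
console.log(Math.abs(approxWallClock - Date.now())); // small, modulo clock adjustments

// Navigation (and resource) timing entries are offsets from that same baseline.
const [nav] = performance.getEntriesByType('navigation');
console.log(nav.startTime); // 0 - i.e. the time origin itself
```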
See demo here. A minimal use case without an intermediate domain. The originating domain does some form processing of
Hey folks! As someone working in performance full-time, I have some concerns regarding this proposal, and its impact on well-established metrics like TTFB. Unless I'm misunderstanding the proposal, wouldn't option 2 ("Change navigationStart to be the timestamp of the first redirect in the current origin redirect chain") mean that we would be redefining TTFB and all metrics that build upon it to mean different things in different situations? That is:
This seems inconsistent and difficult to account for, given that RUM libraries don't have any visibility into the HTTP headers on the document. Furthermore, would this also change the definition of TTFB and all metrics that build upon it for native browser measurements, such as the ones taken for the Chrome User Experience Report? If not, this could be even worse, as it would remove the last bit of visibility we have into what happens before a request gets to the ultimate origin. It's extremely important for us to be able to account for every portion of the time that goes into TTFB or a higher-level metric, when we're being ranked for it via CrUX. In general, Navigation Timing is a well-established API that is relied upon by every RUM library out there, so it seems dangerous to redefine the meaning of the most fundamental value that the entire API relies on.
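For illustration, a minimal sketch of how TTFB is commonly derived from Navigation Timing today (roughly what RUM libraries such as web-vitals do, ignoring prerender/activationStart details). If the time origin moves to the first same-origin redirect, this number silently changes meaning whenever cross-origin redirects preceded it:

```js
const [nav] = performance.getEntriesByType('navigation');
// responseStart is an offset from the time origin (navigationStart today),
// so it currently includes any time spent in cross-origin redirects.
const ttfb = nav.responseStart;
console.log('TTFB (ms):', ttfb);
```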
Thank you @sgomes, yes, this voice has to be heard too.
It's unclear to me how excluding redirect time from the timeOrigin prevents any realistic attack. For this attack to work today, we need:
Suppose we exclude redirect time from the timeOrigin. For the above attack to work, I already need users on a site I own. If I then modify links on my site to:
then I'm still able to measure the redirect time and execute the attack.
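A sketch of the workaround being described, assuming (as in the comment) that the attacker controls the linking site and the eventual destination; the t0 parameter name and the link selector are made up for illustration:

```js
// On the attacker's linking site: stamp the click time onto outgoing links
// that go through the cross-origin redirector.
document.querySelectorAll('a.through-redirector').forEach((a) => {
  a.addEventListener('click', () => {
    const url = new URL(a.href);
    url.searchParams.set('t0', Date.now()); // hypothetical parameter name
    a.href = url.toString();
  });
});

// On the attacker's destination page: diff the carried click time against
// the (now redirect-excluding) time origin to recover the redirect duration.
const t0 = Number(new URLSearchParams(location.search).get('t0'));
const redirectTime = performance.timeOrigin - t0; // ≈ time spent in redirects
console.log(redirectTime);
```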
I apologise, I should have provided more context. My concern is ensuring that these cross-origin issues are visible when they happen within the context of the same organisation, or a context of trust, in general. I understand that's difficult to define.
Thank you, I can understand the distinction with your explanation 👍 I'm not sure there's a good solution to meet all of the concerns expressed. The proposal we're discussing would make some categories of performance problems undiscoverable to the organisations that would be able to fix them, but I don't see a good mechanism for preserving that ability in the context of the current proposal.
@sgomes I am curious how this could be applied. Does "within the context of the same organisation" change the difficulty of making such a measurement (e.g. in contrast to two non-related websites)? Without having some backend information, it's hard to imagine that this is indeed that simple in the current state. With backend information, on the other hand, it's easier and doesn't require calling the client-side API - you just compare two timings on your side (well, you have to somehow fingerprint the initial request).
Do you mean that if A adds a link to B with
I think that would work, with maybe some
@yoavweiss this would work basically the same as "opener" (or however it is done nowadays) to preserve the And I think we could do that because similarly to
@terjanq My concern is mostly around discoverability. In the current scenario, we can discover problematic pages at the RUM level by simply looking for high values of the relevant timings.

In a scenario where cross-site information is not available to RUM libraries and there exists no mechanism to retrieve it, we would need to rely exclusively on HTTP-level server logs for the discovery phase as well. Not only that, we would have to ensure that the logs for all systems involved are present in the same location, that requests can be correlated, and that an understanding of flow can be established so that we could calculate, for each flow, the timing of the first request and the timing of the final one. It's theoretically doable, yes, but non-trivial in the context of logging systems that are designed to handle each request as a separate entry. Not to mention much more difficult to scale, given the complexity involved. Hope this provides some extra context!
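A rough sketch of that kind of RUM-level discovery as it works in the status quo; the exact field the original comment refers to isn't preserved above, so redirect duration and fetchStart are used as stand-ins, and the /rum endpoint and 1s threshold are made up:

```js
const [nav] = performance.getEntriesByType('navigation');
// Today, time spent in any redirect chain shows up before fetchStart.
const preFetchTime = nav.fetchStart;
const redirectTime = nav.redirectEnd - nav.redirectStart; // 0 when hidden
if (preFetchTime > 1000) {
  // Hypothetical reporting endpoint for flagging slow pre-request phases.
  navigator.sendBeacon('/rum', JSON.stringify({ preFetchTime, redirectTime }));
}
```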
Makes sense. I am not an expert, but isn't this actually problematic for measurements as well? Suppose that you indeed rely on these measurements, and now a user visits your website from another website that uses redirectors, and these redirectors tend to be really slow. Doesn't that provide a false signal to your website? Normally, links clicked within the same organisation would be faster than those from another, as in the example. For me, it looks like both parties in this discussion would benefit from fixing this. For your case it would always provide the wanted signal via an opt-in, and from the security & privacy side it would mask the potential leaks.
Yes, I agree. The information is ambiguous in the status quo, and we're only able to distinguish between problematic external redirectors and internal ones by manually looking at things case-by-case. That is sufficient for some needs, but not others. If an opt-in mechanism were to exist for unambiguously determining that a user spent too much time being redirected across an organisation's various hosted origins, then that would be an improvement over the status quo, as long as the mechanism is practical to deploy.
I see such opt-in to be similar in spirit to referrer policy, e.g.
I don't think Firefox would implement such a mechanism as it enables precisely the thing we want to avoid. (See also statements upthread.) Having sites move things into the URL would be vastly preferable and would ultimately allow for tackling this as part of https://github.com/privacycg/nav-tracking-mitigations (if sites indeed decide to go to those lengths).
In the meantime, I added a PR to move time origin computation to HTML; this would hopefully make it easier to see how it works and to make changes in this area in the future.
Metrics such as LCP are trying to represent the user's experience, and if navigationStart is changed to be before the first redirect for the current origin, then it will no longer represent the user's experience in many cases. Visitors clicking through from search engines / social media sites are typically routed through redirects; in the worst case, clicking on an ad in Google results leads to three redirects that consume nearly a second, for example. The time before the first redirect on the current origin is important, as it reflects the visitor's actual experience.
Understood, but nonetheless the cross-domain leak is there, and the fact that removing it would hinder this valuable metric doesn't make it less of a leak.
@andydavies - I think the use case for having visibility into these cross-origin redirects is clear. At the same time, this is a cross-origin leak, as it is exposed right now. We'd need to find an alternative, privacy-preserving way to expose this information to developers (e.g. aggregated reporting).
Suggesting to start tackling this by adding an opt-in. How about: At first, it would allow exposing redirectStart/redirectEnd when there are cross-origin redirects in a navigation, and later on we can make this the only way to expose cross-origin redirects at start-of-navigation.
Suggesting to solve this in the context of w3c/resource-timing#220
I don't see how an opt-in header helps with the attacks? See also #160 (comment).
You're right, it doesn't help with that aspect.
When we have a navigation with cross-origin redirects, we're hiding redirectStart and redirectEnd from the final document. However, because the timeOrigin for all the navigation timing entries is the navigation start, the redirect timing info can (somewhat) easily be inferred.

Consider the following: a navigation starts at ts1 and goes through a cross-origin redirect (e.g. a search engine click handler URL or an ad broker like outbrain) before reaching the destination document at ts2. ts1 is available to the document, directly or indirectly, as it's the navigationStart, which is the base timestamp for all navigation timing / resource timing entries (as well as the timeOrigin).

I believe we have three ways to go about it (but maybe there are more):

1. Keep things as they are
2. Change navigationStart to be the timestamp of the first redirect in the current origin redirect chain
3. Use TAO (in its current form or with some amendments) to give redirect chains the opportunity to expose their timing to the destination.

This came from discussing whether to enable or zero-out navigation timing properties. See previous discussions here, here and here.

Thoughts?
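For concreteness, a minimal sketch of the inference described above as it works today, assuming a navigation that went through cross-origin redirects without a Timing-Allow-Origin opt-in:

```js
// Run in the destination document after such a navigation.
const [nav] = performance.getEntriesByType('navigation');

// The redirect phase itself is hidden for cross-origin redirects without TAO...
console.log(nav.redirectStart, nav.redirectEnd); // 0, 0

// ...but every timestamp is still an offset from navigationStart (ts1), so the
// time spent in the redirect chain is easy to infer anyway.
const inferredRedirectTime = nav.fetchStart; // ≈ ts2 - ts1
console.log('inferred redirect time (ms):', inferredRedirectTime);
```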