Duplicate request_id #989

Closed
nrllh opened this issue Apr 13, 2022 · 12 comments

Comments

@nrllh

nrllh commented Apr 13, 2022

I noticed in my dataset that the same request_id was assigned to different requests (although it's rare). This means that the request_id in callstacks cannot be mapped unambiguously to a single request.

It is particularly important that I find the right request for each call stack. Based on the timestamp, I could take the first request (after the last request in the callstack), but I'm not sure if that's a reliable solution. Do you have an idea how I can work around the problem?

Here is an example I have in my dataset:

site_id subpage_id url top_level_url method referrer headers is_XHR is_third_party_channel is_third_party_to_top_window resource_type time_stamp is_websocket body etld content_hash is_tracker is_background_req in_scope window_id tab_id frame_id parent_frame_id frame_ancestors request_id triggering_origin loading_origin loading_href req_call_stack post_body post_body_raw url_scope global_uniq_id
47 0 https://contextual.media.net/cksync.php?cs=1&type=vzn&ovsid={{APID}}&redirect=https%3A%2F%2Fpixel.advertising.com%2Fups%2F58222%2Fsync%3F_origin%3D1%26uid%3D%24UID https://www.msn.com/de-de/ GET https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp [["Host","contextual.media.net"],["User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"],["Accept","image/avif,image/webp,/"],["Accept-Language","en-US,en;q=0.5"],["Accept-Encoding","gzip, deflate, br"],["Referer","https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp"],["Connection","keep-alive"],["Cookie","hbcm_sd=1%7C1646673074314; visitor-id=2896746747280784000V10"],["Sec-Fetch-Dest","image"],["Sec-Fetch-Mode","no-cors"],["Sec-Fetch-Site","same-origin"]] 0 1 null image 2022-03-07T19:11:14.410000 0 null media.net null null null null 1 1 2147483652 2147483649 [{"frameId":2147483649,"url":"https://contextual.media.net/medianet.php?cid=8CUT39MWR&crid=715624197&size=306x271&https=1"},{"frameId":0,"url":"https://www.msn.com/de-de/"}] 129 https://contextual.media.net https://contextual.media.net https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp null null null https://contextual.media.net/cksync.php 192876
47 0 https://ups.analytics.yahoo.com/ups/58222/sync?_origin=1&uid=0000EEA&apid=UP9841187a-9e39-11ec-a345-061779e0c7c0 https://www.msn.com/de-de/ GET https://contextual.media.net/ [["Host","ups.analytics.yahoo.com"],["User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"],["Accept","image/avif,image/webp,/"],["Accept-Language","en-US,en;q=0.5"],["Accept-Encoding","gzip, deflate, br"],["Referer","https://contextual.media.net/"],["Connection","keep-alive"],["Cookie","A3=d=AQABBLI8JmICEPL2EPXsDfBFliWLBa28-40FEgEBAQGOJ2IwYgAAAAAA_eMAAAcIsjwmYq28-40&S=AQAAAkXJG3i7bt2vymX74kfQ1VQ; B=8rutsllh2cf5i&b=3&s=rs; IDSYNC=18xa~23mh"],["Sec-Fetch-Dest","image"],["Sec-Fetch-Mode","no-cors"],["Sec-Fetch-Site","cross-site"]] 0 1 null image 2022-03-07T19:11:14.939000 0 null yahoo.com null null null null 1 1 2147483652 2147483649 [{"frameId":2147483649,"url":"https://contextual.media.net/medianet.php?cid=8CUT39MWR&crid=715624197&size=306x271&https=1"},{"frameId":0,"url":"https://www.msn.com/de-de/"}] 129 https://contextual.media.net https://contextual.media.net https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp null null null https://ups.analytics.yahoo.com/ups/58222/sync 193101
47 0 https://pixel.advertising.com/ups/58222/sync?_origin=1&uid=0000EEA https://www.msn.com/de-de/ GET https://contextual.media.net/ [["Host","pixel.advertising.com"],["User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"],["Accept","image/avif,image/webp,/"],["Accept-Language","en-US,en;q=0.5"],["Accept-Encoding","gzip, deflate, br"],["Referer","https://contextual.media.net/"],["Connection","keep-alive"],["Sec-Fetch-Dest","image"],["Sec-Fetch-Mode","no-cors"],["Sec-Fetch-Site","cross-site"]] 0 1 null image 2022-03-07T19:11:14.585000 0 null advertising.com null null null null 1 1 2147483652 2147483649 [{"frameId":2147483649,"url":"https://contextual.media.net/medianet.php?cid=8CUT39MWR&crid=715624197&size=306x271&https=1"},{"frameId":0,"url":"https://www.msn.com/de-de/"}] 129 https://contextual.media.net https://contextual.media.net https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp null null null https://pixel.advertising.com/ups/58222/sync 192941
47 0 https://pixel.advertising.com/ups/58222/sync?_origin=1&uid=0000EEA&verify=true https://www.msn.com/de-de/ GET https://contextual.media.net/ [["Host","pixel.advertising.com"],["User-Agent","Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"],["Accept","image/avif,image/webp,/"],["Accept-Language","en-US,en;q=0.5"],["Accept-Encoding","gzip, deflate, br"],["Referer","https://contextual.media.net/"],["Connection","keep-alive"],["Cookie","APID=UP9841187a-9e39-11ec-a345-061779e0c7c0"],["Sec-Fetch-Dest","image"],["Sec-Fetch-Mode","no-cors"],["Sec-Fetch-Site","cross-site"]] 0 1 null image 2022-03-07T19:11:14.759000 0 null advertising.com null null null null 1 1 2147483652 2147483649 [{"frameId":2147483649,"url":"https://contextual.media.net/medianet.php?cid=8CUT39MWR&crid=715624197&size=306x271&https=1"},{"frameId":0,"url":"https://www.msn.com/de-de/"}] 129 https://contextual.media.net https://contextual.media.net https://contextual.media.net/checksync.php?&vsSync=1&cs=1&hb=1&cv=37&ndec=1&cid=8HBSKZM1Y&prvid=77%2C117%2C184%2C188%2C203%2C226%2C246%2C2030%2C2033%2C3018&itype=HB-CM&rtime=9&https=1&gdpr=1&gdprconsent=1&usp_status=0&usp_consent=1&dcfp=gdpr,usp null null null https://pixel.advertising.com/ups/58222/sync 193016

PS: global_uniq_id is my internal row number.

@vringar
Contributor

vringar commented Apr 13, 2022

Hey, this might be due to these requests being part of a redirect chain. IIRC, during a single redirect the HTTP channel gets reused.
So all of these requests might indeed be triggered by a single call.
Try looking at the response_status in the http_responses table and see if that brings up anything.
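
A minimal sketch of that check, run against a toy in-memory SQLite database rather than a real crawl. The table and column names (http_requests, http_responses, visit_id, request_id, url, time_stamp, response_status) follow the names used in this thread, but the reduced schema, the example.* URLs, and the join on url are assumptions made for illustration.

```python
import sqlite3

# Toy stand-in for a crawl database: only the columns needed here, all values made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE http_requests (
        visit_id INTEGER, request_id INTEGER, url TEXT, time_stamp TEXT
    );
    CREATE TABLE http_responses (
        visit_id INTEGER, request_id INTEGER, url TEXT, response_status INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO http_requests VALUES (?, ?, ?, ?)",
    [
        (1, 129, "https://a.example/sync", "2022-03-07T19:11:14.410"),
        (1, 129, "https://b.example/sync", "2022-03-07T19:11:14.585"),
        (1, 129, "https://c.example/pixel", "2022-03-07T19:11:14.939"),
        (1, 130, "https://d.example/app.js", "2022-03-07T19:11:15.100"),
    ],
)
conn.executemany(
    "INSERT INTO http_responses VALUES (?, ?, ?, ?)",
    [
        (1, 129, "https://a.example/sync", 302),
        (1, 129, "https://b.example/sync", 302),
        (1, 129, "https://c.example/pixel", 200),
        (1, 130, "https://d.example/app.js", 200),
    ],
)

# For every request_id that occurs more than once within a visit,
# list the requests in time order together with their response status.
QUERY = """
SELECT req.request_id, req.url, resp.response_status
FROM http_requests AS req
JOIN (
    SELECT visit_id, request_id
    FROM http_requests
    GROUP BY visit_id, request_id
    HAVING COUNT(*) > 1
) AS dup
  ON dup.visit_id = req.visit_id AND dup.request_id = req.request_id
JOIN http_responses AS resp
  ON resp.visit_id = req.visit_id
 AND resp.request_id = req.request_id
 AND resp.url = req.url
ORDER BY req.visit_id, req.request_id, req.time_stamp
"""

rows = conn.execute(QUERY).fetchall()
for request_id, url, status in rows:
    print(request_id, status, url)
```

If every row except the last one in a group has a 3XX status, the shared request_id most likely belongs to a redirect chain rather than to unrelated requests.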

@vringar
Contributor

vringar commented Apr 13, 2022

The http_redirects table might be outdated / no longer needed.

@nrllh
Author

nrllh commented Apr 13, 2022

Hey, thanks! Yes, that's the case. All of them are redirects.
However, I still wonder what this means for callstacks. Is the request_id in the callstacks table a reference to the last such request or to the first one, based on timestamp?

@vringar
Contributor

vringar commented Apr 13, 2022

It's a reference to the entire request chain.
The script creates the first request, which then returns with a redirect status code and kicks off the second request.
So indirectly the script is responsible for both requests, even though it only directly started the first one. Based on timestamp it directly caused the first one, but for analysis purposes it might be helpful to create a mapping from each callstack to an ordered list of redirects.

When we did this kind of analysis, we called those request chains.
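
One way such a mapping could be built is sketched below, using plain Python dictionaries as stand-ins for table rows. The URLs and timestamps are taken from the example rows earlier in this thread (query strings partly omitted); the visit_id key, the helper names, and the call stack string are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def build_request_chains(requests):
    """Group request rows by (visit_id, request_id), ordered by time_stamp."""
    grouped = defaultdict(list)
    for row in requests:
        grouped[(row["visit_id"], row["request_id"])].append(row)
    return {
        key: [r["url"] for r in sorted(rows, key=lambda r: r["time_stamp"])]
        for key, rows in grouped.items()
    }


def chains_for_callstacks(callstacks, chains):
    """Pair every callstack row with the request chain it refers to."""
    return [
        (cs["call_stack"], chains.get((cs["visit_id"], cs["request_id"]), []))
        for cs in callstacks
    ]


# Rows in the order they were posted above (not sorted by time).
requests = [
    {"visit_id": 47, "request_id": 129, "time_stamp": "2022-03-07T19:11:14.410000",
     "url": "https://contextual.media.net/cksync.php"},
    {"visit_id": 47, "request_id": 129, "time_stamp": "2022-03-07T19:11:14.939000",
     "url": "https://ups.analytics.yahoo.com/ups/58222/sync"},
    {"visit_id": 47, "request_id": 129, "time_stamp": "2022-03-07T19:11:14.585000",
     "url": "https://pixel.advertising.com/ups/58222/sync?_origin=1&uid=0000EEA"},
    {"visit_id": 47, "request_id": 129, "time_stamp": "2022-03-07T19:11:14.759000",
     "url": "https://pixel.advertising.com/ups/58222/sync?_origin=1&uid=0000EEA&verify=true"},
]
# Hypothetical callstack row; the real rows in this thread have no call stack.
callstacks = [
    {"visit_id": 47, "request_id": 129,
     "call_stack": "send@https://site.example/app.js:1:1;null"},
]

chains = build_request_chains(requests)
mapping = chains_for_callstacks(callstacks, chains)
for call_stack, chain in mapping:
    print(call_stack)
    for position, url in enumerate(chain):
        print(f"  {position}: {url}")
```

The first URL in each chain is the request the script started directly; the later ones were reached through redirects.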

@nrllh
Author

nrllh commented Apr 13, 2022

Thank you very much, it helped to solve my issue. So I'm closing the issue.

@nrllh nrllh closed this as completed Apr 13, 2022
@nrllh
Author

nrllh commented Apr 21, 2022

@vringar sorry for the spam, but I didn't want to create a new issue for that since it's potentially related to this issue:

Problem 1: As far as I can see, it's not possible to correlate the requests in a call_stack row (in the callstacks table) with an ID directly. I guess the only option is to compare strings and hope to get the right request_id. If there are multiple records with the same request URL, it's very hard to find the right request_id for the requests that appear in call_stack.

Problem 2: Another problem I face is how to determine which request triggered the next one. As far as I could observe, the sequence of requests is either top-down or bottom-up. Here is an example:

 instrumentFunction/<@https://space.bilibili.com/7584632:362:25;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:30329;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:23700;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:23299;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:22815;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:22575;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:100310;null
o@https://s1.hdslb.com/bfs/static/jinkela/space/space.ff495225cc805974552c20fc851f8da0f2cd085a.js:1:51142;null
videoExposureReport@https://s1.hdslb.com/bfs/static/jinkela/space/11.space.ff495225cc805974552c20fc851f8da0f2cd085a.js:1:27800;null
770/mounted/</<@https://s1.hdslb.com/bfs/static/jinkela/space11.space.ff495225cc805974552c20fc851f8da0f2cd085a.js:1:27070;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:23700;null
sentryWrapped@https://s1.hdslb.com/bfs/static/jinkela/long/js/sentry/sentry-5.2.1.min.js:2:37520;null

Problem 3: As you can see, the URL https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js appears at several different positions in the sequence. How should I interpret that?

Thank you very much in advance!

@nrllh nrllh reopened this Apr 21, 2022
@vringar
Contributor

vringar commented May 5, 2022

Hey,

  1. I'm sorry, I don't quite understand this problem. Which ID do you want to correlate it to? The request_id? You can use the request_id to correlate with a redirect chain, where the first element in the redirect chain is the URL originally called and the last one is the URL that returned with a status code other than 3XX. What other correlation do you want?
  2. I don't think you can determine which request triggered which other one. The callstack is bottom-to-top.
    So the first function called is sentryWrapped, which then calls value, which calls 770/mounted/</< (however that name came about).
  3. This is because the script is calling other functions in the same script. I'm assuming they are all called value because they are all function objects or whatever the minifier produced. And calls from other code might end up back in the script due to callbacks or something similar.
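
To make the bottom-to-top reading concrete, here is a small sketch that parses a call stack string and lists the frames in call order. It assumes each line has the shape `function@script_url:line:column;async_cause`, which is inferred from the example stack above; the `Frame` type and helper names are made up for this illustration.

```python
from collections import Counter
from typing import NamedTuple


class Frame(NamedTuple):
    func_name: str
    script_url: str
    line: int
    column: int
    async_cause: str


def parse_frame(text):
    """Split one 'func@url:line:col;async_cause' line into its parts."""
    func_name, _, rest = text.strip().partition("@")
    location, _, async_cause = rest.rpartition(";")
    # The URL itself contains ':' (scheme, maybe a port), so split from the right.
    script_url, line, column = location.rsplit(":", 2)
    return Frame(func_name, script_url, int(line), int(column), async_cause)


def frames_in_call_order(call_stack):
    """Return frames from the first function called to the innermost one."""
    frames = [parse_frame(ln) for ln in call_stack.splitlines() if ln.strip()]
    return list(reversed(frames))


# Shortened version of the stack posted above.
stack = """instrumentFunction/<@https://space.bilibili.com/7584632:362:25;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:30329;null
value@https://s1.hdslb.com/bfs/seed/log/report/log-reporter.js:1:23700;null
sentryWrapped@https://s1.hdslb.com/bfs/static/jinkela/long/js/sentry/sentry-5.2.1.min.js:2:37520;null"""

ordered = frames_in_call_order(stack)
for depth, frame in enumerate(ordered):
    print(depth, frame.func_name, frame.script_url)

# How often each script shows up in the stack (see point 3).
frames_per_script = Counter(frame.script_url for frame in ordered)
```

A script URL with a count above one simply has several of its functions on the stack at the same time; it does not mean the script was requested more than once.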

@englehardt
Collaborator

englehardt commented May 18, 2022

1/ I think the level of tracing you want to do is just not possible with the instrumentation we have in place right now. The stacks we save come directly from the browser; we don't have a way to label which script URL listed in the stack corresponds to which webRequest ID. That would require a bunch of plumbing throughout the browser to trace properly. Note that if you link a call stack table row back to a web request, then you know which JS context that call is executing in. So this is only a problem when there are multiple copies of a script executing in the exact same context (which does happen).

2&3/ It sounds like you might be confusing the call stack with HTTP redirects? As Stefan mentions, the call stack shows calling relationships between scripts that are executing in the same JS context, not a series of requests. So scripts can call into each other (or use methods defined in one another).
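
Regarding 1/, linking a call stack row back to its web request could look roughly like the sketch below. It uses a toy in-memory SQLite database; the reduced schema, the example.* URLs, and the choice of the earliest request in a chain as "the" request for the stack are assumptions for illustration, while the column names (visit_id, request_id, tab_id, frame_id, time_stamp, call_stack) follow the ones mentioned in this thread.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE http_requests (
        visit_id INTEGER, request_id INTEGER, url TEXT,
        tab_id INTEGER, frame_id INTEGER, time_stamp TEXT
    );
    CREATE TABLE callstacks (
        visit_id INTEGER, request_id INTEGER, call_stack TEXT
    );
    """
)
db.executemany(
    "INSERT INTO http_requests VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Two requests share request_id 7 because the first one was redirected.
        (1, 7, "https://tracker.example/collect", 1, 2147483652,
         "2022-03-07T19:11:14.410"),
        (1, 7, "https://cdn.example/final", 1, 2147483652,
         "2022-03-07T19:11:14.585"),
    ],
)
db.execute(
    "INSERT INTO callstacks VALUES (?, ?, ?)",
    (1, 7, "send@https://site.example/app.js:10:5;null"),
)

# Join each call stack to the earliest request with the same (visit_id, request_id);
# that request carries the tab and frame in which the calling script ran.
CONTEXT_QUERY = """
SELECT cs.call_stack, req.url, req.tab_id, req.frame_id
FROM callstacks AS cs
JOIN http_requests AS req
  ON req.visit_id = cs.visit_id AND req.request_id = cs.request_id
WHERE req.time_stamp = (
    SELECT MIN(r2.time_stamp)
    FROM http_requests AS r2
    WHERE r2.visit_id = cs.visit_id AND r2.request_id = cs.request_id
)
"""

contexts = db.execute(CONTEXT_QUERY).fetchall()
for call_stack, url, tab_id, frame_id in contexts:
    print(f"tab {tab_id}, frame {frame_id}: {url} <- {call_stack}")
```

This only identifies the context of the call; as noted above, it cannot tell which script URL inside the stack corresponds to which request.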

@nrllh
Author

nrllh commented May 19, 2022

Thank you very much. I had some difficulty understanding the callstacks, but now it's clear.

I'm not sure whether I should create a separate issue, but I can't see DNS responses for all HTTP redirects. It seems we only have the DNS response for the final request of a request chain. That probably means we are missing some data for redirect chains in the dns_responses table.

@englehardt
Collaborator

I noticed that DNS issue myself and filed #1020 for it.

@wesley-tan

Hi there! I am an undergraduate researching browser fingerprinting.
So, ultimately,

  1. What is the difference between id and request_id?
  2. How are request_ids grouped?

@nrllh
Author

nrllh commented Jun 10, 2024

> 1. What is the difference between id and request_id?
> 2. How are request_id grouped?

  1. The id is the row number, which increases independently of request_id or visit_id. The request_id is the ID of HTTP requests, and it resets after each visit.
  2. The data is grouped by visit_id.
