DataDome is able to detect zendriver #20

Open
3 tasks
aartoni opened this issue Nov 18, 2024 · 10 comments
Labels
bot-detection enhancement New feature or request

Comments

@aartoni

aartoni commented Nov 18, 2024

I've tried both the standalone and the containerized version but it seems that this DataDome-protected website is still able to detect zendriver somehow.

The good news is that we can check Device and Browser Info for a hint on why DataDome may be flagging zendriver as a bot. For context, that website is made by the VP of Research at DataDome.

In the past I've been able to get false on the hasInconsistentGPUFeatures check by adding the following command-line flags to Chromium in nodriver:

browser_args = ["--enable-unsafe-webgpu", "--enable-features=Vulkan"]

Having Vulkan in the container would be nice, but probably not needed to pass this test. I'm saying that because I'm running a Vulkan-enabled driver for my AMD GPU and can't pass the check even when running outside the container with the flags above. Unfortunately this setup was enough to pass the check ~10 days ago, so it seems that something has changed recently.
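A minimal sketch of how the flag list above could be assembled when more than one feature is needed. It rests on two assumptions that are not confirmed in this thread: that Chromium keeps only one value per command-line switch (so several --enable-features switches should be merged into one comma-separated switch), and that zendriver accepts a browser_args list the same way nodriver does. The helper name build_browser_args is hypothetical.

```python
# Hypothetical helper: merges every --enable-features switch into a single one.
# Assumption: Chromium keeps only one value per switch, so separate
# --enable-features switches would not accumulate.
PREFIX = "--enable-features="


def build_browser_args(flags):
    """Return the flags with all --enable-features values merged, order preserved."""
    features = []
    merged = []
    for flag in flags:
        if flag.startswith(PREFIX):
            # Collect the comma-separated feature names from this switch.
            features.extend(name for name in flag[len(PREFIX):].split(",") if name)
        else:
            merged.append(flag)
    if features:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        merged.append(PREFIX + ",".join(dict.fromkeys(features)))
    return merged


browser_args = build_browser_args(
    ["--enable-unsafe-webgpu", "--enable-features=Vulkan"]
)
print(browser_args)
```

The resulting list would then be passed as browser_args when starting the browser. Whether any particular feature combination changes the hasInconsistentGPUFeatures result is not established here.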

Observations

I've been looking for another way and want to share some observations:

  • the check uses navigator.gpu.requestAdapter() to retrieve the GPU information;
  • if we only specify --enable-unsafe-webgpu without Vulkan and then run navigator.gpu.requestAdapter(), Chromium prints the following warning: "WebGPU on Linux requires GLES compat, or command-line flag --enable-features=Vulkan, or command-line flag --enable-features=SkiaGraphite (and skia_use_dawn = true GN arg)". This suggests that we may not need Vulkan;
  • I'm able to pass the check on Librewolf using the user.js from arkenfox.
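To compare configurations, the adapter lookup mentioned in the first observation can be reproduced directly in the page. The sketch below only shows what navigator.gpu.requestAdapter() returns; it is not the actual hasInconsistentGPUFeatures logic, which is not known from this thread. The names GPU_PROBE_JS, summarize_probe and probe are hypothetical, and the tab.evaluate(..., await_promise=True) call is an assumption that zendriver mirrors nodriver's API.

```python
import json

# JavaScript to run in the page. It reports whether WebGPU is exposed and
# whether an adapter can be obtained, serialized as a JSON string.
GPU_PROBE_JS = """
(async () => {
  if (!navigator.gpu) {
    return JSON.stringify({supported: false, adapter: null});
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    return JSON.stringify({supported: true, adapter: null});
  }
  const info = adapter.info ?? null;  // may be missing on older browsers
  return JSON.stringify({
    supported: true,
    adapter: info ? {vendor: info.vendor, architecture: info.architecture} : {},
  });
})()
"""


def summarize_probe(raw):
    """Turn the JSON string returned by GPU_PROBE_JS into a short verdict."""
    result = json.loads(raw)
    if not result["supported"]:
        return "navigator.gpu is missing"
    if result["adapter"] is None:
        return "requestAdapter() returned null"
    return "adapter available"


async def probe(tab):
    # Assumption: tab.evaluate awaits the promise and returns its string value,
    # as nodriver's Tab.evaluate does with await_promise=True.
    raw = await tab.evaluate(GPU_PROBE_JS, await_promise=True)
    return summarize_probe(raw)
```

Running probe once with only --enable-unsafe-webgpu and once with Vulkan enabled would show whether the adapter lookup itself differs between the two setups.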

Handling this

If you're interested in providing support for this issue, I'd say that we should make child issues to:

  • add Vulkan support to the container: very unlikely to solve the issue, but nice-to-have for swayvnc-chrome;
  • enable GLES compat (may solve the issue);
  • return Librewolf's GPU object when requested (last resort; can we mock this kind of data using zendriver?).
@aartoni
Author

aartoni commented Nov 18, 2024

Here is the diff between the request fields in the fingerprint_bot_test request bodies for future reference (this request is performed by Device and Browser Info):

// Containerized Chromium
"webGLVendor": "Google Inc. (AMD)",
"webGLRenderer": "ANGLE (AMD, AMD Radeon 760M (radeonsi gfx1103_r1 LLVM 19.1.3 DRM 3.59 6.11.7-artix1-1), OpenGL ES 3.2)",
"webGLCanvasHash": "...omitted...",

// Librewolf + user.js from arkenfox
"webGLb64Value": "NA",
"webGLError": "TypeError: p is null",
"webGLVendor": "NA",
"webGLRenderer": "NA",
"webGLCanvasHash": "...omitted...",
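A comparison like the one above can be produced mechanically once both request bodies are available as dictionaries. This is a minimal sketch; diff_fields and the MISSING sentinel are hypothetical names, and the two dictionaries contain only the fields quoted in this comment (webGLCanvasHash is left out because its values were omitted).

```python
MISSING = "<missing>"  # sentinel for a field that is absent on one side


def diff_fields(left, right):
    """Return {field: (left_value, right_value)} for every field that differs."""
    differences = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if a != b:
            differences[key] = (a, b)
    return differences


# Fields as quoted in this comment (canvas hashes were omitted there).
chromium = {
    "webGLVendor": "Google Inc. (AMD)",
    "webGLRenderer": (
        "ANGLE (AMD, AMD Radeon 760M (radeonsi gfx1103_r1 LLVM 19.1.3 "
        "DRM 3.59 6.11.7-artix1-1), OpenGL ES 3.2)"
    ),
}
librewolf = {
    "webGLb64Value": "NA",
    "webGLError": "TypeError: p is null",
    "webGLVendor": "NA",
    "webGLRenderer": "NA",
}

for field, (a, b) in diff_fields(chromium, librewolf).items():
    print(f"{field}: {a!r} -> {b!r}")
```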

@therealpurplemana

My understanding is that these values can't currently be faked properly. Even Camoufox, which forges OS/platform/fingerprints at the C++ level, doesn't properly forge the WebGL/GPU context.

I have a custom script that can inject browser hints into Chrome using the Manifest V3 extension API, but Browserscan still detects it as a forged user agent. Other sites are fooled completely.

If you're on Discord, hit me up (anxman).

@stephanlensky
Owner

Do you know why these values can't be forged properly? I assumed that if running a real browser (especially not in headless mode), everything would check out (no need to even be faked). I'm not sure how it's different than just launching the browser normally.

I'm not very familiar with this subject though.

@therealpurplemana

> Do you know why these values can't be forged properly? I assumed that if running a real browser (especially not in headless mode), everything would check out (no need to even be faked). I'm not sure how it's different than just launching the browser normally.
>
> I'm not very familiar with this subject though.

Yeah, so Chrome introduced "Browser Hints", which are different from "Client Hints" (i.e., JavaScript). The Browser Hints are sent at the network level in Chrome's request headers and include information such as the platform OS. You can see the two side by side here: https://www.whatismybrowser.com/detect/client-hints. It is possible to fake all of these values using a Manifest V3 Chrome extension; however, Browserscan detects them as forged. On the flip side, I find that faking them is sufficient to get through Cloudflare on certain sites that would otherwise block Linux.
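One way a server could notice a forged platform value is to compare the hint header against the User-Agent string. The sketch below illustrates that kind of consistency check only; it is not how Browserscan, Cloudflare or DataDome actually work, which is not documented in this thread. It assumes the platform hint arrives in the standard Sec-CH-UA-Platform request header, and the function names are hypothetical.

```python
def platform_from_user_agent(user_agent):
    """Guess the platform name that Sec-CH-UA-Platform would be expected to carry."""
    if "Windows NT" in user_agent:
        return "Windows"
    if "Android" in user_agent:  # checked before Linux: Android UAs contain "Linux"
        return "Android"
    if "Macintosh" in user_agent:
        return "macOS"
    if "CrOS" in user_agent:
        return "Chrome OS"
    if "Linux" in user_agent:
        return "Linux"
    return None


def hints_consistent(headers):
    """Return True/False if the platform hint can be checked, None if it is absent."""
    normalized = {name.lower(): value for name, value in headers.items()}
    hint = normalized.get("sec-ch-ua-platform")
    if hint is None:
        return None
    # The header value is a quoted string, e.g. "Linux" including the quotes.
    claimed = hint.strip().strip('"')
    return claimed == platform_from_user_agent(normalized.get("user-agent", ""))


forged = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Sec-CH-UA-Platform": '"Windows"',
}
print(hints_consistent(forged))
```

A forgery that rewrites only one of the two values fails this check, which is one plausible reason a partial spoof gets flagged while a fully consistent one passes on other sites.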

Some of the modern methods from DataDome and Cloudflare IUAM/WAF use the browser fingerprints for GPUs. Even Camoufox is unable to forge those currently.

@stephanlensky
Owner

So the main issue then is just that they are blocking Linux? I think the "Browser Hints" for automated Chrome vs. regular Chrome would be the same, right?

So, I think we'd see the same issues with blocking if you just launch a regular browser on Linux and manually navigate to the site, correct?

What I mean is that if I understand right it's not so much that the automation is getting detected, they're just taking an extremely overzealous approach to blocking people.

@therealpurplemana

therealpurplemana commented Nov 18, 2024

> So the main issue then is just that they are blocking Linux? I think the "Browser Hints" for automated Chrome vs. regular Chrome would be the same, right?
>
> So, I think we'd see the same issues with blocking if you just launch a regular browser on Linux and manually navigate to the site, correct?
>
> What I mean is that if I understand right it's not so much that the automation is getting detected, they're just taking an extremely overzealous approach to blocking people.

Yes, that's the main issue I am encountering. I deploy zendriver in a Linux container.

@stephanlensky
Owner

Got it, yeah just trying to narrow down if this is actually an issue with our automation getting detected or just a side-effect of running it on Linux.

It sounds like there are maybe two issues here:

  1. Some sites block Linux for no real reason
  2. Container-based deployments currently don't support all GPU features (e.g. Vulkan) that a regular browser normally would, which allows detection of Zendriver

I think we should try to find a workaround for problem 2, but problem 1 sounds like it won't be fixable without using an entirely custom browser (like Camoufox).

@aartoni
Author

aartoni commented Nov 19, 2024

It may be that they're just blocking Linux, but it may also be that some Linux configuration (e.g., the GLES compat configuration suggested by Chrome) avoids being flagged as a bot, at least on websites that have opted for a lighter version of the DataDome protection.

That said, it's probably better if we start by making a Windows container image for zendriver and mark this issue as stale until we have further knowledge on how to at least fix the WebGL issue.

@stephanlensky let me know if you prefer to close this issue and want me to open an issue to track the search for a working WebGL configuration and another one for creating the Windows version of Zendriver Docker. Thanks for now!

@stephanlensky
Owner

stephanlensky commented Nov 19, 2024

@aartoni I think we can leave this issue open to track ongoing problems with DataDome.

If you wouldn't mind, it would also be great if you could create those additional issues you mentioned:

  • Create Windows version of zendriver-docker (probably this would be an entirely separate project, something like zendriver-docker-windows)
  • Add OpenGL ES support to zendriver-docker
  • [optional] Add Vulkan support to zendriver-docker

I don't anticipate having time to work on any of these anytime soon, but I'd be happy to review any PRs to zendriver-docker in the meantime.

If you'd like, I can also set up a new zendriver-docker-windows repo in case you'd like to make some contributions there.

stephanlensky added the bot-detection and enhancement labels Nov 19, 2024

@aartoni
Author

aartoni commented Nov 20, 2024

@stephanlensky I have an idea: I'll create the repo myself and transfer ownership to you when it reaches a beta state. This way you don't have to review early contributions. I'll use AGPL-3.0 like you did for the other one.
