Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add python undetected chromedriver #316536

Conversation

liam-murphy14
Copy link
Contributor

Description of changes

Add undetected-chromedriver to python3Packages. Homepage: https://github.com/UltrafunkAmsterdam/undetected-chromedriver

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ToasterUwU
Copy link
Member

Result of nixpkgs-review pr 316536 run on x86_64-linux 1

2 packages failed to build:
  • python312Packages.undetected-chromedriver
  • python312Packages.undetected-chromedriver.dist
2 packages built:
  • python311Packages.undetected-chromedriver
  • python311Packages.undetected-chromedriver.dist

@ToasterUwU
Copy link
Member

The Error i get when trying to build this:

Executing pythonImportsCheckPhase
Check whether the following modules can be imported: undetected_chromedriver
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <lambda>
  File "/nix/store/c7ycrgwv039nqglbif98yggx211sdbcl-python3-3.12.3/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/nix/store/hylyqz7l10p837mwryw0ib0c98xfqv44-python3.12-undetected-chromedriver-3.5.5/lib/python3.12/site-packages/undetected_chromedriver/__init__.py", line 44, in <module>
    from .patcher import IS_POSIX
  File "/nix/store/hylyqz7l10p837mwryw0ib0c98xfqv44-python3.12-undetected-chromedriver-3.5.5/lib/python3.12/site-packages/undetected_chromedriver/patcher.py", line 4, in <module>
    from distutils.version import LooseVersion
ModuleNotFoundError: No module named 'distutils'

Seems like distutils is missing and therefor this cant function. This is is because Python 3.12 has removed it, after it was deprecated in 3.10

Of note, the distutils package has been removed from the standard library.

I dont know what exactly requires this, the check or the module or the nix derivation, and i dont really have time rn to get into it. But i can definitely say this doesnt work rn, and distutils missing is the cause

@ToasterUwU
Copy link
Member

@ToasterUwU
Copy link
Member

@liam-murphy14 I think for now, the only solution is to not provide this package for Python 3.12, at least until this is fixed upstream.

So 3.11 only, or maybe some older versions as well.

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, thanks for checking this out - I run 3.11 so didn't even notice the issue. For now going to see if adding setuptools to propagatedBuildInputs will fix it on our end, otherwise will restrict to Python3.7-3.11 only.

As for long term fix, this PR against upstream seems to follow the PEP guidance, so hopefully it gets merged soon.

Thanks again ! Either way will update this PR with a fix in the next hour or so.

@liam-murphy14 liam-murphy14 force-pushed the add-python-undetected-chromedriver branch from a640dbf to 1a2037c Compare June 2, 2024 17:44
@liam-murphy14
Copy link
Contributor Author

Update: the setuptools fix did not work, so for now removing support for Python 3.12. Once upstream is fixed, I will update to add support.

Thanks again to @ToasterUwU for finding the issue !

@ToasterUwU
Copy link
Member

Result of nixpkgs-review pr 316536 run on x86_64-linux 1

2 packages built:
  • python311Packages.undetected-chromedriver
  • python311Packages.undetected-chromedriver.dist

Copy link
Member

@ToasterUwU ToasterUwU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, i tested this properly now, and i have to take the approval back.

Everytime i try to use this, i get an error. As soon as the line comes up where i create a Chrome() instance, it fails with some error code. Details in a second. Specifically it fails at the part where undetected-chromedriver executes the init method of the selenium Chrome class itself via super()

The error code i get every single time, even when using undetected chromedriver correctly with freeze_support() and everything: undetected_chromedriver unexpectedly exited. Status code was: 127

This seems to be a error coming from the chromedriver itself.

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, hmmm that is odd, when I tested this before submitting PR all was working fine. I have seen that 127 error before when the chromedriver and chrome/chromium instance have a version mismatch, so possibly related to that ?? I will do some testing and see if I can fix it.

Thanks for all the very in depth reviews, appreciate your time and effort !!

@ToasterUwU
Copy link
Member

@liam-murphy14 I just double checked. I suspected the same at first, but now i made sure.
Both the chromedriver coming with this package and the chromium install i have are on version 126, so even tho it seems like a version mismatch, it is not.

@liam-murphy14
Copy link
Contributor Author

liam-murphy14 commented Jun 22, 2024

Hey @ToasterUwU, I just checked on my x86-64 darwin machine, and everything appears to be in working order. I made a Github gist of the steps i took to test it, check it out and let me know if you are still running into issues. In a little while I'll dig out my old x86-64 linux machine and make sure it works on that as well, updating the gist if necessary.

Thanks for the help on testing and bug fixing !!!

Also note: I am force pushing to this branch to rebase on nixpkgs/master, just to make sure I am using the latest chromium/chromedriver pair

EDIT: I just thought of this but I think I know why you might be having issues. If you don't pass in the chromedriver executable path, it will try to download and patch a new one, which will give you issues on NixOS. The other thing is it will try to patch in place the chromedriver path that you pass it if it has not already been patched. This will also cause an issue on NixOS since packages are read-only after build time, unless you use the chromedriver executable that this undetected-chromedriver nix package provides, because it is patched at build time. Let me know if this helps, I just remembered I encountered this issue when i was originally developing this

@liam-murphy14 liam-murphy14 force-pushed the add-python-undetected-chromedriver branch from 1a2037c to 622901b Compare June 22, 2024 19:45
@ToasterUwU
Copy link
Member

@liam-murphy14 Well that explains it, i dont specify the path of the chromedriver, i just do "uc.Chrome()" and it fails.

And honestly, i dont think forcing everyone using this to hardcode a path in one way or another is a solution.
This would mean that you have to edit lots and lots, if not all pieces of software using undetected-chromedriver to be compatible with this package. And the prospect of editing every single piece of software to be able to use it is just not a solution.

I dont really know what the best solution for this is, but one that i can think of is patching the package to use the included chromedriver if nothing is specified by the user instead. Extend the automatic search so to say.

@liam-murphy14
Copy link
Contributor Author

@ToasterUwU Yea I agree, I definitely don't love the hardcoded path either..... I was trying to figure out a way to make the package work without patching the Python code in the package itself, since that adds some maintenance to keep it "in-sync" with the upstream. However, I think you're right in that patching the Python package would be better than forcing a certain way of using it.

Let me work on it over the next couple days and see if I can edit the way it searches for the package, I know it already does something along the lines of searching in $PATH, but not sure how well that would work with a Nix shell. I like the idea of defaulting to the package-installed chromedriver binary if nothing is given.

Thanks for the feedback, will work on that and hopefully have a better version of this package out in some time !

@ToasterUwU
Copy link
Member

ToasterUwU commented Jun 23, 2024

@liam-murphy14 Btw, if you feel like being the only maintainer of this when its done is a but much for you, and you want there to be a backup person that takes care of it in case you are very busy, feel free to add me to the maintainer list of the package. Im already in maintainers/maintainer-list.nix

@liam-murphy14 liam-murphy14 force-pushed the add-python-undetected-chromedriver branch from 622901b to 4481987 Compare June 23, 2024 16:08
@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, I just pushed an update that should fix the path issue, now uc.Chrome() will default to using the pre-patched and installed binary rather than trying to install a new one, but if the user passes in a driver path it will still use that one. That should be a better solution to the problem. I also did add you as a second maintainer, I do plan to take care of this primarily so I don't anticipate it being any work for you at all, but just in case. Thanks for offering, I appreciate it.

Hopefully the package is working now ! I tested again on my darwin machine with and without passing the path, and looks to be working well. Will test it out on my linux machine today as well.

Thanks again for the feedback and help on making this package better !

@ofborg ofborg bot requested a review from ToasterUwU June 23, 2024 16:42
Copy link
Member

@ToasterUwU ToasterUwU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this so quickly!

Glad it works on Darwin already, but sadly it doesnt seem to work on Linux yet.
It knows which chromedriver to use, so that part works, but there is a permission problem. If i run python3 without sudo i get a permission issue, if i run it with sudo, the package seems inaccessible, python cant find it.
The second part might be a quirk of the nixpkgs-review shell, but the first part i dont think.

Here the error message for some context:

❯ python3
Python 3.11.9 (main, Apr  2 2024, 08:25:04) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import undetected_chromedriver as  uc
>>> uc.Chrome()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/2fqpwa5cnyqxcj4znhc14g1nhi0z3bjk-python3.11-undetected-chromedriver-3.5.5/lib/python3.11/site-packages/undetected_chromedriver/__init__.py", line 258, in __init__
    self.patcher.auto()
  File "/nix/store/2fqpwa5cnyqxcj4znhc14g1nhi0z3bjk-python3.11-undetected-chromedriver-3.5.5/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 150, in auto
    return self.patch_exe()
           ^^^^^^^^^^^^^^^^
  File "/nix/store/2fqpwa5cnyqxcj4znhc14g1nhi0z3bjk-python3.11-undetected-chromedriver-3.5.5/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 347, in patch_exe
    with io.open(self.executable_path, "r+b") as fh:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/nix/store/2fqpwa5cnyqxcj4znhc14g1nhi0z3bjk-python3.11-undetected-chromedriver-3.5.5/bin/chromedriver'

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, I just tried it on my Linux laptop and getting the same error :( I will take a look either later today or tomorrow and try to fix it.

It appears that on Linux, uc is not recognizing the pre patched binary as patched, and is trying to patch it again. That is why we are seeing the permissions issue, since it's read only after build time. I will do some debugging and see if the patch call during build time is failing or something like that.

Thanks for taking a look !

@ToasterUwU
Copy link
Member

@liam-murphy14 Maybe it patches the binary differently on Linux compared to Darwin? And because of that it looks for the Linux patches and cant find them, so it assumes its unpatched? Just spitballing here, but at least in theory this might be an explanation.

If that was the case, the only solution i can think of is to prepatch two binaries and hand out the correct one for the system in question. I cant test it sadly, since im not an Apple User at all, only thing i have from them is some ancient earpods in a drawer somewhere.

@ToasterUwU
Copy link
Member

@Titaniumtown You seem interested in this as well, dont know how actively you are following this issue, but maybe you wanna throw your hat in the ring? Cant hurt to have more brains and eyes on this.

@Titaniumtown
Copy link
Contributor

Unfortunately I do not have the time right now to test, if/when I get to it, I'll post my results here.

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, yea honestly not sure. Have been doing a bit of testing today, I have confirmed that the chromedriver binary is the correct Linux one when run on a Linux machine, also going to check and make sure the operating system checks in the undetected-chromedriver python package itself are correct. Will update when I do. It appears that the patching logic shouldn't be different based on OS, but who knows. Either way will keep trying to debug.

A sidebar: what is the easiest way to debug something like this ?? So far I have just been using nix develop with a test flake that takes nixpkgs input from my local repo, and just builds a dev shell with python3.withPackages ... undetected-chromedriver, but if you know a better way, please do let me know. The way I have been doing it doesn't have the same effect as actually running nix develop ./path-to-flake#undetected-chromedriver, but I can't get that command working because it's a Python package, rather than a top level package.

Thanks for your help !

@ToasterUwU
Copy link
Member

@liam-murphy14 I always use nixpkgs-review.

  1. Install nixpkgs-review
  2. Open a terminal and CD into your nixpkgs local copy
  3. Run nixpkgs-review pr pr-number-here
  4. It will automatically enter a review shell where everything that pr does is available
  5. Run a python script in that shell and it will be able to import undetected-chtomedriver

That way is the easiest I can think of

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, I am beginning to think that maybe there is an issue with the undetected-chromedriver library. I have updated this gist with more testing details, but will summarize here.

Basically, the library searches for a specific point in the chromedriver binary, injecting the patched code there. For the chromedriver version in nixpkgs, currently this check is failing. That is, the library is not able to find where to inject the custom code, so it is doing nothing. That is why the binary is not getting patched. I will confirm on Darwin that this is the difference.

Testing evidence from gist:

update: trying to debug the build failures on linux, it appears to be an actual issue with the library itself. In order to test this, first run nix shell nixpkgs#chromedriver to get the normal, most up to date chromedriver in nixpkgs. Then, use the file driver_test.py below, and run python3 driver_test.py $(which chromedriver) from in the nix shell. In my own tests, I get

➜  python3 driver_test.py $(which chromedriver)
match_injected: None
if check fail

This test is what the patcher.py of undetected-chromedriver uses to determine where to patch the binary. If this match_injected is None, then it does nothing. This is why it is failing on Linux. I will try this out on Darwin to confirm, but i think it is the linux binary itself.

If this is the case, I am not sure what else to do, unfortunately. I will update my nixpkgs to latest on Linux and Darwin, and check again, but if this check is failing on linux nixpkgs chromedriver, then I don't think there is anything we can do.

@ToasterUwU
Copy link
Member

@liam-murphy14 inreresting and also weird. Have you checked if there is an issue on the upstream repo?
If this is the case (can't test it myself, too busy atm) than the only thing we can do is to report it and hope they fix it soon.

When I get the chance I will spin up a Ubuntu vm and try the library there.

@liam-murphy14
Copy link
Contributor Author

liam-murphy14 commented Jun 29, 2024

Still failing after updating nixpkgs on Nixos linux. I haven't checked upstream, will check that plus the Darwin thing later today. Thanks !

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, some more results: on darwin the nixpkgs version works fine still, and I checked, it is that line causing the issue, the check is passing on darwin but failing on linux. More curious tho: when I downloaded the linux64 binary from https://googlechromelabs.github.io/chrome-for-testing/#stable, the check passed !! So i think something from the nix derivation of chromedriver for nixpkgs is causing the issue.

I suspect that it's the highlighted lines here: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/development/tools/selenium/chromedriver/default.nix#L55-L56, my guess is that patching the libraries is shifting some bits around, and undetected-chromedriver is not able to find the right place to inject its code.

So it isn't an upstream problem. I could maybe try to rewrite this so that it downloads the binary directly, patches it with UC, then wraps it in the libraries ??? That seems kinds of hackish but not sure what else to do.

Let me know if you have any other ideas. Thank you !

@ToasterUwU
Copy link
Member

@liam-murphy14 yeah it seems very likely that this is the cause of the problem, if the bytes aren't in the "correct" location it probably doesn't wanna touch it.

And I don't really see any alternative in that case sadly. Have to work around the nixpkgs chromedriver somehow.

@liam-murphy14 liam-murphy14 force-pushed the add-python-undetected-chromedriver branch from 4481987 to 6e62dce Compare July 7, 2024 17:52
@ofborg ofborg bot requested a review from ToasterUwU July 7, 2024 20:19
@ToasterUwU
Copy link
Member

Result of nixpkgs-review pr 316536 run on x86_64-linux 1

2 packages failed to build:
  • python311Packages.undetected-chromedriver
  • python311Packages.undetected-chromedriver.dist

@ToasterUwU
Copy link
Member

Seems that its distutils again.. somehow..

error: builder for '/nix/store/np9hzb5pi17gwrs4h1wjxbfqq7y23v5y-python3.11-undetected-chromedriver-3.5.5.drv' failed with exit code 1;
       last 10 log lines:
       > evaling implicit 'postFixup' string hook
       > Traceback (most recent call last):
       >   File "/nix/store/rb23lirkmkk95ydfjbwhf1hgz0sdyy66-patchChromedriver.py", line 4, in <module>
       >     import undetected_chromedriver as uc
       >   File "/nix/store/pqvbsahwxic6h1wyysjvhd354lgv94s4-python3.11-undetected-chromedriver-3.5.5/lib/python3.11/site-packages/undetected_chromedriver/__init__.py", line 44, in <module>
       >     from .patcher import IS_POSIX
       >   File "/nix/store/pqvbsahwxic6h1wyysjvhd354lgv94s4-python3.11-undetected-chromedriver-3.5.5/lib/python3.11/site-packages/undetected_chromedriver/patcher.py", line 4, in <module>
       >     from distutils.version import LooseVersion
       > ModuleNotFoundError: No module named 'distutils'
       > /nix/store/5r0df66ikad3xw06azlqvswcvncll8wa-stdenv-linux/setup: line 193: pop_var_context: head of shell_variables not a function context
       For full logs, run 'nix log /nix/store/np9hzb5pi17gwrs4h1wjxbfqq7y23v5y-python3.11-undetected-chromedriver-3.5.5.drv'.
error: 1 dependencies of derivation '/nix/store/a91ad8yx6xp11v9qrizfpk50k1inv7qa-review-shell.drv' failed to build

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, thanks for checking on that, yea I noticed it too after I nix flake update-ed on a project that was using this..... not sure how the distutils thing is affecting Python311 since I thought that removing the library was new in 312. Well anyways that kind of bonks this library until upstream removes dependency on distutils.

On another note, there is a new library by the creator of undetected-chromedriver, called nodriver which is a UC alternative that does not rely on chromedriver binary. I will take a look this weekend, if no nix derivation already exists, I will write a quick one to support that library. That way there is at least some alternative that works. I think the library pretty fresh tho, so not sure how fully featured it will be.

Thanks for your patience with the long process of getting this library into nixpkgs. Appreciate your help !!

@ToasterUwU
Copy link
Member

"nodriver is the official successor"

Well that is good to know. I will also take a look today

@ToasterUwU
Copy link
Member

Ok, i just tested nodriver and.. OMG

This thing flies away with how fast it can go. UC is a joke compared to this performance-wise.
Not sure if it fits every use case that exists for UC, there probably are some things that wont work easily or at all, but at least all the things i ever tried with the original UC work fine apparently.

And it also seems to work very well. I tested it in an FHS through vscode-fhs and without FHS by just using Konsole, both worked flawlessly.
So yeah, i would be more than happy to have this as a solution. UC is nice and all, and there are some things out there that use it and that i cant easily switch over, but i wanted UC mostly for things i make for myself and by myself, so im more than happy to switch to a faster and easier and async compatible alternative.
For everything i didnt make i can either rewrite some parts to work with nodriver, or alternatively i can still just run it in a VM or Docker container or something like that.

How are doing for freetime that you want to spend on nodriver? If you are low on time, im happy to start and add you as maintainer alongside me. I have never made a nix package before, but nodriver seems like any other python package from how it works and what it needs, so i think this shouldnt be too hard.
In case you would like me to start working on it, do you know what the stance is on setting it up in a way that it gets chromium as a dependency? Without chrome, or chromium, or something like it that package would have no use at all, but i dont think thats within the scope of a python package on nix. So unless you are like "great idea, i dont see why not" i will probably not do that.

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, thanks for checking it out !! I actually have some free time for nodriver today, so let me see if I can get a quick version out, like you said it should not be too difficult, will respond later tonight with the PR.

To answer your question, yea I think it's definitely not in scope for the python library. Just as nodriver and undetected-chromedriver don't try to install the browser executable itself, I don't think the nix package should either. IMO, it's up to the user to make sure they have a chromium-based browser executable on their $PATH, just like with the non-nix python libraries.

@liam-murphy14
Copy link
Contributor Author

liam-murphy14 commented Jul 11, 2024

Just opened a PR for nodriver: #326242. Added @ToasterUwU as a maintainer as well, let me know if you want me to take you off the maintainer list. Was able to test on darwin and everything looking great. Will test on Linux tomorrow. Let me know if you have any feedback on it !! Thank you !

@ToasterUwU
Copy link
Member

@liam-murphy14 Thanks, im happy to be a maintainer, after all i will be using that package a lot.

Since the successor of this package is in the works as a nix package, does it make sense to close this PR now? Or do you still want to add it whenever the distutils issue is fixed?

@liam-murphy14
Copy link
Contributor Author

@ToasterUwU awesome, happy to hear that !

Also I think I will leave this open and still add it whenever distutils is fixed. Currently nodriver is marked as alpha, so will likely be very unstable at least for a while. Would be nice to also have this package available until nodriver reaches feature parity.

@liam-murphy14
Copy link
Contributor Author

Hey @ToasterUwU, @Titaniumtown, I am closing this PR. It looks like somebody recently merged a PR for undetected-chromedriver, see this derivation, and the corresponding nix python module. Thanks for the support on this. I have not gotten a chance to test the derivation, but I am hopeful that it will work ! I will keep my PR for nodriver open as well, so hopefully that will get merged. Thanks again !

@liam-murphy14 liam-murphy14 deleted the add-python-undetected-chromedriver branch July 21, 2024 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants