fix docip bug & add support for python 3.11 #205

Merged
merged 2 commits into from
Dec 26, 2023

@B1ACK917 B1ACK917 commented Dec 21, 2023

1. Fix: docip crawler bug

As mentioned in issue #204, a bug in the docip crawler blocks the getter thread, so proxies can no longer be fetched normally.
This PR fixes the host:port parsing logic in proxypool/crawlers/public/docip.py.
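
For readers who have not opened the diff, here is a minimal sketch of defensive host:port parsing. It is not the code from this PR: the helper name `parse_host_port` and its return convention are assumptions made only for illustration. The idea is that a malformed entry gets skipped instead of raising inside the crawler.

```python
from typing import Optional, Tuple


def parse_host_port(address: str) -> Optional[Tuple[str, int]]:
    """Split 'host:port' into (host, port); return None if it is malformed.

    Hypothetical helper for illustration, not the function used in docip.py.
    """
    # rpartition splits on the LAST colon, so a missing colon gives sep == "".
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host or not port.isdecimal():
        return None
    port_number = int(port)
    # Valid TCP ports are 1..65535.
    if not 0 < port_number < 65536:
        return None
    return host, port_number


if __name__ == "__main__":
    for raw in ["1.2.3.4:8080", "1.2.3.4", "1.2.3.4:", "1.2.3.4:99999"]:
        print(raw, "->", parse_host_port(raw))
```

Returning `None` rather than raising lets a crawler loop drop bad entries and keep going, so one broken record does not stall the getter.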

2. Add: support for python3.11
2.1 update requirements.txt
Starting with Python 3.11, the CFrame type in the C API was renamed, which makes old versions of gevent fail to build during installation.
e.g.

In file included from src/gevent/_greenlet_primitives.c:1207:
      /tmp/pip-install-cox1kni6/gevent_420af8e5de034ff09b7c37217c08be8a/deps/greenlet/greenlet.h:42:5: error: unknown type name 'CFrame'
         42 |     CFrame* cframe;
            |     ^
      1 error generated.

See this CPython commit: [bpo-45431] Rename CFrame to _PyCFrame in the C API.
It changes CFrame *cframe; to _PyCFrame *cframe;, so a large number of packages had to be modified to keep up with CPython.
Python 3.11 has been available for over a year and Python 3.12 for over two months, so I think it's time to bump the requirements to support newer versions of Python.

2.2 update proxypool/processors/tester.py
Starting with Python 3.11, passing coroutines directly to asyncio.wait() is forbidden, which raises a TypeError at proxypool/processors/tester.py:85:

 tasks = [self.test(proxy) for proxy in proxies]

The traceback is:

  File "/usr/lib/python3.11/asyncio/tasks.py", line 415, in wait
    raise TypeError("Passing coroutines is forbidden, use tasks explicitly.")

TypeError: Passing coroutines is forbidden, use tasks explicitly.

I changed it to the newer style so that it works on newer Python versions.
This is also mentioned in issue #201.
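
As a rough illustration of what "use tasks explicitly" means, here is a self-contained sketch. It is not the code from tester.py: the `check` coroutine is a stand-in for the real proxy test, and `check_all` is a hypothetical name. Each coroutine is wrapped in a task before being handed to `asyncio.wait()`.

```python
import asyncio


async def check(proxy: str) -> str:
    # Stand-in for the real proxy test; it only yields control once.
    await asyncio.sleep(0)
    return proxy


async def check_all(proxies):
    # Wrap every coroutine in a Task. On Python 3.11+ asyncio.wait()
    # raises TypeError if it is given bare coroutine objects.
    tasks = [asyncio.ensure_future(check(proxy)) for proxy in proxies]
    # Note: asyncio.wait() raises ValueError if `tasks` is empty.
    done, pending = await asyncio.wait(tasks)
    return sorted(task.result() for task in done), len(pending)


if __name__ == "__main__":
    print(asyncio.run(check_all(["1.1.1.1:80", "2.2.2.2:8080"])))
```

`asyncio.gather(*coroutines)` is another option that still accepts coroutines directly, but it returns results in order rather than done/pending sets, so it is not a drop-in replacement.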

I tested these changes under Python 3.11 and Python 3.12 and verified that they run stably.

Regards,
B1ACK917

@B1ACK917 changed the title from "fix: docip bug" to "fix docip bug & add support for python 3.11" on Dec 21, 2023
@B1ACK917

BTW, I found that on Python 3.12, the code at proxypool/crawlers/__init__.py:9

module = loader.find_module(name).load_module(name)

raises an error.
This is because the long-deprecated find_module method was removed in Python 3.12 and should be replaced with find_spec.
A valid fix is to change it to

module = loader.find_spec(name).loader.load_module()

But because Python 3.12 is not widely supported yet, I left it out of this PR.
You may want to fix it yourself when you think it's time to add Python 3.12 support, or I can open another PR later.
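
One caveat: `load_module()` is itself deprecated, so it may be worth going straight to the spec-based API. The sketch below assumes `loader` is a finder object like the ones yielded by `pkgutil.iter_modules` or `pkgutil.walk_packages`. The helper name `load_module_from_finder` and the temporary `demo_crawler` module are hypothetical and exist only to make the example runnable.

```python
import importlib.util
import pkgutil
import tempfile
from pathlib import Path


def load_module_from_finder(finder, name):
    """Load `name` via find_spec/exec_module instead of find_module/load_module."""
    spec = finder.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    module = importlib.util.module_from_spec(spec)
    # Execute the module body. Unlike load_module(), this does not
    # register the module in sys.modules.
    spec.loader.exec_module(module)
    return module


def demo():
    # Create a throwaway module on disk so the example is self-contained.
    with tempfile.TemporaryDirectory() as directory:
        Path(directory, "demo_crawler.py").write_text("VALUE = 42\n")
        loaded = {}
        for finder, name, is_pkg in pkgutil.iter_modules([directory]):
            loaded[name] = load_module_from_finder(finder, name)
        return loaded["demo_crawler"].VALUE


if __name__ == "__main__":
    print(demo())
```

If the loaded modules need to be importable elsewhere by name, they would also have to be added to `sys.modules`, which this sketch deliberately leaves out.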

Regards,
B1ACK917

@Germey Germey merged commit 97229e6 into Python3WebSpider:master Dec 26, 2023

Germey commented Dec 26, 2023

Thanks for your contribution, I will follow up with more details on 3.12 support.
