Fix Immoscout24 Captcha Resolution #630
Hi @DerLeole,

First of all, thank you so much for taking the time to implement this. It's a real gift to have an active community on a project like flathunter, and it saves me a lot of stress and headache when people step up and make contributions.

I'll add some feedback for the review, and I'll try to be clear about what I consider mandatory for merging and what's just optional. But ultimately, if you don't want to implement the feedback, just say so and I will happily tidy this up for you and get it merged (and will try to preserve your commits so that you also get the attribution).

I signed up at Capmonster and did a test locally, and the code works, so that's amazing for a first contribution.

Re. leaking keys, you can rebase your commits (

I'll test later whether it works for the flathunter cloud deployment. The 2captcha implementation is useless at this point anyway, so your implementation certainly cannot be worse :)

Thanks again for the contribution,
Arthur
Thanks again for this. You've done an amazing job of following the style and layout of the code that's there - excellent work!
# Intercept background network traffic via log sniffing
sleep(2)
logs_raw = driver.get_log("performance")
This is some deep magic right here. Well done for finding your way around this.
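For readers unfamiliar with the trick: with Chrome's performance logging enabled, `driver.get_log("performance")` returns a list of entries whose `message` field is a JSON string wrapping a DevTools event. The sketch below filters such entries for `Network.responseReceived` events. It is a minimal illustration, not the code from this PR; `find_responses`, the sample entry, and the example URL are all hypothetical.

```python
import json


def find_responses(log_entries, url_fragment):
    """Return (request_id, url) pairs for responses whose URL contains url_fragment."""
    hits = []
    for entry in log_entries:
        # Each performance log entry wraps one DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params", {})
        url = params.get("response", {}).get("url", "")
        if url_fragment in url:
            hits.append((params.get("requestId"), url))
    return hits


# Hand-written sample in the shape of a performance log entry (hypothetical values).
sample_entries = [
    {
        "level": "INFO",
        "timestamp": 0,
        "message": json.dumps(
            {
                "webview": "hypothetical-webview-id",
                "message": {
                    "method": "Network.responseReceived",
                    "params": {
                        "requestId": "1.1",
                        "response": {"url": "https://example.invalid/captcha/problem"},
                    },
                },
            }
        ),
    }
]

matches = find_responses(sample_entries, "captcha")
```

In a real session the entries would come from the driver rather than from a hand-written list, and the request ID could then be used to fetch the response body.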
iv = response_json["state"]["iv"]
context = response_json["state"]["payload"]
sitekey = response_json["key"]
Nitpick: not sure why we need a double blank line here. One would be plenty.
"""Resolve AWS WAF Captcha"""

# Intercept background network traffic via log sniffing
sleep(2)
Generally it would be nice to avoid arbitrary 'sleep's in the code, but I appreciate that we're doing weird network magic here with an uncooperative third-party system, so for the sake of having things work I'm happy to leave this in.
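If someone wants to revisit the fixed delay later, one common alternative is to poll for the condition with a timeout instead of sleeping for a fixed time. The helper below is a generic sketch under that assumption; `wait_until` is a hypothetical name, not part of flathunter or Selenium (Selenium's own `WebDriverWait(...).until(...)` follows the same idea).

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate repeatedly until it returns a truthy value or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Usage sketch: the predicate succeeds on its third call (stand-in for "logs are ready").
attempts = {"count": 0}


def logs_ready():
    attempts["count"] += 1
    return "ready" if attempts["count"] >= 3 else None


outcome = wait_until(logs_ready, timeout=2.0, interval=0.01)
```

The trade-off is a little more code in exchange for returning as soon as the data is available and failing loudly when it never arrives.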
patternChallenge = r'src="([^"]*challenge\.js)"'
challenge_matches = re.findall(patternChallenge, driver.page_source)
for match in challenge_matches:
    print(f'Challenge SRC Value: {match}')
Please replace `print` with `logger.debug` where it appears in this file.
patternJsApi = r'src="([^"]*jsapi\.js)"'
jsapi_matches = re.findall(patternJsApi, driver.page_source)
for match in jsapi_matches:
    print(f'JsApi SRC Value: {match}')
Please replace `print` with `logger.debug`. Also, you'll probably find that for calls to the `logger` the linter will prefer a `%s` and a second argument to an f-string (because then it doesn't have to do the string interpolation if the log call isn't triggered).
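To illustrate the point about deferred interpolation, here is a small self-contained sketch. The logger name and the `Expensive` class are hypothetical and exist only to count how often the value is converted to a string while the debug level is disabled.

```python
import logging

logger = logging.getLogger("flathunter_logging_example")  # hypothetical logger name
logger.setLevel(logging.INFO)  # debug messages are disabled


class Expensive:
    """Counts how often it is turned into a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive value"


value = Expensive()

# %s style: the logger checks the level first and never formats the message.
logger.debug("Challenge SRC Value: %s", value)
lazy_calls = value.calls

# f-string style: the string is built before logger.debug is even called.
logger.debug(f"Challenge SRC Value: {value}")
eager_calls = value.calls
```

With debug logging switched off, the `%s` form does no formatting work, while the f-string form pays for the conversion every time.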
@@ -66,6 +67,7 @@ def __retrieve_2captcha_result(self, captcha_id: str):
"key": self.api_key,
"action": "get",
"id": captcha_id,
"json": 0,
What does this do?
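A possible answer, offered as an assumption rather than something confirmed in this thread: in the 2captcha `res.php` API the `json` parameter selects the response format, with `0` meaning a plain-text body such as `OK|<token>` and `1` meaning a JSON body. Setting it to `0` explicitly would match the `split("|", 1)` parsing used in this method. The sketch below shows that plain-text parsing in isolation; `parse_plain_result` is a hypothetical helper, not a function in this PR.

```python
def parse_plain_result(text):
    """Extract the payload from a plain-text 'OK|payload' style response."""
    status, _, payload = text.partition("|")
    if status != "OK":
        # Anything else (for example a "not ready" or error code) is treated as a failure.
        raise ValueError(f"captcha not solved: {text}")
    return payload


# Only the first "|" separates status from payload, so the payload may contain "|".
token = parse_plain_result("OK|abc|def")
```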
@@ -87,4 +89,4 @@ def __retrieve_2captcha_result(self, captcha_id: str):
if not retrieve_response.text.startswith("OK"):
    raise requests.HTTPError(response=retrieve_response)

return retrieve_response.text.split("|", 1)[1]
return retrieve_response.text.split("|", 1)[1]
Seems like an unnecessary whitespace change - please revert this.
@@ -59,13 +59,12 @@ def get_chrome_driver(driver_arguments):
"""Configure Chrome WebDriver"""
logger.info('Initializing Chrome WebDriver for crawler...')
chrome_options = uc.ChromeOptions() # pylint: disable=no-member
if platform == "darwin":
Did you deliberately remove this? What happens if you add it back? (I'm testing on Linux, so I don't use this code path).
@@ -36,6 +37,7 @@ class Env:
# Captcha setup
FLATHUNTER_2CAPTCHA_KEY = _read_env("FLATHUNTER_2CAPTCHA_KEY")
FLATHUNTER_IMAGETYPERZ_TOKEN = _read_env("FLATHUNTER_IMAGETYPERZ_TOKEN")
FLATHUNTER_CAPMONSTER_KEY = _read_env("FLATHUNTER_CAPMONSTER_KEY")
Nice! Thanks for wiring up the environment config!
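For anyone following along, the environment wiring can be pictured roughly as below. This is only a sketch: `choose_solver` and its precedence order are assumptions made for illustration, and it does not reproduce how flathunter's `_read_env` or its config class actually select a solver. Only the three variable names are taken from the diff.

```python
import os

# Variable names from the diff above; the order here is an assumed precedence.
SOLVER_ENV_VARS = [
    ("capmonster", "FLATHUNTER_CAPMONSTER_KEY"),
    ("imagetyperz", "FLATHUNTER_IMAGETYPERZ_TOKEN"),
    ("2captcha", "FLATHUNTER_2CAPTCHA_KEY"),
]


def choose_solver(env=None):
    """Return (solver_name, key) for the first configured solver, or None."""
    env = os.environ if env is None else env
    for solver_name, var_name in SOLVER_ENV_VARS:
        key = env.get(var_name)
        if key:  # treat missing and empty values the same way
            return solver_name, key
    return None


# Usage sketch with an explicit mapping instead of the real process environment.
selected = choose_solver({"FLATHUNTER_CAPMONSTER_KEY": "example-key"})
```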
@@ -124,6 +124,7 @@ def get_entries_from_javascript(self):
logger.error(
    "IS24 bot detection has identified our script as a bot - we've been blocked"
)
logger.info(self.get_driver_force().page_source)
Is this `info`? Or is it enough if this is `debug`?
Also, the linter has a bunch of feedback. There are some
I added my API key after adding funds but I still get
I'm not sure if it's something I'm missing or if Immoscout changed something on their end again.
Solves #577 #589 #513
Immoscout seems to have entirely moved from GeeTest to AWS WAF Captcha.
This PR implements a new solver, Capmonster, to deal with that fact.
My reasoning behind that:
After trying for hours to get any of the existing captcha solvers to work with the new client-side AWS WAF JavaScript captchas, I caved and implemented @jukoson's fix using Capmonster as a solver in a way that mimics the other implementations.
It should be entirely backwards compatible.
To use the new solver just modify your ENV variables or config:
Shortcomings:
Hope this helps!