Resolution of various bugs and general maintenance of the project #216
Conversation
If content is extracted as null, instead of assigning it directly, we now use a fallback value.
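A minimal sketch of the fallback pattern, assuming a hypothetical extractor; the names here are illustrative and not the PR's actual code:

```python
# Illustrative only: the helper and fallback names are hypothetical.
def safe_extract(raw_value, fallback=""):
    """Return the extracted content, or the fallback when extraction yields None."""
    return raw_value if raw_value is not None else fallback

print(safe_extract(None))         # "" instead of propagating None downstream
print(safe_extract("4.5 stars"))  # "4.5 stars"
```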
Raw strings are now used to define the regex patterns, which improves readability and avoids backslash-escaping issues.
A change has been made in the request.py file of the Google Play Scraper utility to bypass SSL verification. This has been done by updating the default HTTPS context object in the ssl module with an unverified context, which allows HTTPS requests to proceed without SSL certificate verification.
In the "request" utility of the google_play_scraper, retry logic has been implemented for whenever a 'com.google.play.gateway.proto.PlayGatewayError' (rate-limit-exceeded error) is encountered. The function now retries up to a maximum of 3 times, with an increasing delay between retries in order to respect the server's rate limit.
The MAX_COUNT_EACH_FETCH constant in the reviews.py module has been updated from 199 to 4500. This change will allow the scraper to fetch a larger number of reviews in each request.
Previously, when the continuation token was absent (because there were no more pages), extracting it raised an exception and the response that had already been received was never returned. A try/except block has been added around the token extraction in the reviews.py file, so the code no longer breaks when a token is not found.
The function now checks whether the results are empty before returning them, improving error handling.
Several modifications have been made in the test modules, including test_reviews_all.py, test_search.py, test_app.py, test_permissions.py, and test_reviews.py. The changes mainly consist of updating references in the mocks and updating assertions to cover the updated functionality, ensuring the tests align with the latest application behavior.
The code was refactored, replacing "== None" with "is None" in element.py to adhere to Python best practices. An unused variable was removed from reviews.py to improve readability, and an extra blank line was introduced in request.py for better code structure and readability.
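For reference, a small before/after illustration of the "is None" refactor (the variable name is illustrative):

```python
value = None

# Before: equality comparison against None
if value == None:
    pass

# After: identity check, as recommended by PEP 8
if value is None:
    pass
```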
Why is MAX_COUNT_EACH_FETCH limited to only 4500?
It is the maximum limit supported by the Play Store API (you can try a higher number, for example 4501, and it does not work, but with 4500 it does).
You're my sunshine. I'll test and release this asap, and notify you.
Changes:
Implement retry mechanism for rate limited requests
The request module in google_play_scraper has been updated to handle rate-limited requests. When these occur, the module now retries the request up to a maximum of 3 times with increasing delays, and raises the last exception if all attempts fail. Handling has also been added for 'com.google.play.gateway.proto.PlayGatewayError' errors.
Solves: [BUG] Function "reviews_all" returns diferent amount of reviews at each script execution #208; [BUG] reviews_all doesn't download all reviews of an app with large amount of reviews #209; Unable to download all reviews from an app [BUG] #147
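A minimal sketch of the retry loop described above, assuming the rate-limit marker appears in the response body; the function name, delay schedule, and exception type are illustrative rather than the exact code in request.py:

```python
import time
from urllib.request import urlopen

MAX_RETRIES = 3  # retry cap described in this PR

def fetch_with_retry(url_or_request):
    """Retry a request a few times when the Play gateway signals a rate limit."""
    last_exception = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            body = urlopen(url_or_request).read().decode("utf-8")
            if "com.google.play.gateway.proto.PlayGatewayError" in body:
                raise RuntimeError("Rate limit exceeded (PlayGatewayError)")
            return body
        except Exception as e:
            last_exception = e
            time.sleep(attempt * 2)  # increasing delay between attempts
    raise last_exception  # all retries failed: surface the last error
```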
Handle empty results in review scraper
This change modifies the review scraper to gracefully handle situations where the received results are empty. Previously, an attempt to access data from the empty result would throw an exception, causing the scraper to crash. Now, the scraper checks if the result is empty and, if so, returns an empty list along with the token.
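A hedged sketch of that guard; the parsing helper and variable names are hypothetical, not the scraper's internals:

```python
def parse_review_page(dataset, token):
    """Return parsed reviews for one page, or an empty list when the page has no items."""
    if not dataset:
        # Nothing came back for this page: return early instead of indexing
        # into a missing structure and crashing the scraper.
        return [], token
    reviews = [item for item in dataset]  # placeholder for the real per-review parsing
    return reviews, token

print(parse_review_page([], token="abc"))  # ([], 'abc') instead of an exception
```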
Add exception handling for token extraction in reviews.py
Previously, when the continuation token was absent (because there were no more pages), the response that had already been received was never returned. An exception raised while extracting the token from the match object is now caught, and None is assigned to the token variable instead.
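A minimal sketch of the guarded extraction, assuming the token is parsed out of a regex match over the response payload; the group index and exception list are assumptions:

```python
import json

def extract_token(match):
    """Return the continuation token, or None when there are no more pages."""
    try:
        # match may be None, or the matched JSON may lack the token position.
        token = json.loads(match.group(1))[-1][-1]
    except (AttributeError, IndexError, TypeError, json.JSONDecodeError):
        token = None  # signal the caller to stop paginating
    return token

print(extract_token(None))  # None instead of an AttributeError
```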
Increase review fetch limit in Google Play Scraper
The maximum count for each fetch in the Google Play Scraper has been increased from 199 to 4500. This change will allow more reviews to be fetched in a single request for better efficiency and data collection.
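For reference, the change amounts to a single constant; the exact module location is as described in this PR and shown here only approximately:

```python
# google_play_scraper reviews module (approximate location)
MAX_COUNT_EACH_FETCH = 4500  # previously 199; per the discussion above, 4500 is the largest batch the endpoint accepts
```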
Add SSL context modification to bypass verification
An adjustment was made to the SSL context creation in the google_play_scraper/utils/request.py file to bypass SSL verification. This was done by creating a default HTTPS context that does not perform SSL certificate verification.
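A sketch of the described change, using the standard library's unverified context; note that this relies on a private ssl helper and disables certificate checks process-wide:

```python
import ssl

# Replace the default HTTPS context factory with one that skips certificate
# verification, so subsequent urllib requests ignore SSL certificate errors.
ssl._create_default_https_context = ssl._create_unverified_context
```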
Update regex patterns in scraper constants
Improved the definition of the regex patterns in the Google Play scraper by adding raw string notation. This prevents potential issues with special-character interpretation. Moreover, the NOT_NUMBER pattern has been refined for better performance.
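An illustrative before/after for the raw-string change; only NOT_NUMBER is named in this PR, and the exact pattern shown here is an assumption:

```python
import re

# Before: backslashes have to be escaped twice and are easy to get wrong.
NOT_NUMBER = re.compile("[^\\d]")

# After: raw string notation keeps the pattern readable.
NOT_NUMBER = re.compile(r"[^\d]")

print(NOT_NUMBER.sub("", "1,234 reviews"))  # "1234"
```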
Update e2e tests for search, app, permissions, and reviews
Updated several end-to-end tests to reflect changes in expected outputs. This includes modifications in test_search.py, test_app.py, test_permissions.py, test_reviews.py, and test_reviews_all.py. The alterations include changes to expected URLs, search keywords, expected category names, and the apps under test, among others. These changes ensure the tests are up to date with current data and expectations.