Resolution of various bugs and general maintenance of the project #216

Merged: 10 commits merged on May 29, 2024


@Eitol (Contributor) commented May 11, 2024

Changes:

  • Implement retry mechanism for rate-limited requests

    The request module in google_play_scraper has been updated to handle rate-limited requests. When a request is rate limited, the module now retries it up to 3 times, with a longer delay before each retry, and raises the last exception if every attempt fails. Responses containing 'com.google.play.gateway.proto.PlayGatewayError' are now treated as rate-limit errors.
    Solves: [BUG] Function "reviews_all" returns different amount of reviews at each script execution #208; [BUG] reviews_all doesn't download all reviews of an app with large amount of reviews #209; Unable to download all reviews from an app [BUG] #147

  • Handle empty results in review scraper

    This change modifies the review scraper to gracefully handle situations where the received results are empty. Previously, an attempt to access data from the empty result would throw an exception, causing the scraper to crash. Now, the scraper checks if the result is empty and, if so, returns an empty list along with the token.

  • Add exception handling for token extraction in reviews.py

    When a response did not include a continuation token (because there were no more pages), the reviews on that page were not returned either.
    Exceptions raised while extracting the token from the match object are now caught, and in that case the token is set to None.

  • Increase review fetch limit in Google Play Scraper

    The maximum count per fetch in the Google Play Scraper has been raised from 199 to 4500, so more reviews are retrieved in each request and fewer requests are needed to collect them.

  • Add SSL context modification to bypass verification

    The SSL context creation in google_play_scraper/utils/request.py was changed to bypass SSL verification: the default HTTPS context is replaced with one that does not verify SSL certificates.

  • Update regex patterns in scraper constants

    Regex patterns in the Google Play scraper constants are now defined with raw string literals (r"..."), which prevents backslashes from being interpreted as string escape sequences. The NOT_NUMBER pattern has also been refined for better performance.

  • Update e2e tests for search, app, permissions, and reviews

    Updated several end-to-end tests to match changes in expected outputs, in test_search.py, test_app.py, test_permissions.py, test_reviews.py, and test_reviews_all.py. The changes cover expected URLs, search keywords, expected category names, and the apps under test, among others, so that the tests reflect current data and expectations.
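The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `post_with_retry`, `RateLimitedError`, and the 5-second base delay are hypothetical, the PR's limit of 3 is assumed to mean 3 attempts in total, and the HTTP call is passed in as a function so the logic runs without network access. Only the marker string, the growing delay, and re-raising the last exception come from the PR text.

```python
import time
from typing import Callable, List

# Marker the PR treats as a rate-limit signal in the response body.
RATE_LIMIT_MARKER = "com.google.play.gateway.proto.PlayGatewayError"
MAX_RETRIES = 3  # assumed to be the total number of attempts
BASE_DELAY_SECONDS = 5.0  # hypothetical value; the PR text does not state it


class RateLimitedError(Exception):
    """Hypothetical error used when every attempt was rate limited."""


def post_with_retry(
    send: Callable[[], str],
    max_retries: int = MAX_RETRIES,
    base_delay: float = BASE_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it returns a body without the rate-limit marker.

    After the n-th rate-limited attempt, waits base_delay * n seconds
    (skipped after the final attempt) and raises the last exception
    once all attempts are used up.
    """
    last_exception: Exception = RuntimeError("no attempts were made")
    for attempt in range(1, max_retries + 1):
        try:
            body = send()
        except Exception as exc:  # network/HTTP failure: remember it, try again
            last_exception = exc
            continue
        if RATE_LIMIT_MARKER in body:
            last_exception = RateLimitedError(RATE_LIMIT_MARKER)
            if attempt < max_retries:
                sleep(base_delay * attempt)  # delay grows with each retry
            continue
        return body
    raise last_exception


# Usage with a fake transport: two rate-limited replies, then a real one.
replies = iter([RATE_LIMIT_MARKER, RATE_LIMIT_MARKER, '{"ok": true}'])
waits: List[float] = []
print(post_with_retry(lambda: next(replies), sleep=waits.append))  # {"ok": true}
print(waits)  # [5.0, 10.0]
```

Injecting `sleep` keeps the example fast and makes the backoff schedule easy to inspect.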
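The empty-result and missing-token handling described above can be illustrated with a small parser. The JSON layout used here (`reviews` plus an optional `pagination.token`) is a hypothetical stand-in for the real Play Store payload, and `parse_review_page` and `collect_all` are illustrative names, not functions from google_play_scraper.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Review = Dict[str, Any]


def parse_review_page(raw: str) -> Tuple[List[Review], Optional[str]]:
    """Split one page into (reviews, continuation token).

    Assumed, hypothetical layout: {"reviews": [...], "pagination": {"token": "..."}}
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        return [], None
    try:
        token: Optional[str] = data["pagination"]["token"]
    except (KeyError, TypeError):
        # No token (key missing or pagination is null): this is the last page.
        token = None
    reviews = data.get("reviews") or []
    if not reviews:
        # Empty result: return an empty list instead of indexing into it.
        return [], token
    return reviews, token


def collect_all(pages: Dict[Optional[str], str]) -> List[Review]:
    """Follow continuation tokens through pre-recorded pages (stand-in for HTTP)."""
    collected: List[Review] = []
    token: Optional[str] = None
    while True:
        reviews, token = parse_review_page(pages[token])
        collected.extend(reviews)  # keep the reviews even on the last page
        if token is None:
            return collected
```

Because a missing token only ends the loop, the reviews on the final page are still collected.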


Eitol added 10 commits May 11, 2024 15:25
If a value is extracted as null, a fallback value is now used instead of assigning the null directly.
Raw strings are now used to define the patterns which increases readability and avoids the backslash escaping issue.
A change has been made in request.py of the Google Play Scraper utilities to bypass SSL verification: the default HTTPS context in the ssl module is replaced with an unverified context, so HTTPS requests skip SSL certificate verification.
In the request utility of the Google Play scraper, retry logic is now applied whenever a 'com.google.play.gateway.proto.PlayGatewayError' (rate limit exceeded) error is encountered. The function retries up to a maximum of 3 times, with an increasing delay between retries, to respect the server's rate limit.
The MAX_COUNT_EACH_FETCH constant in the reviews.py module has been updated from 199 to 4500. This change will allow the scraper to fetch a larger number of reviews in each request.
When a response did not include a continuation token (because there were no more pages), the results on that page were not returned either.

Added a try/except block to handle exceptions that may arise when extracting the token in reviews.py. This prevents the code from breaking when no token is found.
The function now checks whether the results are empty before returning them, so an empty response no longer causes an error.
Several modifications have been made in the test modules, including test_reviews_all.py, test_search.py, test_app.py, test_permissions.py, and test_reviews.py. The main changes update the references in the mocks and the assertions for the updated functionality, so the tests align with the latest application behavior.
The code was refactored to use "is None" instead of "== None" in element.py, following Python best practices. An unused variable was removed from reviews.py to improve readability, and an extra line was added in request.py for better code structure and readability.
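The null fallback and the `is None` refactor from the commits above, together with the raw-string convention for regex patterns, can be sketched as follows. `with_fallback`, `NOT_DIGIT`, and `digits_only` are hypothetical helpers; the project's actual NOT_NUMBER pattern is not reproduced here.

```python
import re
from typing import Any, Optional

# Raw string literal: the backslash in \D reaches the regex engine as written.
# Hypothetical pattern; the project's actual NOT_NUMBER definition may differ.
NOT_DIGIT = re.compile(r"\D")


def with_fallback(value: Optional[Any], fallback: Any) -> Any:
    """Use `fallback` only when the extracted value is null (None)."""
    # `is None` checks identity, so falsy values such as 0 or "" are kept,
    # and a custom __eq__ cannot make a non-None object look like None.
    return fallback if value is None else value


def digits_only(text: str) -> int:
    """Drop every non-digit character, e.g. "1,234,567+" becomes 1234567."""
    return int(NOT_DIGIT.sub("", text))


print(r"\D" == "\\D")  # True: both spell a backslash followed by D
print(with_fallback(None, "N/A"), with_fallback(0, "N/A"))  # N/A 0
print(digits_only("1,234,567+"))  # 1234567
```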
@Eitol changed the title from "Project" to "Resolution of various bugs and general maintenance of the project" May 11, 2024


Why is MAX_COUNT_EACH_FETCH limited to only 4500?

Contributor Author


It is the maximum supported by the Play Store API: a higher value such as 4501 does not work, while 4500 does.
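The effect of the larger limit on the number of requests can be checked with a quick calculation. `batch_sizes` is a hypothetical helper, and 4500 is the limit reported above from manual testing, not a documented API constant.

```python
from typing import List

MAX_COUNT_EACH_FETCH = 4500  # limit reported in the discussion above


def batch_sizes(total: int, per_fetch: int = MAX_COUNT_EACH_FETCH) -> List[int]:
    """Split a request for `total` reviews into per-request counts."""
    if total < 0 or per_fetch <= 0:
        raise ValueError("total must be >= 0 and per_fetch must be > 0")
    full, rest = divmod(total, per_fetch)
    return [per_fetch] * full + ([rest] if rest else [])


print(batch_sizes(10_000))  # [4500, 4500, 1000]
print(len(batch_sizes(10_000, per_fetch=199)))  # 51 requests with the old limit
```

For 10,000 reviews this means 3 requests instead of the 51 needed with the old limit of 199.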

@JoMingyu (Owner) left a comment


You're my sunshine. I'll test and release this asap, and notify you.

@JoMingyu merged commit 1960f86 into JoMingyu:master on May 29, 2024
0 of 5 checks passed
3 participants