Resolution of various bugs and general maintenance of the project #216
Conversation
If content is extracted as null, instead of assigning it directly, we now use a fallback value.
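A minimal sketch of the fallback pattern, assuming a hypothetical extractor; the names here are illustrative and not the PR's actual code:

```python
# Illustrative only: the helper and fallback names are hypothetical.
def safe_extract(raw_value, fallback=""):
    """Return the extracted content, or the fallback when extraction yields None."""
    return raw_value if raw_value is not None else fallback

print(safe_extract(None))         # "" instead of propagating None downstream
print(safe_extract("4.5 stars"))  # "4.5 stars"
```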
Raw strings are now used to define the regex patterns, which improves readability and avoids backslash-escaping issues.
A change has been made in the request.py file of the Google Play Scraper utility to bypass SSL verification. This has been done by updating the default HTTPS context object in the ssl module with an unverified context, which allows HTTPS requests to proceed without SSL certificate verification.
In the "request" utility of the google_play_scraper, retry logic has been implemented for whenever a 'com.google.play.gateway.proto.PlayGatewayError' (rate-limit-exceeded error) is encountered. The function now retries up to a maximum of 3 times, with an increasing delay between retries in order to respect the server's rate limit.
The MAX_COUNT_EACH_FETCH constant in the reviews.py module has been updated from 199 to 4500. This change will allow the scraper to fetch a larger number of reviews in each request.
Previously, when the continuation token was absent (because there were no more pages), extracting it raised an exception and the response that had already been received was never returned. A try/except block has been added around the token extraction in the reviews.py file, so the code no longer breaks when a token is not found.
The function now checks whether the results are empty before returning them, improving error handling.
Several modifications have been made in the test modules, including test_reviews_all.py, test_search.py, test_app.py, test_permissions.py, and test_reviews.py. The changes mainly consist of updating references in the mocks and updating assertions to cover the updated functionality, ensuring the tests align with the latest application behavior.
The code was refactored, replacing "== None" with "is None" in element.py to adhere to Python best practices. An unused variable was removed from reviews.py to improve readability, and an extra blank line was introduced in request.py for better code structure and readability.
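For reference, a small before/after illustration of the "is None" refactor (the variable name is illustrative):

```python
value = None

# Before: equality comparison against None
if value == None:
    pass

# After: identity check, as recommended by PEP 8
if value is None:
    pass
```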
Why is MAX_COUNT_EACH_FETCH limited to only 4500?
It is the maximum limit supported by the Play Store API (you can try a higher number, for example 4501, and it does not work, but with 4500 it does).
You're my sunshine. I'll test and release this asap, and notify you.
Changes:
Implement retry mechanism for rate limited requests
The request module in google_play_scraper has been updated to handle rate-limited requests. When these occur, the module now retries the request up to a maximum of 3 times with increasing delays, and raises the last exception if all attempts fail. Handling has also been added for 'com.google.play.gateway.proto.PlayGatewayError' errors.
Solves: [BUG] Function "reviews_all" returns diferent amount of reviews at each script execution #208; [BUG] reviews_all doesn't download all reviews of an app with large amount of reviews #209; Unable to download all reviews from an app [BUG] #147
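A minimal sketch of the retry loop described above, assuming the rate-limit marker appears in the response body; the function name, delay schedule, and exception type are illustrative rather than the exact code in request.py:

```python
import time
from urllib.request import urlopen

MAX_RETRIES = 3  # retry cap described in this PR

def fetch_with_retry(url_or_request):
    """Retry a request a few times when the Play gateway signals a rate limit."""
    last_exception = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            body = urlopen(url_or_request).read().decode("utf-8")
            if "com.google.play.gateway.proto.PlayGatewayError" in body:
                raise RuntimeError("Rate limit exceeded (PlayGatewayError)")
            return body
        except Exception as e:
            last_exception = e
            time.sleep(attempt * 2)  # increasing delay between attempts
    raise last_exception  # all retries failed: surface the last error
```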
Handle empty results in review scraper
This change modifies the review scraper to gracefully handle situations where the received results are empty. Previously, an attempt to access data from the empty result would throw an exception, causing the scraper to crash. Now, the scraper checks if the result is empty and, if so, returns an empty list along with the token.
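A hedged sketch of that guard; the parsing helper and variable names are hypothetical, not the scraper's internals:

```python
def parse_review_page(dataset, token):
    """Return parsed reviews for one page, or an empty list when the page has no items."""
    if not dataset:
        # Nothing came back for this page: return early instead of indexing
        # into a missing structure and crashing the scraper.
        return [], token
    reviews = [item for item in dataset]  # placeholder for the real per-review parsing
    return reviews, token

print(parse_review_page([], token="abc"))  # ([], 'abc') instead of an exception
```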
Add exception handling for token extraction in reviews.py
Previously, when the continuation token was absent (because there were no more pages), the response that had already been received was never returned. An exception raised while extracting the token from the match object is now caught, and None is assigned to the token variable instead.
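A minimal sketch of the guarded extraction, assuming the token is parsed out of a regex match over the response payload; the group index and exception list are assumptions:

```python
import json

def extract_token(match):
    """Return the continuation token, or None when there are no more pages."""
    try:
        # match may be None, or the matched JSON may lack the token position.
        token = json.loads(match.group(1))[-1][-1]
    except (AttributeError, IndexError, TypeError, json.JSONDecodeError):
        token = None  # signal the caller to stop paginating
    return token

print(extract_token(None))  # None instead of an AttributeError
```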
Increase review fetch limit in Google Play Scraper
The maximum count for each fetch in the Google Play Scraper has been increased from 199 to 4500. This change will allow more reviews to be fetched in a single request for better efficiency and data collection.
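For reference, the change amounts to a single constant; the exact module location is as described in this PR and shown here only approximately:

```python
# google_play_scraper reviews module (approximate location)
MAX_COUNT_EACH_FETCH = 4500  # previously 199; per the discussion above, 4500 is the largest batch the endpoint accepts
```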
Add SSL context modification to bypass verification
An adjustment was made to the SSL context creation in the google_play_scraper/utils/request.py file to bypass SSL verification. This was done by creating a default HTTPS context that does not perform SSL certificate verification.
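A sketch of the described change, using the standard library's unverified context; note that this relies on a private ssl helper and disables certificate checks process-wide:

```python
import ssl

# Replace the default HTTPS context factory with one that skips certificate
# verification, so subsequent urllib requests ignore SSL certificate errors.
ssl._create_default_https_context = ssl._create_unverified_context
```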
Update regex patterns in scraper constants
Improved the definition of the regex patterns in the Google Play scraper by adding raw string notation. This prevents potential issues with special-character interpretation. Moreover, the NOT_NUMBER pattern has been refined for better performance.
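An illustrative before/after for the raw-string change; only NOT_NUMBER is named in this PR, and the exact pattern shown here is an assumption:

```python
import re

# Before: backslashes have to be escaped twice and are easy to get wrong.
NOT_NUMBER = re.compile("[^\\d]")

# After: raw string notation keeps the pattern readable.
NOT_NUMBER = re.compile(r"[^\d]")

print(NOT_NUMBER.sub("", "1,234 reviews"))  # "1234"
```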
Update e2e tests for search, app, permissions, and reviews
Updated several end-to-end tests to reflect changes in expected outputs. This includes modifications in test_search.py, test_app.py, test_permissions.py, test_reviews.py, and test_reviews_all.py. The alterations include changes to expected URLs, search keywords, expected category names, and the apps under test, among others. These changes ensure the tests are up to date with current data and expectations.