OpenWPM StorageWatchdog complete #1039
Hey,
thank you for this PR. I can see this being a useful feature, especially when running natively on machines that are not under your control.
While this looks good on a first pass, I'd need to spend more time on a thorough review, which I don't have atm.
I'll try to get back to you as soon as my schedule allows.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1039      +/-   ##
==========================================
- Coverage   46.20%   40.77%   -5.43%
==========================================
  Files          34       35       +1
  Lines        3398     3480      +82
==========================================
- Hits         1570     1419     -151
- Misses       1828     2061     +233
…n for increased compatibility
If any other changes are requested, feel free to comment! I really enjoyed working on this project and would love to keep helping make it better!
Thanks for following up on this and sorry for letting it linger.
…StorageWatchdog backend.
See current implementation but marking this as resolved.
This PR introduces the StorageWatchdog module for OpenWPM. It lets you redirect temporary files, monitor their size, and reset browser profiles that grow too large, all without compromising performance.
With the StorageWatchdog module, you can redirect temporary files to a directory of your choice, so that any temporary files generated during a crawl are stored where it suits you. Whether you want a specific directory for organization or have limited storage space, this keeps the storage location under your control.
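As a rough illustration of the redirection described above, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the helper name `make_profile_dir` and the `firefox_profile_` prefix are hypothetical, and only the parameter name `tmp_profile_dir` comes from the description. The sketch falls back to the system temp directory (usually /tmp on Linux) when no location is given.

```python
import tempfile
from pathlib import Path
from typing import Optional


def make_profile_dir(tmp_profile_dir: Optional[Path] = None) -> Path:
    """Create a fresh temporary profile directory.

    Hypothetical helper: if tmp_profile_dir is given, the new directory is
    created inside it; otherwise the system temp directory is used.
    """
    if tmp_profile_dir is None:
        base = Path(tempfile.gettempdir())
    else:
        base = Path(tmp_profile_dir)
    # Make sure the chosen base location exists before creating anything in it.
    base.mkdir(parents=True, exist_ok=True)
    # mkdtemp creates a uniquely named directory inside `base` and returns its path.
    return Path(tempfile.mkdtemp(prefix="firefox_profile_", dir=base))
```

Calling `make_profile_dir(Path("/data/openwpm_tmp"))` would place the new directory under `/data/openwpm_tmp` (an example path, not one used by the PR), while `make_profile_dir()` keeps the previous default behaviour.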
In addition to redirecting temporary files, the StorageWatchdog module monitors their size, so you can keep track of how much disk space the temporary files and browser profiles consume. Every 5 minutes, the watchdog checks the size of the current browser profile to decide whether that specific browser needs to be reset to free up space.
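The periodic size check described above could look roughly like the following sketch. It is written from the description only and does not reproduce storage_watchdog.py: the names `dir_size_bytes`, `needs_reset`, `watch`, and the `on_exceeded` callback are assumptions made for illustration, and the 300-second default simply mirrors the 5 minute interval mentioned in the text.

```python
import os
import tempfile
import threading
from pathlib import Path
from typing import Callable


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += (Path(dirpath) / name).lstat().st_size
            except OSError:
                # The browser may delete files while we walk; skip those.
                continue
    return total


def needs_reset(profile_dir: Path, max_bytes: int) -> bool:
    """Return True when the profile directory exceeds the size limit."""
    return dir_size_bytes(profile_dir) > max_bytes


def watch(
    profile_dir: Path,
    max_bytes: int,
    on_exceeded: Callable[[Path], None],
    stop_event: threading.Event,
    interval_seconds: float = 300.0,
) -> None:
    """Check the profile size every interval until stop_event is set."""
    # Event.wait returns False on timeout, so each timeout triggers one check.
    while not stop_event.wait(interval_seconds):
        if needs_reset(profile_dir, max_bytes):
            # Hypothetical hook: the caller decides how to reset the browser.
            on_exceeded(profile_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp)
        (profile / "cache.bin").write_bytes(b"\0" * 2048)
        print(dir_size_bytes(profile), needs_reset(profile, max_bytes=1024))
```

In a real crawl, `watch` would run in a background thread, and the callback would trigger whatever browser reset the manager already supports. Waiting on a `threading.Event` rather than sleeping lets the loop stop promptly at shutdown.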
This addition came out of my capstone team running into issues with longer crawls running out of storage on our smaller infrastructure. Because we only needed the data from the generated database (already more than most people need in the first place), the other leftover artifacts were simply taking up space on our system.
Usage is fairly simple, and a demo_watchdog.py is included to demonstrate it. Enable the storage watchdog as you would any other watchdog, configure the size limit in bytes for the check (this is located in the periodic_check function in storage_watchdog.py), and optionally specify a location for temporary files using the new browser parameter, tmp_profile_dir (this defaults to /tmp, as before).
I hope you accept these changes and agree that this simple addition could be helpful to others!