Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenWPM hangs #957

Closed
MaazBinMusa opened this issue Sep 29, 2021 · 2 comments · Fixed by #1057
Closed

OpenWPM hangs #957

MaazBinMusa opened this issue Sep 29, 2021 · 2 comments · Fixed by #1057
Assignees

Comments

@MaazBinMusa
Copy link
Contributor

MaazBinMusa commented Sep 29, 2021

In the deploy firefox file I install an add_on adblocker towards the end, after openwpm xpi is installed.

After this I run a simple crawl of 50 websites. This results in the crawl getting executed but gives these warning (about 50 of these) and the demo.py does not shutdown. It Just hangs.

storage_controller   - INFO     - Awaiting all tasks for visit_id 2394433868866261
sql_provider         - WARNING  - Visit with visit_id 2394433868866261 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 8857879637229978
sql_provider         - WARNING  - Visit with visit_id 8857879637229978 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 6605805886605677
sql_provider         - WARNING  - Visit with visit_id 6605805886605677 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 4976310008382932
sql_provider         - WARNING  - Visit with visit_id 4976310008382932 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 1221128306404878
sql_provider         - WARNING  - Visit with visit_id 1221128306404878 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 1903248132902091
sql_provider         - WARNING  - Visit with visit_id 1903248132902091 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 1374176394863563
sql_provider         - WARNING  - Visit with visit_id 1374176394863563 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 2714861535115475
sql_provider         - WARNING  - Visit with visit_id 2714861535115475 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 5099912865844243
sql_provider         - WARNING  - Visit with visit_id 5099912865844243 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 4791266809356625
sql_provider         - WARNING  - Visit with visit_id 4791266809356625 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 7916683500461527
sql_provider         - WARNING  - Visit with visit_id 7916683500461527 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 7374938213921325
sql_provider         - WARNING  - Visit with visit_id 7374938213921325 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 5036639023635015
sql_provider         - WARNING  - Visit with visit_id 5036639023635015 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 3010862000196919
sql_provider         - WARNING  - Visit with visit_id 3010862000196919 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 2244968277718576
sql_provider         - WARNING  - Visit with visit_id 2244968277718576 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 319991707537304
sql_provider         - WARNING  - Visit with visit_id 319991707537304 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 3455507760516858
sql_provider         - WARNING  - Visit with visit_id 3455507760516858 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 4595049379283658
sql_provider         - WARNING  - Visit with visit_id 4595049379283658 got interrupted
storage_controller   - INFO     - Awaiting all tasks for visit_id 7847554982819993
sql_provider         - WARNING  - Visit with visit_id 7847554982819993 got interrupted
Executing <Task pending name='Task-1' coro=<StorageController._run() running at /home/data/maaz/H2/verificationRun/OpenWPM/openwpm/storage/storage_controller.py:352> wait_for=<Task pending name='CompletionQueueFeeder' coro=<StorageController.update_completion_queue() running at /home/data/maaz/H2/verificationRun/OpenWPM/openwpm/storage/storage_controller.py:322> wait_for=<Future finished result=None created at /home/maaz/miniconda3/envs/openwpm/lib/python3.9/asyncio/base_events.py:424> cb=[<TaskWakeupMethWrapper object at 0x7f54c0e7a460>()] created at /home/maaz/miniconda3/envs/openwpm/lib/python3.9/asyncio/tasks.py:361> cb=[_run_until_complete_cb() at /home/maaz/miniconda3/envs/openwpm/lib/python3.9/asyncio/base_events.py:184] created at /home/maaz/miniconda3/envs/openwpm/lib/python3.9/asyncio/base_events.py:621> took 0.901 seconds

Running without the xpi results in a proper shutdown.

@vringar
Copy link
Contributor

vringar commented Dec 16, 2021

This seems like a duplicate of #922.

@vringar
Copy link
Contributor

vringar commented Dec 16, 2021

Just dumping my thoughts in here:

When we collect a lot of data from a lot of browsers and then tell the Storage Controller to shut down it doesn't shut down because it gets hung up somewhere in the following code.

async def shutdown(self, completion_queue_task: Task[None]) -> None:
completion_tokens = {}
visit_ids = list(self.store_record_tasks.keys())
for visit_id in visit_ids:
t = await self.finalize_visit_id(visit_id, success=False)
if t is not None:
completion_tokens[visit_id] = t
await self.structured_storage.flush_cache()
await completion_queue_task
for visit_id, token in completion_tokens.items():
await token
self.completion_queue.put((visit_id, False))
await self.structured_storage.shutdown()
if self.unstructured_storage is not None:
await self.unstructured_storage.flush_cache()
await self.unstructured_storage.shutdown()

I think this might be an issue with completion_queue_task not finishing, when we expect it to.
Which should terminate immediately in the SQLite case, as the token should be None, so the list should be emptied in one go.

Can you add logging to this function that prints the flag and the contents of the list?

async def update_completion_queue(self) -> None:
"""All completed finalize_visit_id tasks get put into the completion_queue here"""
while not (self._shutdown_flag and len(self.finalize_tasks) == 0):
# This list is needed because iterating over a list and changing it at the same time
# is forbidden
new_finalize_tasks: List[Tuple[VisitId, Optional[Task[None]], bool]] = []
for visit_id, token, success in self.finalize_tasks:
if (
not token or token.done()
): # Either way all data for the visit_id was saved out
self.completion_queue.put((visit_id, success))
else:
new_finalize_tasks.append((visit_id, token, success))
self.finalize_tasks = new_finalize_tasks
await asyncio.sleep(5)

@vringar vringar self-assigned this Dec 20, 2021
@vringar vringar mentioned this issue Oct 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants