Running docker container doesn't work #918

Open

vexleet opened this issue Jan 18, 2025 · 3 comments

vexleet commented Jan 18, 2025

Hi.

I have the Playwright crawler example from https://crawlee.dev/python/docs/examples/playwright-crawler and I want to run it with Docker, but when I try, I get RecursionError: maximum recursion depth exceeded.

This is my Dockerfile:

FROM python:3.9.20

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install 'crawlee[playwright]' && \
    playwright install --with-deps chromium

COPY . .

ENTRYPOINT ["python", "main.py"]

And this is the full error I get:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 326, in _batch_remove_files
    await asyncio.to_thread(os.rename, folder, temporary_folder)
  File "/usr/local/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
OSError: [Errno 18] Invalid cross-device link: './storage/datasets/default' -> 'storage/datasets/__CRAWLEE_TEMPORARY_974'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 326, in _batch_remove_files
    await asyncio.to_thread(os.rename, folder, temporary_folder)
  File "/usr/local/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 819, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 172, in submit
    f = _base.Future()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 320, in __init__
    self._condition = threading.Condition()
  File "/usr/local/lib/python3.9/threading.py", line 230, in __init__
    lock = RLock()
  File "/usr/local/lib/python3.9/threading.py", line 93, in RLock
    return _CRLock(*args, **kwargs)
RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/main.py", line 63, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/app/main.py", line 59, in main
    await crawler.run(['https://news.ycombinator.com/'])
  File "/usr/local/lib/python3.9/site-packages/crawlee/crawlers/_basic/_basic_crawler.py", line 466, in run
    await self.add_requests(requests)
  File "/usr/local/lib/python3.9/site-packages/crawlee/crawlers/_basic/_basic_crawler.py", line 555, in add_requests
    request_manager = await self.get_request_manager()
  File "/usr/local/lib/python3.9/site-packages/crawlee/crawlers/_basic/_basic_crawler.py", line 398, in get_request_manager
    self._request_manager = await RequestQueue.open()
  File "/usr/local/lib/python3.9/site-packages/crawlee/storages/_request_queue.py", line 165, in open
    return await open_storage(
  File "/usr/local/lib/python3.9/site-packages/crawlee/storages/_creation_management.py", line 154, in open_storage
    await storage_client.purge_on_start()
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 182, in purge_on_start
    await self._purge_default_storages()
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 246, in _purge_default_storages
    await self._batch_remove_files(dataset_folder.path)
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 329, in _batch_remove_files
    return await self._batch_remove_files(folder, counter + 1)
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 329, in _batch_remove_files
    return await self._batch_remove_files(folder, counter + 1)
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 329, in _batch_remove_files
    return await self._batch_remove_files(folder, counter + 1)
  [Previous line repeated 973 more times]
  File "/usr/local/lib/python3.9/site-packages/crawlee/storage_clients/_memory/_memory_storage_client.py", line 315, in _batch_remove_files
    folder_exists = await asyncio.to_thread(os.path.exists, folder)
  File "/usr/local/lib/python3.9/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 819, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 172, in submit
    f = _base.Future()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 320, in __init__
    self._condition = threading.Condition()
  File "/usr/local/lib/python3.9/threading.py", line 230, in __init__
    lock = RLock()
RecursionError: maximum recursion depth exceeded
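
For reference, the OSError at the root of this chain is the classic cross-device rename failure: os.rename cannot move a directory between filesystems and raises errno 18 (EXDEV). In a container this can happen when ./storage sits on a bind mount or volume, or when it was baked into the image by COPY . . and lives in an overlayfs lower layer, since overlayfs reports EXDEV for renames of such directories. A minimal sketch of the behavior and the usual fallback, with paths taken from the log above (the temporary name is illustrative):

import errno
import os
import shutil

src = './storage/datasets/default'
dst = './storage/datasets/__CRAWLEE_TEMPORARY_0'  # illustrative temp name

try:
    # os.rename is atomic, but it only works within a single
    # filesystem; across mount points (or overlayfs layers) it
    # raises OSError with errno 18 (EXDEV).
    os.rename(src, dst)
except OSError as exc:
    if exc.errno == errno.EXDEV:
        # shutil.move falls back to copy-and-delete, which works
        # across filesystem boundaries.
        shutil.move(src, dst)
    else:
        raise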
github-actions bot added the t-tooling label on Jan 18, 2025
Mantisus (Collaborator) commented

Hey @vexleet

Thank you for your interest in Crawlee.

Are there any additional details or nuances about how you start the container?

I ask because I have been using your Dockerfile and am getting errors related to running as the root user, but nothing like the error log above.
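
For context on those root-user errors: Chromium typically refuses to launch with its sandbox enabled as root, so a common workaround is to create a non-root user in the image and install the browser for that user. A sketch of how the Dockerfile above could be adapted; the user name pwuser is illustrative, not something Crawlee requires:

FROM python:3.9.20

WORKDIR /app

COPY requirements.txt .

# Chromium's system libraries need root, so install them before
# dropping privileges; install-deps is the deps-only counterpart
# of --with-deps.
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install 'crawlee[playwright]' && \
    playwright install-deps chromium

# Create a non-root user ("pwuser" is an illustrative name) and give
# it the working directory so the crawler can write ./storage.
RUN useradd --create-home pwuser && chown pwuser:pwuser /app
USER pwuser

# Download the browser into the non-root user's cache.
RUN playwright install chromium

COPY --chown=pwuser:pwuser . .

ENTRYPOINT ["python", "main.py"]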

vexleet (Author) commented Jan 21, 2025


Hey @Mantisus

I tried reproducing the problem again after your comment. On version 0.5.2 I also get the root-user errors, but on 0.5.1 (the one I have actually been using) it works, though I did fully restart my PC on Sunday.

I switched to Crawlee JS after creating this issue, but I can try running my old Python code over the weekend to see whether I can reproduce it; I had other code before trying the Crawlee example, and both were returning the error above.

I hope that's fine.

If I can't manage, I can just close the issue.

Mantisus (Collaborator) commented


> I hope that's fine.

Yes, that would be great. Thanks )
