Running the following MRE scraper, I intermittently get the error from the title partway through the URLs being processed:
import logging

from scrapy import Request, Spider
from scrapy.crawler import CrawlerProcess


class FlashscoreSpider(Spider):
    name = "flashscore"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
        "LOG_LEVEL": logging.ERROR,
    }
    start_urls = [
        "https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary",
        "https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary",
        "https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary",
        "https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary",
        "https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary",
        "https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary",
        "https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary",
        "https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary",
        "https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary",
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url=url,
                meta=dict(dont_redirect=True, playwright=True),
                callback=self.parse,
            )

    def parse(self, response):
        print(f"Parsing {response.url}")


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(FlashscoreSpider)
    process.start()
It doesn't happen every time, but when it does it usually repeats. Here's the terminal output:
Parsing https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4858' coro=<ScrapyPlaywrightDownloadHandler._make_request_handler.<locals>._request_handler() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/scrapy_playwright/handler.py:529> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/asyncio/tasks.py:720]>
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4857' coro=<Page._on_route() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/playwright/_impl/_page.py:249> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[AsyncIOEventEmitter._emit_run.<locals>.callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/pyee/asyncio.py:65]>
When running a longer sequence of URLs the errors appear intermittently, usually in blocks of several at a time.
It doesn't affect further processing, though: when I create items they flow into pipelines and are successfully processed there, despite the errors appearing throughout the run.
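For context, this is roughly the shape of the pipeline handling I mean (a simplified sketch; the pipeline class and the item dict are placeholders, not my real code):

class MatchSummaryPipeline:
    # Placeholder pipeline: items yielded from parse() still arrive here and are
    # processed normally, even on runs where the asyncio errors above appear.
    def process_item(self, item, spider):
        spider.logger.debug("Pipeline received: %r", item)
        return item

with parse() yielding a plain dict such as yield {"url": response.url} and ITEM_PIPELINES pointing at that class in custom_settings.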
Am I doing something wrong here?
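For example, should I be requesting the Playwright page and closing it explicitly instead of relying on the handler? Something like the sketch below, using the playwright_include_page meta key from the scrapy-playwright docs (I haven't confirmed whether this changes the behaviour):

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url=url,
                meta=dict(
                    dont_redirect=True,
                    playwright=True,
                    # Ask the handler to pass the page object through to the callback.
                    playwright_include_page=True,
                ),
                callback=self.parse,
            )

    async def parse(self, response):
        # Close the page explicitly once we're done with the response.
        page = response.meta["playwright_page"]
        print(f"Parsing {response.url}")
        await page.close()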
python: 3.10.8
scrapy: 2.8.0
scrapy-playwright: 0.0.32
macOS: 13.5.1