Running the following MRE scraper, I intermittently get the error from the title partway through the URLs being processed:
import logging

from scrapy import Request, Spider
from scrapy.crawler import CrawlerProcess


class FlashscoreSpider(Spider):
    name = "flashscore"
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
        "LOG_LEVEL": logging.ERROR,
    }
    start_urls = [
        "https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary",
        "https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary",
        "https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary",
        "https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary",
        "https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary",
        "https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary",
        "https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary",
        "https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary",
        "https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary",
        "https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary",
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url=url,
                meta=dict(dont_redirect=True, playwright=True),
                callback=self.parse,
            )

    def parse(self, response):
        print(f"Parsing {response.url}")


if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(FlashscoreSpider)
    process.start()
It doesn't happen every time, but when it does it usually repeats. Here's the terminal output:
Parsing https://www.flashscore.com/match/4EjNNOJF/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/6go7eBHA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/v75p9UoA/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/4lDXBW1i/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/W0rJh91T/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/hbHHSeqM/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/rT3yPgsm/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/WKM03Vff/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/KOs4lXMu/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/UswTO2Nc/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/OtjFfQkT/#/match-summary/match-summary
Parsing https://www.flashscore.com/match/pjiBe6zN/#/match-summary/match-summary
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4858' coro=<ScrapyPlaywrightDownloadHandler._make_request_handler.<locals>._request_handler() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/scrapy_playwright/handler.py:529> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/asyncio/tasks.py:720]>
2023-09-12 00:09:20 [asyncio] ERROR: Task was destroyed but it is pending!
task: <Task pending name='Task-4857' coro=<Page._on_route() running at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/playwright/_impl/_page.py:249> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[AsyncIOEventEmitter._emit_run.<locals>.callback() at /Users/<me>/opt/miniconda3/envs/capra_production/lib/python3.10/site-packages/pyee/asyncio.py:65]>
When running a longer sequence of URLs the errors appear intermittently, usually in blocks of several at a time.
It doesn't affect further processing, though: when I create items they flow into pipelines and are successfully processed there, despite the errors appearing throughout the run.
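For context, this is roughly the shape of the pipeline handling I mean (a simplified sketch; the pipeline class and the item dict are placeholders, not my real code):

class MatchSummaryPipeline:
    # Placeholder pipeline: items yielded from parse() still arrive here and are
    # processed normally, even on runs where the asyncio errors above appear.
    def process_item(self, item, spider):
        spider.logger.debug("Pipeline received: %r", item)
        return item

with parse() yielding a plain dict such as yield {"url": response.url} and ITEM_PIPELINES pointing at that class in custom_settings.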
Am I doing something wrong here?
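For example, should I be requesting the Playwright page and closing it explicitly instead of relying on the handler? Something like the sketch below, using the playwright_include_page meta key from the scrapy-playwright docs (I haven't confirmed whether this changes the behaviour):

    def start_requests(self):
        for url in self.start_urls:
            yield Request(
                url=url,
                meta=dict(
                    dont_redirect=True,
                    playwright=True,
                    # Ask the handler to pass the page object through to the callback.
                    playwright_include_page=True,
                ),
                callback=self.parse,
            )

    async def parse(self, response):
        # Close the page explicitly once we're done with the response.
        page = response.meta["playwright_page"]
        print(f"Parsing {response.url}")
        await page.close()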
python: 3.10.8
scrapy: 2.8.0
scrapy-playwright: 0.0.32
macOS: 13.5.1