Question about reducing browser restart frequency with scrapy-playwright #311

SH-zwhy · 2024-08-08T01:57:09Z

Hi,

I'm using scrapy-playwright for data scraping, where URLs are provided through a txt file. I've noticed that every time a URL is scraped, the browser restarts, which significantly reduces scraping efficiency.

Is there a way to avoid restarting the browser for each URL or to reduce the frequency of browser restarts to improve scraping performance?

Thanks!

elacuesta · 2024-08-08T13:13:05Z

That's not how the package works by default; you might be starting a new job for each URL. By default a new page, not a new browser, is created for each URL. However, you can reuse pages as explained at https://github.com/scrapy-plugins/scrapy-playwright?tab=readme-ov-file#playwright_page.

Please share your code and logs as requested in the Reporting issues section.

elacuesta added the needs more info and support labels Aug 8, 2024

elacuesta added the Stale label Sep 11, 2024
