Question about reducing browser restart frequency with scrapy-playwright #311

Open
SH-zwhy opened this issue Aug 8, 2024 · 1 comment

SH-zwhy commented Aug 8, 2024

Hi,

I'm using scrapy-playwright for data scraping, where URLs are provided through a txt file. I've noticed that every time a URL is scraped, the browser restarts, which significantly reduces scraping efficiency.

Is there a way to avoid restarting the browser for each URL or to reduce the frequency of browser restarts to improve scraping performance?

Thanks!

@elacuesta
Member

That's not how the package works by default; you might be starting a new job for each URL. By default, a new page (not a new browser) is created for each URL. However, you can reuse pages, as explained at https://github.com/scrapy-plugins/scrapy-playwright?tab=readme-ov-file#playwright_page.
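As a rough illustration of page reuse, here is a minimal sketch of two helpers that could feed a single crawl. The helper names `load_urls` and `playwright_meta` are hypothetical and not part of scrapy-playwright; the meta keys `playwright`, `playwright_include_page` and `playwright_page` are the ones described in the README section linked above. It assumes the URLs sit one per line in a text file and that all of them are scheduled from one spider run rather than one job per URL.

```python
from pathlib import Path
from typing import Any, Optional


def load_urls(path: str) -> list[str]:
    """Read one URL per line from a text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def playwright_meta(page: Optional[Any] = None) -> dict[str, Any]:
    """Build a Request.meta dict for scrapy-playwright (hypothetical helper).

    Without a page: request a new page and ask for it to be exposed in
    response.meta["playwright_page"] so it can be reused later.
    With a page: ask the handler to reuse that existing page.
    """
    meta: dict[str, Any] = {
        "playwright": True,
        "playwright_include_page": True,
    }
    if page is not None:
        meta["playwright_page"] = page
    return meta


# Sketch of how a spider might chain requests over one page (assumption,
# not taken from the issue author's code):
#   first request:   scrapy.Request(urls[0], meta=playwright_meta())
#   in the callback: page = response.meta["playwright_page"]
#                    yield scrapy.Request(next_url, meta=playwright_meta(page))
#   after the last URL, close the page explicitly: await page.close()
```

Reusing one page means the requests that share it should run one after another, so this trades concurrency for fewer page creations. Whether that helps depends on the site and on the rest of the spider's settings.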

Please share your code and logs as requested in the Reporting issues section.
