How to run crawler in infinite loop? #448
-
How can I run the crawler in an infinite loop while it waits for requests to come in from time to time?
Answered by vdusek on Aug 20, 2024
-
You can run `crawler.run()` repeatedly in an infinite loop.

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        data = {
            'url': context.request.url,
            'title': context.soup.title.string if context.soup.title else None,
            'h1s': [h1.text for h1 in context.soup.find_all('h1')],
            'h2s': [h2.text for h2 in context.soup.find_all('h2')],
            'h3s': [h3.text for h3 in context.soup.find_all('h3')],
        }
        await context.push_data(data)

    while True:
        # get your URLs here
        urls = ['https://crawlee.dev']
        await crawler.run(urls)
        await asyncio.sleep(10)


if __name__ == '__main__':
    asyncio.run(main())
```

Converting this to discussion.
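Since the question mentions waiting for requests to arrive from time to time, here is a minimal sketch of the same pattern driven by an `asyncio.Queue` instead of a fixed sleep interval. The queue, the `producer` coroutine, and the batch-draining loop are illustrative assumptions, not part of Crawlee's API; only the repeated `crawler.run()` call comes from the answer above.

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()
    # Hypothetical channel through which URLs arrive; in a real app this
    # could be fed by a message broker, an HTTP endpoint, a DB poll, etc.
    incoming: asyncio.Queue[str] = asyncio.Queue()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    async def producer() -> None:
        # Illustrative producer that enqueues a single URL.
        await incoming.put('https://crawlee.dev')

    # Keep a reference so the task is not garbage-collected prematurely.
    producer_task = asyncio.create_task(producer())

    while True:
        # Block until at least one URL arrives, then drain whatever else
        # is already queued into one batch.
        urls = [await incoming.get()]
        while not incoming.empty():
            urls.append(incoming.get_nowait())
        # Each run() processes the batch and returns, so the loop can go
        # back to waiting for the next batch.
        await crawler.run(urls)


if __name__ == '__main__':
    asyncio.run(main())
```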