How to run crawler in infinite loop? #448
-
How can I run the crawler in an infinite loop while it waits for requests to come in from time to time?
Answered by vdusek on Aug 20, 2024
-
You can run `crawler.run()` repeatedly in an infinite loop.

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')
        data = {
            'url': context.request.url,
            'title': context.soup.title.string if context.soup.title else None,
            'h1s': [h1.text for h1 in context.soup.find_all('h1')],
            'h2s': [h2.text for h2 in context.soup.find_all('h2')],
            'h3s': [h3.text for h3 in context.soup.find_all('h3')],
        }
        await context.push_data(data)

    while True:
        # get your URLs here
        urls = ['https://crawlee.dev']
        await crawler.run(urls)
        await asyncio.sleep(10)


if __name__ == '__main__':
    asyncio.run(main())
```

Converting this to discussion.
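Since the question mentions waiting for requests to arrive from time to time, here is a minimal sketch of the same pattern driven by an `asyncio.Queue` instead of a fixed sleep interval. The queue, the `producer` coroutine, and the batch-draining loop are illustrative assumptions, not part of Crawlee's API; only the repeated `crawler.run()` call comes from the answer above.

```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler()
    # Hypothetical channel through which URLs arrive; in a real app this
    # could be fed by a message broker, an HTTP endpoint, a DB poll, etc.
    incoming: asyncio.Queue[str] = asyncio.Queue()

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

    async def producer() -> None:
        # Illustrative producer that enqueues a single URL.
        await incoming.put('https://crawlee.dev')

    # Keep a reference so the task is not garbage-collected prematurely.
    producer_task = asyncio.create_task(producer())

    while True:
        # Block until at least one URL arrives, then drain whatever else
        # is already queued into one batch.
        urls = [await incoming.get()]
        while not incoming.empty():
            urls.append(incoming.get_nowait())
        # Each run() processes the batch and returns, so the loop can go
        # back to waiting for the next batch.
        await crawler.run(urls)


if __name__ == '__main__':
    asyncio.run(main())
```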