Scale scraper to multiple threads #133
Labels
backend
Deals with the FastAPI and web-scraping backend
enhancement
Adds value to a previous feature
scraping
Involves web scraping
Context
Right now, our scraper's speed is heavily limited because it runs on only one thread. Allowing the scraper to scale to N threads should substantially increase its throughput.
TODO
- n: an integer representing the number of threads that the scraper scales to.
- With c companies, the scraper will spawn n threads and assign each of them c / n companies to divide the work evenly.
Notes
Be careful of race conditions here! Multi-threading adds a lot of complexity and can introduce bugs that are easy to overlook at first. Another idea for improving speed is to have a scraper scrape other sites while it waits for the crawl delay to expire on a given company's site (i.e. asynchronous requests).
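As a rough illustration of the TODO above, here is a minimal sketch of splitting c companies across n threads while guarding shared state with a lock. The names split_evenly, scrape_all, and scrape_company are hypothetical and are not taken from the existing codebase. The sketch assumes each company can be scraped independently, and it does not handle crawl delays.

```python
import threading
from typing import Callable, Dict, List


def split_evenly(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks take one additional item to absorb the remainder.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def scrape_all(
    companies: List[str],
    scrape_company: Callable[[str], str],  # hypothetical per-company scraper
    n: int,
) -> Dict[str, str]:
    """Scrape every company using up to n threads and collect the results."""
    results: Dict[str, str] = {}
    lock = threading.Lock()  # protects `results`, which all workers write to

    def worker(chunk: List[str]) -> None:
        for company in chunk:
            data = scrape_company(company)
            with lock:
                results[company] = data

    # Skip empty chunks so no idle threads are spawned when n > len(companies).
    threads = [
        threading.Thread(target=worker, args=(chunk,))
        for chunk in split_evenly(companies, n)
        if chunk
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

When c is not divisible by n, the remainder is spread over the first few threads, so no thread gets more than one extra company. A thread pool (for example concurrent.futures.ThreadPoolExecutor) might balance the load better if some sites are much slower than others, since a fixed c / n split cannot reassign work once a thread finishes early.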