This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Fixed unfiltered duplicates bug, removed dont_filter #16

Open: wants to merge 1 commit into master


flagist0

The middleware was emitting requests with dont_filter=True, so duplicate requests bypassed Scrapy's dupefilter and went uncaught.

dont_filter is not needed in itself, but it was protecting the request queue from exhaustion: the middleware emits one request at a time, so there is only ever one request in the Scrapy queue. If that request is a duplicate and the dupefilter drops it, the Scrapy request queue becomes empty and the spider is closed, even if many requests remain in the middleware's queue.

The solution is to catch the spider_idle signal and supply the next request from the middleware's queue.
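To make the failure mode and the fix concrete, here is a minimal, dependency-free sketch. It is a toy model, not the code in this PR and not Scrapy's real API: `ToyEngine`, `feed_next`, and `refill_on_idle` are hypothetical names, URLs stand in for requests, and a set stands in for the dupefilter. Returning `True` from the idle hook plays the role that raising `DontCloseSpider` from a `spider_idle` handler plays in Scrapy.

```python
from collections import deque


class ToyEngine:
    """Toy stand-in for the crawl loop (hypothetical, not Scrapy's API)."""

    def __init__(self, pending, on_idle=None):
        self.pending = deque(pending)  # the middleware's own queue
        self.scheduler = deque()       # the engine's request queue
        self.seen = set()              # dupefilter state
        self.on_idle = on_idle         # optional idle hook
        self.crawled = []

    def schedule(self, url):
        """Enqueue url unless the dupefilter has already seen it."""
        if url in self.seen:
            return False  # dropped as a duplicate
        self.seen.add(url)
        self.scheduler.append(url)
        return True

    def feed_next(self):
        """Middleware behaviour: hand over one request at a time."""
        if self.pending:
            self.schedule(self.pending.popleft())

    def run(self):
        self.feed_next()
        while True:
            if self.scheduler:
                self.crawled.append(self.scheduler.popleft())
                self.feed_next()  # a response came back, emit the next one
            elif self.on_idle and self.on_idle(self):
                continue  # idle hook asked to keep the spider open
            else:
                break  # queue empty and nobody objected: spider closes
        return self.crawled


def refill_on_idle(engine):
    """Idle hook: supply the next queued request, if any remain."""
    if not engine.pending:
        return False  # nothing left, allow the spider to close
    engine.feed_next()
    return True  # keep the spider open


urls = ["a", "b", "a", "c", "d"]
without_hook = ToyEngine(urls).run()
with_hook = ToyEngine(urls, on_idle=refill_on_idle).run()
```

Without the hook, the run stops as soon as the duplicate "a" is dropped, leaving "c" and "d" stranded in the middleware's queue. With the hook, the idle event pulls the next request and the crawl continues until the queue is drained.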
