Most Committers extend AbstractFileQueueCommitter. When multiple committers are used by multiple processes sharing the same working directory, the default queue directory can be the same. This results in two committers processing the same files. That's not ideal.
We should find a way to enforce uniqueness of committer queues while keeping them predictable (so the same committer instance always points to the same location).
When used with Norconex Collectors, implicitly passing the collector ID and crawler ID (which is already a unique combo) and using that to create a unique directory would do it, but Committers are not tied to Collectors right now, so we can't assume we'll always have these.
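To make the idea concrete, here is a minimal sketch of deriving a predictable, unique queue directory from a collector ID and a crawler ID. Everything in it is an assumption for illustration: the `QueueDirs` class, the `queueDir` method, and the `committer-queue` folder name are hypothetical and not part of the Committer API, and it assumes both IDs are available and non-empty, which is exactly what cannot be guaranteed today.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Norconex Committer Core.
final class QueueDirs {

    private QueueDirs() {
    }

    // Replace anything that is not a letter, digit, '_' or '-' so the ID
    // can be used as a single directory name on common file systems.
    static String safe(String id) {
        if (id == null || id.trim().isEmpty()) {
            throw new IllegalArgumentException("ID must not be empty");
        }
        return id.trim().replaceAll("[^A-Za-z0-9_-]", "_");
    }

    // Same inputs always give the same path (predictable); different
    // collector/crawler pairs give different paths as long as their
    // sanitized IDs differ.
    static Path queueDir(Path workDir, String collectorId, String crawlerId) {
        return workDir
                .resolve("committer-queue")
                .resolve(safe(collectorId))
                .resolve(safe(crawlerId));
    }

    public static void main(String[] args) {
        // Example: two crawlers sharing one working directory.
        Path workDir = Paths.get("work");
        System.out.println(queueDir(workDir, "My Collector", "crawler-1"));
        System.out.println(queueDir(workDir, "My Collector", "crawler-2"));
    }
}
```

One limitation of this sketch: sanitizing can map two different IDs to the same name (for example `a/b` and `a_b`), so a real solution might need to append a hash of the original ID or reject ambiguous IDs.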
A real case for this issue is best described in Norconex/crawlers#67.