
Ensure committer queue uniqueness to avoid queue collisions #9

Open
essiembre opened this issue Apr 9, 2015 · 1 comment
@essiembre
Contributor

Most Committers extend AbstractFileQueueCommitter. When multiple committers are used by multiple processes that share the same working directory, their default queue directories can resolve to the same location. The result is two committers processing the same queued files, which is not ideal.

We should find a way to enforce uniqueness of committer queues while keeping them predictable (so the same committer instance always points to the same location).

A real case for this issue is best described in Norconex/crawlers#67.

When used with Norconex Collectors, implicitly passing the collector ID and crawler ID (which is already a unique combo) and using that to create a unique directory would do it, but Committers are not tied to Collectors right now, so we can't assume we'll always have these.
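To make the idea concrete, here is a minimal sketch of deriving a queue directory from the collector ID and crawler ID. It is only an illustration under assumptions: `QueueDirs`, `queueDir`, `safeName`, and the `committer-queue` folder name are hypothetical and not part of the Norconex Committer API. Each ID is sanitized for use as a directory name and suffixed with its `String.hashCode()` in hex, so two IDs that sanitize to the same text are unlikely (though not guaranteed) to collide, while the same IDs always produce the same path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, NOT part of the Norconex Committer API.
class QueueDirs {

    // Builds a predictable queue directory from two IDs (assumed to be
    // a unique combination, as with a collector ID plus a crawler ID).
    static Path queueDir(Path workDir, String collectorId, String crawlerId) {
        return workDir.resolve("committer-queue") // assumed folder name
                .resolve(safeName(collectorId))
                .resolve(safeName(crawlerId));
    }

    // Replaces characters that are risky in file names, then appends the
    // ID's hash code in hex so that different IDs which sanitize to the
    // same text (e.g. "a b" and "a_b") still get different names.
    static String safeName(String id) {
        String cleaned = id.replaceAll("[^A-Za-z0-9._-]", "_");
        return cleaned + "-" + Integer.toHexString(id.hashCode());
    }

    public static void main(String[] args) {
        Path dir = queueDir(Paths.get("work"), "my collector", "crawler/1");
        System.out.println(dir);
    }
}
```

Since `String.hashCode()` is specified by the Java language, the resulting path is stable across runs and JVMs. When a Committer is used without a Collector, the same approach would need some other stable identifier, such as an explicitly configured committer ID, which is also an assumption here.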
