nginx timeout for jobs that take more than 60 sec (default nginx timeout) #2

Open
januszm opened this issue Nov 6, 2024 · 2 comments

@januszm

januszm commented Nov 6, 2024

I'm currently migrating from Shoryuken (a Sidekiq clone that uses SQS as its queue store) to AWS SDK Rails, and I've encountered a strange error.
I mention Sidekiq because it's the de facto state-of-the-art task queue in the Ruby world.
For longer jobs that need e.g. 5-10 minutes to complete, I get a timeout and the message returns to the queue. It looks like the POST request to the application server waits for the task to complete, which is unacceptable for longer jobs, because some can take several hours. Am I doing something wrong, or is this software designed so that the SQS daemon, when it submits a task via POST, waits for the task to complete? I expected it to wait only until the task is "accepted" for async execution, not until it completes.

@alextwoods

This is a tricky issue. When the worker returns a 200 OK, the SQS daemon deletes the message. If the job fails, normal Active Job retries are applied and a new message will be queued, but if the worker process is terminated, the worker is restarted, etc., then the message will be lost entirely.

Additionally, once a 200 OK is returned, the SQS daemon will post another request, potentially overloading the worker (this would depend on how we ran the jobs async).

Given all that, though, the 60-second timeout is a big issue. I believe there are ways to configure this value to be higher (in the nginx config / LB config)? I'll think on this a bit more. Possibly a configuration option that lets you process jobs async using a thread pool of some fixed/configured size?

See: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-daemon
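
To make the thread-pool idea above concrete, here is a minimal Ruby sketch. It is not the gem's implementation: `BoundedJobPool`, `try_submit`, and the `size`/`backlog` parameters are hypothetical names chosen for illustration, and only `Thread` and `SizedQueue` from Ruby's core library are used. The bounded backlog is one possible answer to the overload concern: when it is full, the endpoint could return a non-200 status instead of accepting more work.

```ruby
# Hypothetical fixed-size pool with a bounded backlog (illustrative names only).
class BoundedJobPool
  def initialize(size:, backlog:)
    # SizedQueue caps how many accepted-but-not-started jobs may wait.
    @queue = SizedQueue.new(backlog)
    @workers = Array.new(size) do
      Thread.new do
        # pop returns nil once the queue is closed and drained, ending the loop.
        while (job = @queue.pop)
          begin
            job.call
          rescue StandardError => e
            # A failing job must not kill its worker thread.
            warn "job failed: #{e.class}: #{e.message}"
          end
        end
      end
    end
  end

  # Returns true if the job was accepted, false if the backlog is full.
  def try_submit(&job)
    @queue.push(job, true) # non-blocking push raises ThreadError when full
    true
  rescue ThreadError
    false
  end

  # Stops accepting jobs, lets queued jobs finish, then joins the workers.
  def shutdown
    @queue.close
    @workers.each(&:join)
  end
end
```

A request handler could map the result of `try_submit` to a status code, for example 200 when the job is accepted and 503 when the pool is saturated. Jobs accepted this way are still lost if the process dies before they run, which is the trade-off described above.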

@januszm
Author

januszm commented Nov 7, 2024

> See: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-daemon

> "... response to acknowledge that it has received and successfully processed the request"

I think I represent only a certain "user group", but I'd expect this gem to respond with 200 quickly once the job is accepted (i.e. not wait for it to be processed), use a thread pool to process as many jobs as possible on a single server, and not worry about worker failure and losing a message, which should be a rare case. Job code should be responsible for handling retries in such a case and rescheduling another job with a new message. As it stands, I don't think my team and I can simply replace Shoryuken with this gem: it's a slightly different architecture that requires a different approach, so it's not a drop-in replacement. Our jobs can take hours, or at least dozens of minutes, to complete.
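
As a rough illustration of job code that reschedules itself with a new message, here is a small plain-Ruby sketch. Everything in it is an assumption made for the example: `InMemoryQueue` stands in for a real queue such as SQS, and `perform_with_reschedule`, `MAX_ATTEMPTS`, and the `payload`/`attempt` message keys are hypothetical names. It only covers failures raised inside the job; it cannot help when the worker process itself is killed.

```ruby
# Hypothetical in-memory stand-in for a real queue (e.g. SQS).
class InMemoryQueue
  attr_reader :messages

  def initialize
    @messages = []
  end

  def enqueue(message)
    @messages << message
  end
end

MAX_ATTEMPTS = 3 # assumed retry budget for this sketch

# Runs the given block with the message payload. On failure, enqueues a NEW
# message with an incremented attempt counter until the budget is used up.
def perform_with_reschedule(queue, message)
  yield message[:payload]
  :done
rescue StandardError
  if message[:attempt] < MAX_ATTEMPTS
    queue.enqueue(message.merge(attempt: message[:attempt] + 1))
    :rescheduled
  else
    :gave_up
  end
end
```

With this shape the HTTP endpoint can acknowledge the original message right away, because a retry travels as a fresh message and does not depend on the original one staying in the queue.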

@alextwoods alextwoods self-assigned this Nov 14, 2024
@mullermp mullermp transferred this issue from aws/aws-sdk-rails Nov 15, 2024