
xmlrpc: don't re-raise the exception when retry fails #258

Closed

Conversation

rhyw (Contributor) opened this pull request on Mar 29, 2024

rhyw (Contributor Author) commented Mar 29, 2024

Test output:

# gaierror(specify hub without VPN)
osh-cli mock-build --hub=<osh-hub-xmlrpc-endpoint> --brew-build units-2.21-4.fc36
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: ('Name or service not known',), 5 retries left
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: ('Name or service not known',), 4 retries left
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: ('Name or service not known',), 3 retries left
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: ('Name or service not known',), 2 retries left
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: ('Name or service not known',), 1 retry left
XML-RPC connection to osh-hub-xmlrpc-endpoint failed: gaierror - [Errno -2] Name or service not known

# ProtocolError
osh-cli mock-build --hub=http://example.com/xmlrpc units-2.21-4.fc36.src.rpm
XML-RPC connection to example.com failed: (), 5 retries left
XML-RPC connection to example.com failed: (), 4 retries left
XML-RPC connection to example.com failed: (), 3 retries left
XML-RPC connection to example.com failed: (), 2 retries left
XML-RPC connection to example.com failed: (), 1 retry left
XML-RPC connection to example.com failed: ProtocolError - <ProtocolError for example.com/xmlrpc/client/: 404 Not Found>

kobo/xmlrpc.py Outdated
@@ -481,7 +481,8 @@ def request(self, *args, **kwargs):
             except (socket.error, socket.herror, socket.gaierror, socket.timeout,
                     httplib.CannotSendRequest, xmlrpclib.ProtocolError) as ex:
                 if i >= self.retry_count:
-                    raise
+                    print(f"XML-RPC connection to {args[0]} failed: {ex}", file=sys.stderr)
+                    sys.exit(1)
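
For context, this except clause sits inside a retry loop. The following is a simplified, hypothetical sketch of such a wrapper (illustrative only; the class name and structure approximate kobo's retry_request_decorator rather than reproduce it):

import socket
import sys
import time
import xmlrpc.client


class RetryTransport:
    """Illustrative wrapper that retries an XML-RPC request on connection errors."""

    def __init__(self, transport, retry_count=5, retry_timeout=30):
        self.transport = transport
        self.retry_count = retry_count
        self.retry_timeout = retry_timeout

    def request(self, *args, **kwargs):
        for i in range(self.retry_count + 1):
            try:
                return self.transport.request(*args, **kwargs)
            except (socket.error, socket.herror, socket.gaierror, socket.timeout,
                    xmlrpc.client.ProtocolError) as ex:
                if i >= self.retry_count:
                    raise  # behavior before this PR: let the caller decide what to do
                retries_left = self.retry_count - i
                noun = "retry" if retries_left == 1 else "retries"
                print(f"XML-RPC connection to {args[0]} failed: {ex.args}, "
                      f"{retries_left} {noun} left", file=sys.stderr)
                time.sleep(self.retry_timeout)
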
Contributor commented:

I would not expect retry_request_decorator() to terminate the process when a connection fails. That feels totally wrong. If such behavior is desired, the re-raised exception needs to be caught again at a higher level and handled as needed.

Contributor Author commented:

Exceptions occur when initializing a hub proxy. I'm doubting whether it's worth fixing at all, since the process needs to quit somewhere, even at a higher level.

I reverted this and moved the exception handling to a higher level.

rhyw (Contributor Author) commented Apr 2, 2024

HubProxy initialization also happens in the kobo/hub/worker/taskmanager.py module. I didn't update it there, since I think those tracebacks are not visible to users of the client. Let me know if you prefer a different approach.

Please take a look.

kdudka (Contributor) commented Apr 2, 2024

@rhyw Using sys.exit() in kobo/client is certainly better than using it in retry_request_decorator(). But why can't this be handled even higher up?

kobo is a generic framework, and different applications based on kobo might have different error-handling needs. What exactly does the traceback you are trying to avoid with this change look like?

rhyw (Contributor Author) commented Apr 3, 2024

> @rhyw Using sys.exit() in kobo/client is certainly better than using it in retry_request_decorator(). But why can't this be handled even higher up?

Do you have a better or higher place in mind? I think it happens during hub proxy initialization, after the retries have failed.

> kobo is a generic framework, and different applications based on kobo might have different error-handling needs. What exactly does the traceback you are trying to avoid with this change look like?

This tries to handle the exception when the hub is unreachable due to various connection issues. See the issue description in https://issues.redhat.com/browse/OSH-144.

kdudka (Contributor) commented Apr 3, 2024

@rhyw Please show me the traceback, and I will tell you where I think the exception could be caught. I cannot see any traceback in the description of https://issues.redhat.com/browse/OSH-144, though.

rhyw (Contributor Author) commented Apr 3, 2024

> @rhyw Please show me the traceback, and I will tell you where I think the exception could be caught. I cannot see any traceback in the description of https://issues.redhat.com/browse/OSH-144, though.

See, for example, deprecated issue 100:

XML-RPC connection to localhost failed: (), 5 retries left
XML-RPC connection to localhost failed: (), 4 retries left
XML-RPC connection to localhost failed: (), 3 retries left
XML-RPC connection to localhost failed: (), 2 retries left
XML-RPC connection to localhost failed: (), 1 retry left
Traceback (most recent call last):
  File "/usr/bin/covscan", line 59, in <module>
    sys.exit(main())
  File "/usr/bin/covscan", line 51, in main
    parser.run(args)
  File "/usr/lib/python3.6/site-packages/kobo/cli.py", line 292, in run
    cmd.run(*cmd_args, **cmd_kwargs)
  File "/usr/lib/python3.6/site-packages/covscan/commands/cmd_diff_build.py", line 227, in run
    task_id = self.submit_task(config, comment, options)
  File "/usr/lib/python3.6/site-packages/covscan/commands/cmd_mock_build.py", line 24, in submit_task
    return self.hub.scan.mock_build(config, comment, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib/python3.6/site-packages/kobo/xmlrpc.py", line 629, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.6/site-packages/kobo/xmlrpc.py", line 535, in _single_request3
    raise xmlrpclib.ProtocolError(host + handler, response.status, response.reason, response.msg)
xmlrpc.client.ProtocolError: <ProtocolError for localhost/covscanhub/xmlrpc/client/: 504 Gateway Timeout>

kdudka (Contributor) commented Apr 3, 2024

@rhyw I think the place where sys.exit() should be called is here: https://github.com/openscanhub/openscanhub/blob/f5ccf9619c07586f1c0b6ba8cdcc5b0ba81a3805/osh/client/osh-cli#L77

For example, xmlrpc.client.ProtocolError can be caught directly at the top level. I understand that exceptions raised at the DNS or TCP level might be too generic to provide an accurate enough error message for the user to understand what happened.

What about the following approach?

  1. Create a custom Exception class in kobo for failed XML-RPC connections.
  2. When an exception is caught while making an XML-RPC connection in kobo, wrap it in the custom exception and re-raise it.
  3. The application may catch the custom exception and handle it as needed.

The disadvantage is that it will break applications that are catching socket.gaierror and the like. Still, I think it is better than calling sys.exit() unconditionally from a library.
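
A minimal sketch of what this proposal could look like; the XmlRpcConnectionError class and the helper functions below are hypothetical illustrations, not an existing kobo API:

import sys
import xmlrpc.client


class XmlRpcConnectionError(Exception):
    """Hypothetical kobo-level exception wrapping a failed XML-RPC connection."""


def call_hub(url):
    # Library side (step 2): wrap the low-level failure and re-raise,
    # keeping the original exception attached as the cause.
    try:
        proxy = xmlrpc.client.ServerProxy(url)
        return proxy.system.listMethods()
    except (OSError, xmlrpc.client.ProtocolError) as ex:
        raise XmlRpcConnectionError(f"XML-RPC connection to {url} failed: {ex}") from ex


def main():
    # Application side (step 3): decide how to report the failure; a CLI can
    # exit, while a GUI could show an error dialog instead.
    try:
        call_hub("http://example.com/xmlrpc")
    except XmlRpcConnectionError as ex:
        print(ex, file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
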

rhyw (Contributor Author) commented Apr 3, 2024

Either way, the process needs to exit somewhere. Raising exceptions exposes the tracebacks in the CLI, which should be fixed if possible.

I don't see why sys.exit() shouldn't be used, since the several earlier retries have all failed.

> What about the following approach?
>
>   1. Create a custom Exception class in kobo for failed XML-RPC connections.
>
> The disadvantage is that it will break applications that are catching socket.gaierror and the like. Still, I think it is better than calling sys.exit() unconditionally from a library.

Yeah, there may be various exceptions; it's better to preserve the original exception type/message so that we have more clues about the error. I'd rather do nothing in kobo (keep it as it is) and catch the exceptions in the OSH CLI directly.

WDYT?

kdudka (Contributor) commented Apr 3, 2024

> I don't see why sys.exit() shouldn't be used, since the several earlier retries have all failed.

Calling sys.exit() in osh-cli is fine. Different applications might have different needs. Imagine you have a GUI application that uses kobo to make an XML-RPC call when you click a button. If it fails to connect, you would like to show a dialog with an error message rather than terminating (from the user's point of view, crashing) the whole GUI application.

> I'd rather do nothing in kobo (keep it as it is) and catch the exceptions in the OSH CLI directly.

That is exactly what I had in mind while creating OSH-144. The downside is that when you catch socket.gaierror, you do not know where it was raised from. It could be a connection to the OSH hub, Koji, or something else. My impression was that you were trying to distinguish these cases.

rhyw (Contributor Author) commented Apr 3, 2024

> That is exactly what I had in mind while creating OSH-144. The downside is that when you catch socket.gaierror, you do not know where it was raised from. It could be a connection to the OSH hub, Koji, or something else. My impression was that you were trying to distinguish these cases.

I don't feel a strong need to distinguish the cases. I was attempting to handle the exception where it was originally raised. Catching it in the OSH CLI would be much simpler, although it could be too generic.
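
As an illustration of that simpler option, a top-level catch in the CLI might look roughly like this; a sketch assuming a main()-style entry point, not the actual change in openscanhub/openscanhub#250:

import sys
import xmlrpc.client


def main():
    # Hypothetical command that talks to the hub over XML-RPC.
    hub = xmlrpc.client.ServerProxy("http://example.com/xmlrpc")
    print(hub.system.listMethods())
    return 0


def run():
    # socket.gaierror, socket.herror and socket.timeout are all OSError
    # subclasses in Python 3, so catching OSError covers the DNS/TCP failures
    # shown above; ProtocolError covers HTTP-level errors such as 404 or 504.
    try:
        return main()
    except (OSError, xmlrpc.client.ProtocolError) as ex:
        print(f"XML-RPC connection failed: {ex}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    sys.exit(run())
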

I'll close this MR and open an MR in the openscanhub repo instead. Thanks for the suggestions!

rhyw (Contributor Author) commented Apr 3, 2024

Closing in favor of openscanhub/openscanhub#250.

rhyw closed this Apr 3, 2024