Ensure that client supports any number of virtual cloud providers #24

Closed
strieflin opened this issue May 6, 2022 · 7 comments · Fixed by #25
Comments

@strieflin
Member

The ephemeral client uses Java Parallel Streams (JPS) to interact with the virtual cloud providers (VCPs). By default, JPS allocates n threads on an n-core machine. Because the HTTP calls are blocking, a system with fewer cores than VCPs runs into a timeout: the distributed execution in the backend only kicks off after all VCPs have received an invocation request.
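One possible way around this, shown only as a minimal sketch and not as the fix that was actually merged, is to run the blocking calls on a dedicated pool with one thread per VCP instead of the shared pool. The class name `DedicatedPoolSketch`, the helper names, and the `CyclicBarrier` that stands in for the backend waiting on all providers are assumptions made for illustration; the sketch also assumes at least one provider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class DedicatedPoolSketch {

    // Runs one blocking call per provider on a pool with one thread per provider,
    // so all calls are in flight at once regardless of the number of CPU cores.
    // Assumes calls is non-empty (a fixed pool needs at least one thread).
    static <T> List<T> invokeAll(List<Callable<T>> calls, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> f : pool.invokeAll(calls, timeoutSeconds, TimeUnit.SECONDS)) {
                results.add(f.get()); // throws CancellationException if the call timed out
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for providers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a provider call failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    // Hypothetical stand-in for the real HTTP calls: each call blocks until
    // every provider has been invoked, mimicking the backend behavior.
    static List<Integer> simulate(int providers) {
        CyclicBarrier allInvoked = new CyclicBarrier(providers);
        List<Callable<Integer>> calls = new ArrayList<>();
        for (int i = 0; i < providers; i++) {
            final int id = i;
            calls.add(() -> {
                allInvoked.await();
                return id;
            });
        }
        return invokeAll(calls, 10);
    }

    public static void main(String[] args) {
        System.out.println(simulate(8));
    }
}
```

Sizing the pool by the number of providers decouples the number of concurrent requests from the core count, at the cost of one mostly idle thread per provider while the calls block.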

@strieflin strieflin added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2022
@strieflin strieflin self-assigned this May 6, 2022
@strieflin
Member Author

Hi @grafjo, @kindlich, have you verified that this does not work, or is it just an assumption inferred from the code? We use the Vavr Future implementation, which uses a dynamic thread pool with up to 32767 backing threads, so this should not cause issues in any reasonable setting. I have tried to reproduce the issue with a unit test, without success.

@grafjo
Contributor

grafjo commented May 6, 2022

@strieflin we can reproduce this behavior in the wild with small virtual machines, e.g. with 2 CPU cores.

Vavr Future uses ForkJoinPool, which is configured by the number of available processors / CPU cores:

"For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors."

@kindlich
Contributor

kindlich commented May 6, 2022

In our test case, we were running with:

  • 3 Clusters
  • Custom Java Spring-Boot application that delegated to the Ephemeral Java-Client
  • A google e2-standard-2 VM as a K8s Node
    • 2 vCPUs
    • 8GB RAM

In that setup, the client would issue one HTTP request and no more.

Since Vavr by default delegates to the Java ForkJoinPool.commonPool(), the same limitation as for Java Streams would apply.
The parallelism is either the number of CPUs or (number of CPUs - 1); I think it is the latter, actually.
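To check which value applies on a given machine, the common pool can simply be asked for its parallelism. The sketch below assumes an OpenJDK-based runtime, where the default is the number of available processors minus one (but at least 1) unless the system property `java.util.concurrent.ForkJoinPool.common.parallelism` overrides it. The class and helper names are made up for this example.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Default common-pool parallelism in OpenJDK when the system property
    // java.util.concurrent.ForkJoinPool.common.parallelism is not set.
    static int expectedDefaultParallelism(int cpus) {
        return Math.max(1, cpus - 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors    = " + cpus);
        System.out.println("commonPool parallelism = " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("expected default       = " + expectedDefaultParallelism(cpus));
    }
}
```

For parallel streams, the calling thread also takes part in the work, which is why they appear to use n threads on an n-core machine. Tasks submitted to the common pool from outside do not get that extra thread.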

@strieflin
Member Author

From Vavr documentation:

The ForkJoinPool is dynamic, it has a maximum of 32767 running threads. Compared to fixed-size pools, this reduces the risk of dead-locks.

So that suggests that it shouldn't be an issue.

However, the following snippet does not terminate when using count >= Runtime.getRuntime().availableProcessors(), which supports your observation.

int count = Runtime.getRuntime().availableProcessors();
CyclicBarrier b = new CyclicBarrier(count);
Future.sequence(Stream.range(0, count).map(i -> Future.of(() -> b.await())).toJavaList()).await();

Will continue trying to replicate using EphemeralMultiClient.
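The same effect can be reproduced with the JDK alone, independent of the machine's core count, by using a ForkJoinPool with an explicit parallelism and a barrier wait that times out instead of hanging. This is a rough sketch under those assumptions; `BarrierOnForkJoinPool` and `allArrive` are hypothetical names, and the timeouts are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;

class BarrierOnForkJoinPool {

    // Submits `parties` tasks that all wait on one barrier to a ForkJoinPool
    // with the given parallelism. Returns true only if every task passed the
    // barrier before the timeout.
    static boolean allArrive(int parallelism, int parties, long timeoutMillis) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        CyclicBarrier barrier = new CyclicBarrier(parties);
        try {
            List<ForkJoinTask<Boolean>> tasks = new ArrayList<>();
            for (int i = 0; i < parties; i++) {
                tasks.add(pool.submit(() -> {
                    try {
                        // Plain blocking wait: the pool does not add threads to compensate.
                        barrier.await(timeoutMillis, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (Exception e) {
                        // Timeout, broken barrier, or interruption.
                        return false;
                    }
                }));
            }
            boolean ok = true;
            for (ForkJoinTask<Boolean> task : tasks) {
                if (!task.join()) {
                    ok = false;
                }
            }
            return ok;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("parallelism 2, 3 parties: " + allArrive(2, 3, 300));
        System.out.println("parallelism 3, 3 parties: " + allArrive(3, 3, 5000));
    }
}
```

With fewer worker threads than barrier parties, the last task never starts while the others are blocked, so the barrier wait times out. With at least as many workers as parties, all tasks should reach the barrier and pass it.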

@kindlich
Contributor

kindlich commented May 6, 2022

If you want to test with fewer processors, you can specify the processor count with a JVM argument:
-XX:ActiveProcessorCount=1

Example:

  • Local (kind) setup: 2 clusters
  • java -XX:ActiveProcessorCount=4 -jar cs.jar ephemeral execute ...
  • java -XX:ActiveProcessorCount=2 -jar cs.jar ephemeral execute ...

The error would be:

An error ocurred while executing the given command:
Failed to trigger the backend Ephemeral service: status code: 500, message: error while talking to Discovery: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
io.carbynestack.cli.exceptions.CsCliRunnerException: Failed to trigger the backend Ephemeral service: status code: 500, message: error while talking to Discovery: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
        at io.carbynestack.cli.client.ephemeral.command.ExecuteEphemeralClientCliCommandRunner.run(ExecuteEphemeralClientCliCommandRunner.java:60) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.execute(CsClientCli.java:151) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.parseConfig(CsClientCli.java:107) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.parse(CsClientCli.java:72) ~[cs.jar:?]
        at io.carbynestack.cli.CsCliApplication.run(CsCliApplication.java:117) ~[cs.jar:?]
        at io.carbynestack.cli.CsCliApplication.main(CsCliApplication.java:161) [cs.jar:?]

@strieflin strieflin linked a pull request May 6, 2022 that will close this issue
@kindlich
Contributor

kindlich commented May 9, 2022

Do we need similar changes in Amphora/Castor?

  • I think Castor should be fine, since it does not upload/activate tuples concurrently.
  • For the Amphora Client, uploading Secrets and retrieving/unveiling Secrets may need another look?

@strieflin
Member Author

Do we need similar changes in Amphora/Castor?

* I think Castor should be fine, since it does not upload/activate tuples concurrently.

* For the Amphora Client, uploading Secrets and retrieving/unveiling Secrets may need another look?

Added an issue to verify that Amphora works as expected (see carbynestack/amphora#32).

3 participants