Ensure that client supports any number of virtual cloud providers #24

strieflin · 2022-05-06T09:24:28Z

The ephemeral client uses Java Parallel Streams (JPS) to interact with the virtual cloud providers (VCPs). By default, JPS allocates n threads on an n-core machine. Because the HTTP calls are blocking, the client times out on a system with fewer cores than VCPs: the distributed execution in the backend only kicks off after all VCPs have received an invocation request.
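The failure mode described above can be modelled without any HTTP calls. The following is a minimal sketch, not the client's actual code: a `CyclicBarrier` stands in for "the backend only starts once every VCP has been invoked", and a fixed-size thread pool stands in for the bounded pool behind the parallel stream. The class name `RendezvousSketch` and the method `allArrive` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RendezvousSketch {

    /**
     * Runs `parties` blocking tasks on a pool of `workers` threads. Each task
     * waits at a barrier until all parties have arrived or the timeout expires.
     * Returns true only if every task got past the barrier.
     */
    static boolean allArrive(int parties, int workers, long timeoutMillis) {
        CyclicBarrier barrier = new CyclicBarrier(parties);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < parties; i++) {
                results.add(pool.submit(() -> {
                    try {
                        // Stand-in for a blocking HTTP call that only returns
                        // once all VCPs have received their request.
                        barrier.await(timeoutMillis, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException | BrokenBarrierException e) {
                        return false;
                    }
                }));
            }
            boolean ok = true;
            for (Future<Boolean> result : results) {
                try {
                    ok &= result.get();
                } catch (ExecutionException e) {
                    ok = false;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    ok = false;
                }
            }
            return ok;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // 3 VCPs served by 3 threads: every request can be in flight at once.
        System.out.println("3 parties, 3 workers: " + allArrive(3, 3, 5000));
        // 3 VCPs served by 1 thread: the first request blocks the only worker.
        System.out.println("3 parties, 1 worker:  " + allArrive(3, 1, 200));
    }
}
```

With fewer workers than parties, the first tasks occupy every thread while waiting for tasks that never get scheduled, so the barrier only resolves through the timeout.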


strieflin · 2022-05-06T11:02:26Z

Hi @grafjo, @kindlich, have you verified that this does not work, or is it just an assumption inferred from the code? We use the Vavr Future implementation, which uses a dynamic thread pool with up to 32767 backing threads, so this should not cause issues for any reasonable setting. I have tried to reproduce the issue with a unit test, without success.

grafjo · 2022-05-06T11:19:13Z

@strieflin we can reproduce this behavior in the wild with small virtual machines, e.g. 2 CPU cores.

Vavr Future uses ForkJoinPool, whose parallelism is derived from the available processors / CPU cores:

"For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors."

kindlich · 2022-05-06T11:20:21Z

In our test case, we were running with:

3 Clusters
Custom Java Spring-Boot application that delegated to the Ephemeral Java-Client
A google e2-standard-2 VM as a K8s Node
- 2 vCPUs
- 8GB RAM

In this setup, the client would issue 1 HTTP request and no more.

Since Vavr by default delegates to the Java ForkJoinPool.commonPool(), the same limit applies as for Java Streams.
This is either the number of CPUs or the number of CPUs - 1; I think it is the latter, actually.
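The question of "CPUs or CPUs - 1" can be checked directly on the target machine. The sketch below prints what the JVM reports. The helper `expectedDefaultParallelism` is hypothetical and encodes my understanding of the JDK default (one less than the number of available processors, but at least 1), which is an assumption to verify against the JDK version in use.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolParallelism {

    /**
     * Assumed JDK default for the common pool when no system property
     * overrides it: processors - 1, but never less than 1.
     */
    static int expectedDefaultParallelism(int processors) {
        return Math.max(1, processors - 1);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors     = " + processors);
        System.out.println("common pool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("assumed default         = " + expectedDefaultParallelism(processors));
    }
}
```

If that assumption holds, a 2 vCPU node gives the common pool a single worker, which would be consistent with only one HTTP request being issued. Parallel streams can additionally run work on the calling thread, which may explain why the two cases are often quoted with different numbers.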

strieflin · 2022-05-06T11:52:34Z

From Vavr documentation:

The ForkJoinPool is dynamic, it has a maximum of 32767 running threads. Compared to fixed-size pools, this reduces the risk of dead-locks.

So that suggests that it shouldn't be an issue.

However, the following snippet does not terminate when using count >= Runtime.getRuntime().availableProcessors(), which supports your observation.

int count = Runtime.getRuntime().availableProcessors();
CyclicBarrier b = new CyclicBarrier(count);
Future.sequence(Stream.range(0, count).map(i -> Future.of(() -> b.await())).toJavaList()).await();

Will continue trying to replicate using EphemeralMultiClient.
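One common way to avoid this kind of starvation is to run the blocking calls on a dedicated executor with one thread per call instead of the shared common pool. The sketch below shows that idea with plain `CompletableFuture` rather than Vavr. It is not taken from PR #25, and the names `DedicatedPoolSketch`, `invokeAllConcurrently` and `barrierTasks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class DedicatedPoolSketch {

    /** Runs all tasks at the same time, one thread per task, and returns the results in task order. */
    static <T> List<T> invokeAllConcurrently(List<Supplier<T>> tasks) {
        // Pool size follows the number of tasks, not the number of CPU cores.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<CompletableFuture<T>> futures = new ArrayList<>();
            for (Supplier<T> task : tasks) {
                futures.add(CompletableFuture.supplyAsync(task, pool));
            }
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> future : futures) {
                results.add(future.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Builds `count` tasks that can only finish once all of them are running concurrently. */
    static List<Supplier<Integer>> barrierTasks(int count) {
        CyclicBarrier barrier = new CyclicBarrier(count);
        List<Supplier<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tasks.add(() -> {
                try {
                    return barrier.await(); // arrival index at the barrier
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Deliberately more tasks than cores, the case that hangs on the common pool.
        int count = Runtime.getRuntime().availableProcessors() + 2;
        List<Integer> arrivals = invokeAllConcurrently(barrierTasks(count));
        System.out.println("all " + arrivals.size() + " tasks passed the barrier");
    }
}
```

The trade-off is one thread per VCP for the duration of the call, which seems acceptable for blocking I/O against a small number of providers.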

kindlich · 2022-05-06T12:22:02Z

If you want to test with fewer cores, you can set the processor count with a JVM argument:
-XX:ActiveProcessorCount=1

Example:

Local (kind) setup: 2 clusters
✅ java -XX:ActiveProcessorCount=4 -jar cs.jar ephemeral execute ...
❌ java -XX:ActiveProcessorCount=2 -jar cs.jar ephemeral execute ...

The error would be:

An error ocurred while executing the given command:
Failed to trigger the backend Ephemeral service: status code: 500, message: error while talking to Discovery: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
io.carbynestack.cli.exceptions.CsCliRunnerException: Failed to trigger the backend Ephemeral service: status code: 500, message: error while talking to Discovery: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
        at io.carbynestack.cli.client.ephemeral.command.ExecuteEphemeralClientCliCommandRunner.run(ExecuteEphemeralClientCliCommandRunner.java:60) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.execute(CsClientCli.java:151) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.parseConfig(CsClientCli.java:107) ~[cs.jar:?]
        at io.carbynestack.cli.CsClientCli.parse(CsClientCli.java:72) ~[cs.jar:?]
        at io.carbynestack.cli.CsCliApplication.run(CsCliApplication.java:117) ~[cs.jar:?]
        at io.carbynestack.cli.CsCliApplication.main(CsCliApplication.java:161) [cs.jar:?]

kindlich · 2022-05-09T06:22:36Z

Do we need similar changes in Amphora/Castor?

I think Castor should be fine, since it does not upload/activate tuples concurrently.
For the Amphora Client, uploading Secrets and retrieving/unveiling Secrets may need another look?

strieflin · 2022-05-09T14:47:31Z

Do we need similar changes in Amphora/Castor?

* I think Castor should be fine, since it does not upload/activate tuples concurrently.

* For the Amphora Client, uploading Secrets and retrieving/unveiling Secrets may need another look?

Added an issue to verify that Amphora works as expected (see carbynestack/amphora#32).

strieflin added the kind/bug Categorizes issue or PR as related to a bug. label May 6, 2022

strieflin self-assigned this May 6, 2022

strieflin linked a pull request May 6, 2022 that will close this issue

Fix concurrency issue in client that creates a deadlock #25

Merged

strieflin closed this as completed in #25 May 9, 2022

strieflin mentioned this issue May 9, 2022

Check whether Amphora client support > 2 VCPs carbynestack/amphora#32

Open
