[POC] [Security Manager Replacement] GraalVM sandboxing #16863

Draft
wants to merge 1 commit into
base: main

reta
Collaborator

@reta reta commented Dec 16, 2024

Description

Use GraalVM's capability to spin up a separate JVM to host the sandboxed component. With this model, it becomes possible to:

  • run OpenSearch core on any JVM that GraalVM supports (without SM)
  • use older JDK versions (up to 23) with SM enabled for non-trusted components

The POC does the bare minimum of work to host the ShiroIdentityPlugin in a separate JVM (21.0.5+11-Ubuntu-1ubuntu124.10) that runs under SecurityManager:

OpenSearch home: /home/opensearch-3.0.0-SNAPSHOT
Host JVM version: 22.0.2+9-jvmci-b01
Polyglot JVM version: 21.0.5+11-Ubuntu-1ubuntu124.10
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/home/opensearch-3.0.0-SNAPSHOT/lib/opensearch-3.0.0-SNAPSHOT.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
WARNING: System::setSecurityManager will be removed in a future release
Security Manager? org.opensearch.secure_sm.SecureSM@7b303608
[2024-12-16T15:00:18,828][WARN ][stderr                   ] [] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[2024-12-16T15:00:18,864][WARN ][stderr                   ] [] SLF4J: Defaulting to no-operation (NOP) logger implementation
[2024-12-16T15:00:18,875][WARN ][stderr                   ] [] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Shiro Plugin? org.opensearch.identity.shiro.ShiroIdentityPlugin@5ddd3150
Exception in thread "main" access denied ("java.net.SocketPermission" "localhost:0" "listen,resolve")
	at <java> checkPermission(Ljava/security/Permission;)V(Unknown)
	at <java> checkPermission(Ljava/security/Permission;)V(java/security/AccessController.java:1071:0)
	at <java> checkPermission(Ljava/security/Permission;)V(java/lang/SecurityManager.java:411:0)
	at <java> checkListen(I)V(java/lang/SecurityManager.java:985:0)
	at <java> bind(Ljava/net/SocketAddress;I)V(java/net/ServerSocket.java:387:0)
	at <java> <init>(IILjava/net/InetAddress;)V(java/net/ServerSocket.java:278:0)
	at <java> <init>(I)V(java/net/ServerSocket.java:171:0)
	at <java> getSocket()Ljava/net/ServerSocket;(org/opensearch/identity/shiro/ShiroIdentityPlugin.java:144:0)
	at <java> RootNode for interop message: 'invokeMember'.(Unknown)
	at org.graalvm.polyglot.Value.invokeMember(Value.java:1021)
	at org.opensearch.espresso.sandbox.Sandbox.loadPlugin(Sandbox.java:86)
	at org.opensearch.espresso.sandbox.Sandbox.main(Sandbox.java:22)
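
The description states that SM can still be enabled on guest JDKs up to 23. The following is a minimal sketch, assuming the guest version string is available in the same format the log prints for "Host JVM version" and "Polyglot JVM version", of how a launcher might verify that before enabling SM. The names `GuestJvmCheck` and `canRunSecurityManager` are hypothetical and not part of this PR; the cutoff of 23 is taken from the description.

```java
// Hypothetical helper for illustration only; not part of this PR.
final class GuestJvmCheck {
    // Last feature release on which SM can still be enabled, per the PR description.
    static final int LAST_FEATURE_WITH_SM = 23;

    // Parses a java.runtime.version-style string and checks its feature release.
    // Runtime.Version.parse throws IllegalArgumentException on malformed input.
    static boolean canRunSecurityManager(String runtimeVersion) {
        Runtime.Version version = Runtime.Version.parse(runtimeVersion);
        return version.feature() <= LAST_FEATURE_WITH_SM;
    }

    public static void main(String[] args) {
        // The two version strings below are copied from the log output above.
        System.out.println(canRunSecurityManager("21.0.5+11-Ubuntu-1ubuntu124.10"));
        System.out.println(canRunSecurityManager("22.0.2+9-jvmci-b01"));
    }
}
```

Parsing with `Runtime.Version` rather than splitting the string by hand keeps vendor suffixes such as `-Ubuntu-1ubuntu124.10` or `-jvmci-b01` from affecting the comparison.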

Related Issues

Closes #16861

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Other labels Dec 16, 2024
Contributor

❌ Gradle check result for a8e52f3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 6201d8c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines +63 to +83
.allowHostAccess(HostAccess.NONE)
.allowIO(IOAccess.NONE)
Contributor

While I understand this is a POC for the Shiro plugin, the sandbox restrictions for a different plugin could differ, so would we need to build and load a different context?

Contributor

I wonder whether we'd still be able to load the plugin if we set allowHostClassLoading(false).

Collaborator Author

@reta reta Dec 17, 2024

I think these settings are only "applicable" to an embedded context; once an external JVM is started, all restrictions are off (this is why we need SM here).

You can easily see confirmation of this here: in theory, allowIO(IOAccess.NONE) would cut off any I/O, but it does not; the SM does (if we comment out the SM initialization, the socket is created):

	at <java> getSocket()Ljava/net/ServerSocket;(org/opensearch/identity/shiro/ShiroIdentityPlugin.java:144:0)

Collaborator Author

> While I understand this is a POC for the Shiro plugin, the sandbox restrictions for a different plugin could differ, so would we need to build and load a different context?

This is out of scope for the sandbox but in scope for the SM of the spawned JVM process. We apply the same security policy that OpenSearch core does now.

Contributor

Gotcha.

> We apply the same security policy that OpenSearch core does now.

Does this also mean we cannot apply/use plugin-level security policies?

Collaborator Author

> Does this also mean we cannot apply/use plugin-level security policies?

Quite the opposite :-) OpenSearch core does take all plugin security policies into consideration. So it is kind of bootstrapping OpenSearch core + plugin(s): the policies will be glued together (as they are now).
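
To illustrate what "glued together" could mean in practice, here is a minimal sketch using only the JDK's `java.security.Permissions` collection. It is not the OpenSearch bootstrap code: the class `MergedPolicySketch`, its methods, and the concrete host/port grants are all hypothetical, chosen only to show that a permission missing from the core set becomes available once a plugin's grants are merged in.

```java
import java.net.SocketPermission;
import java.security.Permission;
import java.security.PermissionCollection;
import java.security.Permissions;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; names and grants are assumptions, not OpenSearch code.
final class MergedPolicySketch {
    // Stand-in for the permissions granted by the core policy.
    static PermissionCollection corePermissions() {
        Permissions core = new Permissions();
        core.add(new SocketPermission("127.0.0.1:9200", "connect"));
        return core;
    }

    // Stand-in for the permissions declared by a plugin's own policy.
    static PermissionCollection pluginPermissions() {
        Permissions plugin = new Permissions();
        plugin.add(new SocketPermission("127.0.0.1:1024-", "listen"));
        return plugin;
    }

    // Copies every permission from each source into one combined, read-only collection.
    static PermissionCollection merge(List<PermissionCollection> sources) {
        Permissions merged = new Permissions();
        for (PermissionCollection source : sources) {
            for (Permission permission : Collections.list(source.elements())) {
                merged.add(permission);
            }
        }
        merged.setReadOnly();
        return merged;
    }

    public static void main(String[] args) {
        Permission request = new SocketPermission("127.0.0.1:9300", "listen");
        PermissionCollection merged = merge(List.of(corePermissions(), pluginPermissions()));
        System.out.println("core alone: " + corePermissions().implies(request));
        System.out.println("core + plugin: " + merged.implies(request));
    }
}
```

IP literals are used instead of host names so that the `implies` check does not depend on name resolution.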

Contributor

> You can easily see confirmation of this here: in theory, allowIO(IOAccess.NONE) would cut off any I/O, but it does not; the SM does (if we comment out the SM initialization, the socket is created):

Thanks for pointing that out.

@kumargu
Contributor

kumargu commented Dec 17, 2024

Thank you again for putting this up.

Apart from the debugging pain (which I hope would be a one-time cost while setting up a plugin), I don't see a reason why we would not include this as an alternative to SM. Let's see what others think about it.

This would look much better, allowing plugins to move to JDK 24 with the real look and feel of a plugin sandbox environment, once Graal fully addresses oracle/graal#10239.

@reta
Collaborator Author

reta commented Dec 17, 2024

> Apart from the debugging pain (which I hope would be a one-time cost while setting up a plugin), I don't see a reason why we would not include this as an alternative to SM. Let's see what others think about it.

I have included it in #16861, but to reiterate: the most difficult issue with such a model is communication between the host and the spawned JVM/context. It does not seem possible to wire up the services from the host JVM. I will spend more time exploring the limitations here.

public ServerSocket getSocket(Client client) throws IOException {
System.out.println("Client? " + client);
System.out.println("AdminClient? " + client.admin());
client.admin().cluster().prepareState().execute().actionGet();
Collaborator Author

Fails with

Exception in thread "main" org.graalvm.polyglot.PolyglotException: java.lang.ClassCastException: Invalid inline context node passed to an inlined field. A receiver of type 'ToReferenceFactory.ToUnknownNodeGen.ForeignWrapperData' was expected but is 'ToReferenceFactory.ToUnknownNodeGen'. Did you pass the wrong node to an execute method of an inlined cached node?
	at com.oracle.truffle.api.dsl.InlineSupport$UnsafeField.invalidReceiver(InlineSupport.java:1253)
	at com.oracle.truffle.api.dsl.InlineSupport$UnsafeField.nullError(InlineSupport.java:1239)
	at com.oracle.truffle.api.dsl.InlineSupport$UnsafeField.resolveReceiverSlow(InlineSupport.java:1203)
	at com.oracle.truffle.api.dsl.InlineSupport$UnsafeField.resolveReceiver(InlineSupport.java:1175)
	at com.oracle.truffle.api.dsl.InlineSupport$UnsafeField.getInt(InlineSupport.java:1275)
	at com.oracle.truffle.api.dsl.InlineSupport$StateField.get(InlineSupport.java:464)
	at com.oracle.truffle.api.profiles.InlinedBranchProfile.enter(InlinedBranchProfile.java:100)
	at com.oracle.truffle.espresso.nodes.interop.ToReference$ToUnknown.doForeignWrapper(ToReference.java:2474)
	at com.oracle.truffle.espresso.nodes.interop.ToReferenceFactory$ToUnknownNodeGen.executeAndSpecialize(ToReferenceFactory.java:10328)
	at com.oracle.truffle.espresso.nodes.interop.ToReferenceFactory$ToUnknownNodeGen.execute(ToReferenceFactory.java:10262)
	at com.oracle.truffle.espresso.nodes.interop.ToReference$DynamicToReference.doCached(ToReference.java:321)
	at com.oracle.truffle.espresso.nodes.interop.ToReferenceFactory$DynamicToReferenceNodeGen.executeAndSpecialize(ToReferenceFactory.java:175)
	at com.oracle.truffle.espresso.nodes.interop.ToReferenceFactory$DynamicToReferenceNodeGen.execute(ToReferenceFactory.java:141)
	at com.oracle.truffle.espresso.substitutions.Target_com_oracle_truffle_espresso_polyglot_Interop$InvokeMemberWithCast.doCached(Target_com_oracle_truffle_espresso_polyglot_Interop.java:2104)
	at com.oracle.truffle.espresso.substitutions.Target_com_oracle_truffle_espresso_polyglot_InteropFactory$InvokeMemberWithCastNodeGen.execute(Target_com_oracle_truffle_espresso_polyglot_InteropFactory.java:4966)
	at com.oracle.truffle.espresso.substitutions.Target_com_oracle_truffle_espresso_polyglot_Interop_InvokeMemberWithCast__LLLL.invoke(Target_com_oracle_truffle_espresso_polyglot_Interop_InvokeMemberWithCast__LLLL.java:74)
	at com.oracle.truffle.espresso.nodes.IntrinsicSubstitutorNode.execute(IntrinsicSubstitutorNode.java:74)
	at com.oracle.truffle.espresso.nodes.EspressoRootNode$Default.execute(EspressoRootNode.java:403)
	at <java> invokeMemberWithCast(Ljava/lang/Class;Ljava/lang/Object;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown)
	at <java> cluster()Lorg/opensearch/client/ClusterAdminClient;(Unknown)

Should be fixed in 24.2.0

Contributor

❌ Gradle check result for ed8cd55: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu
Contributor

kumargu commented Dec 19, 2024

> > Apart from the debugging pain (which I hope would be a one-time cost while setting up a plugin), I don't see a reason why we would not include this as an alternative to SM. Let's see what others think about it.
>
> I have included it in #16861, but to reiterate: the most difficult issue with such a model is communication between the host and the spawned JVM/context. It does not seem possible to wire up the services from the host JVM. I will spend more time exploring the limitations here.

Asking to keep myself up to date: I guess you have figured it out using java.PolyglotInterfaceMappings? Just that it's broken in 24.1.1?

@reta
Collaborator Author

reta commented Dec 19, 2024

> Asking to keep myself up to date: I guess you have figured it out using java.PolyglotInterfaceMappings? Just that it's broken in 24.1.1?

Yes, sadly it is. AFAIK, the earliest release with the fix is in March (as per the Slack thread response). This is a blocker for us at the moment, since we cannot support host <-> guest exchange :(

@kumargu
Contributor

kumargu commented Dec 23, 2024

(posting for visibility)
We requested that GraalVM try to make the fix available via a patch in their upcoming Jan-25 release.

Labels
enhancement Enhancement or improvement to existing feature or request Other
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[POC] [Security Manager Replacement] GraalVM sandboxing
2 participants