Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally use generated api_key auth to ES #1098

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 25 additions & 3 deletions esrally/client.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
import time

import certifi
import elasticsearch
import urllib3

from esrally import exceptions, doc_link
Expand Down Expand Up @@ -127,18 +128,39 @@ def _is_set(self, client_opts, k):
except KeyError:
return False

def create(self):
import elasticsearch
def _postprocess_client_options(self):
"""
This method is used to do any non init processing of the client options, prior to creating a client
"""
if self._is_set(self.client_options, "use_api_key"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also enter the block if the user specified use_api_key:falseon the client options. I noticed that we fall into this trap in other places in this class as well and I propose we change this line to:

if self.client_options.get("use_api_key", False):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method is slightly misleading. It does not check if its set, it just removes the possibility of a key error. It returns the value, if present. Adding a test like so confirms this (and passes).

    def test_is_set(self):
        hosts = [{"host": "127.0.0.1", "port": 9200}]
        client_options = {}
        # make a copy so we can verify later that the factory did not modify it
        original_client_options = dict(client_options)

        f = client.EsClientFactory(hosts, client_options)

        opts = {"foo": True, "bar": False}
        assert f._is_set(opts, "foo") is True
        assert f._is_set(opts, "bar") is False
        assert f._is_set(opts, "baz") is False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the semantics of the predicate is_set are misleading and maybe we should change it. I still propose we implement this line here differently:

if self.client_options.get("use_api_key", False):

This ensures correct behavior depending on the value of use_api_key.

self._generate_api_key()

def _generate_api_key(self):
with self._create_sync_client() as instance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya i figured you would like that ;)

# an api_key is generated per client, which happens after an initial handshake using an existing auth scheme
# once this key is generated, a new client is created using the api_key client option and http_auth is removed
# the reason the options were not removed in place was because of the logic to coalesce the key tuple into a
# header that the client could use.
api_key_response = instance.security.create_api_key({"name": "rally-api-key"})
self.client_options.pop("http_auth", None)
self.client_options["api_key"] = (api_key_response["id"], api_key_response["api_key"])

def _create_sync_client(self):
return elasticsearch.Elasticsearch(hosts=self.hosts, ssl_context=self.ssl_context, **self.client_options)

def create(self):
self._postprocess_client_options()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two issues with this that I missed when first eyeballing your draft PR:

  1. This circumvents the logic to wait for the REST layer to be available. This will work if you spin up Elasticsearch yourself because by the time you start Rally, Elasticsearch is likely up, but it won't work when Rally spins up the cluster. The reason is that we first create the client, then use this client to wait for the REST API layer and only then continue with the benchmark (see Driver#prepare_benchmark() for details). Reproduction scenario: esrally --distribution-version=7.9.3 --car=defaults,trial-license,x-pack-security --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password',use_api_key:true"
  2. This will only create one API key per process but not per simulated client. Say, you start a bulk indexing task on an 8 core machine with 256 clients. Rally will start 8 load generator processes (because of the 8 cores) and each of them will retrieve an API key. Each of the load generator processes will simulate 256/8 = 32 clients and each of these clients will use the same API key. However, a user will likely expect that each client uses their own API key (i.e. retrieve 256 API keys).

Given these two issues I sense that we need a different approach. One option would be to insert a dedicated task at the beginning of a schedule but I sense that's the wrong approach either because the track should be agnostic to how clients authenticate. Maybe let's discuss this on a high-bandwidth medium.

return self._create_sync_client()

def create_async(self):
import elasticsearch
import esrally.async_connection
import io
import aiohttp

from elasticsearch.serializer import JSONSerializer

self._postprocess_client_options()

class LazyJSONSerializer(JSONSerializer):
def loads(self, s):
meta = RallyAsyncElasticsearch.request_context.get()
Expand Down
18 changes: 18 additions & 0 deletions tests/client_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -275,6 +275,24 @@ def test_create_https_connection_unverified_certificate_present_client_certifica

self.assertDictEqual(original_client_options, client_options)

@mock.patch("elasticsearch.Elasticsearch")
def test_use_api_keys(self, _):
hosts = [{"host": "127.0.0.1", "port": 9200}]
client_options = {"use_api_key": True, "http_auth": ("a", "b")}

f = client.EsClientFactory(hosts, client_options)
self.assertIsNotNone(f.client_options["http_auth"])
f.create()
self.assertIsNone(f.client_options.get("http_auth", None))
self.assertIsNotNone(f.client_options["api_key"])

# now assert that not having a http_auth client option does not cause issue
del f.client_options["api_key"]
self.assertIsNone(f.client_options.get("api_key", None))
f.create()
self.assertIsNone(f.client_options.get("http_auth", None))
self.assertIsNotNone(f.client_options["api_key"])


class RestLayerTests(TestCase):
@mock.patch("elasticsearch.Elasticsearch")
Expand Down