Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade ES client to 8.2.0 #1510

Closed

Conversation

michaelbaamonde
Copy link
Contributor

@michaelbaamonde michaelbaamonde commented Jun 3, 2022

This PR upgrades Rally's Elasticsearch Python client library dependency to v8.2.0 from v7.14.0. The changes are pretty substantial. I've tried to make the commits reasonably-sized and atomic, so I suggest reviewing by commit. Nevertheless, some (especially those touching unit tests) are large and mostly consist of migrating from positional arguments to keyword arguments.

The most important change is contained in e7d83cc. The Python client has moved all transport logic into a separate library, https://github.com/elastic/elastic-transport-python. Low-level transport-related details have changed substantially, and our customizations within esrally/client.py and esrally/async_connection.py have had to change accordingly. It's a large diff, but it should all be transparent to the user.

I've also had to override the client's default behavior for verifying that a given server is authentically Elasticsearch. See the commit message (and comments in the code) for details, but the gist is that the client as of v8.0.0 does this check exclusively via a header that's only available in ES 7.14.0+. Previous client versions fell back to other mechanisms for verifying authenticity absent this header. Since Rally must support all versions of ES >= 6.8.0, I've resurrected this prior behavior.

Otherwise, most changes are fairly mechanical.

A few notes:

  • I've tested this with every maintained track that I have access to. Even so, potential for breakage is very high, and I intend to do more. Please feel free to do the same!
  • I've issued a rally-tracks PR and an eventdata PR to update some runners that currently fail with this PR, and will cause integration tests to fail.
  • Same as above, but for a few internal tracks.
  • There are some TODOs that you'll see in comments, particularly around exception handling, which has changed considerably. What I have right now works, but I don't consider it robust enough yet. I'm actively working on this. Any thoughts on this in particular would be helpful.
  • The EsClientFactory class has gotten a bit too big, so I am happy to reorganize that code.
  • There are a handful of additional tests that I have in mind that would be useful to add, but could probably be done in follow-up PRs. But if anything seems glaringly missing, please let me know.

Closes #1350.

Mike Baamonde added 17 commits June 3, 2022 12:56
Also add the elastic-transport library as a dependency.
…lity.

As of 8.0.0, transport logic has been moved out of `elasticsearch-py` and into a
standalone library, `elastic-transport-python`. This includes substantial changes
to the lower-level APIs for configuring client objects. This commit adjusts
accordingly, but preserves Rally's previous behavior.

This commit also reproduces the product verification behavior of `v7.14.0` of the
client:
https://github.com/elastic/elasticsearch-py/blob/v7.14.0/elasticsearch/transport.py#L606

As of `v8.0.0`, the client determines whether the server is Elasticsearch by checking
whether HTTP responses contain the `X-elastic-product` header. If they do not, it raises
an `UnsupportedProductException`. This header was only introduced in Elasticsearch `7.14.0`,
however, so the client will consider any version of ES prior to `7.14.0` unsupported due to
responses not including it.

Because Rally needs to support versions of ES >= 6.8.0, we resurrect this earlier
logic for determining the authenticity of the server, which does not rely exclusively
on this header.
The client's `transport` object will no longer have a `hosts` attribute,
but will had a `node_pool` attribute.

Exceptions are also different-- a TransportError will not have a `status_code`
attribute, but should contain a status code as its `message`.
The ES client now requires kwargs, so we ensure that we do not use any
positional arguments.

The client also now distinguishes between a `TransportError` and an `ApiError`,
and `ElasticsearchException` has been removed. We adjust accordingly, albeit
with a TODO to revisit this.
This migrates to the new `TransportError`/`ApiError` approach, but it's a work
in progress and needs another pass before it's considered robust.
This is the new, preferred way of doing this, and these requests fail otherwise.
We should genericize this behavior across runners.
Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an initial reviewing consisting of random comments while reading the code. This will take a while to review, I expect.

By the way, do you consider testing Elasticsearch 8.2 in IT tests a separate issue?

def __init__(self, config):
super().__init__(config)

self._loop = asyncio.get_running_loop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please assign self._loop to None here and only call asyncio.get_running_loop in _create_aiohttp_session like elastic-transport does?

https://sethmlarson.dev/blog/flask-async-views-and-async-globals explains why it's important to wait before calling this function.

Comment on lines +205 to +206
# May need to be just ssl=self._ssl_context
ssl_context=self._ssl_context,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please do that. Using ssl= is the preferred way since 3.0 (aio-libs/aiohttp#2626) and ssl_context= goes away in the next version, 4.0 (aio-libs/aiohttp#3548).

super().__init__(config)

self._loop = asyncio.get_running_loop()
self._limit = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not be setting self._limit as it's internal to the Python client. In fact elastic-transport switched:

  • from setting aiohttp's TCPConnector limit to setting limit_per_host and
  • from self._limit to self._connection_per_node.

So the code below should read limit_per_host=self._connections_per_node instead of limit=self._limit

Comment on lines +1923 to +1924
# elif e.info:
# request_meta_data["error-description"] = f"{e.error} ({e.info})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove code instead of commenting it out

# Because Rally needs to support versions of ES >= 6.8.0, we resurrect the previous
# logic for determining the authenticity of the server, which does not rely exclusively
# on this header.
class _ProductChecker:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move all of this (_ProductChecker and RallySyncElasticsearch) to a new file called sync_connection.py? Maybe sync_connection.py and async_connection.py are not good name, but I would like to move all subclasses of the Python client to another file given esrally/client.py is so big already.

I'm also wondering if there is a way to avoid duplicating all that code, but I guess not if you included it!

@@ -1894,6 +1904,7 @@ async def execute_single(runner, es, params, on_error):
total_ops = 1
total_ops_unit = "ops"
request_meta_data = {"success": True}
# TODO: This all needs to be refactored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify what needs to be refactored?

Comment on lines -458 to -467
class TestAsyncConnection:
@pytest.mark.asyncio
async def test_enable_cleanup_close(self):
connection = AIOHttpConnection()
# pylint: disable=protected-access
assert connection._enable_cleanup_closed is True

connection = AIOHttpConnection(enable_cleanup_closed=False)
# pylint: disable=protected-access
assert connection._enable_cleanup_closed is False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

Comment on lines +220 to +222
# We call get() here instead of pop() in order to pass verify_certs through as a kwarg
# to the elasticsearch.Elasticsearch constructor. Setting the ssl_context's verify_mode to
# ssl.CERT_NONE is insufficient with version 8.0+ of elasticsearch-py.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm sure this was painful to debug, I'm not sure the comment really helps. My heuristic in cases like this is: would I be writing this comment if this was new code? Probably not.

Comment on lines +1540 to +1544
await es.perform_request(
method="PUT",
path=f"/_xpack/ml/datafeeds/{datafeed_id}",
body=body,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please mention that this is still needed for Elasticsearch 6.8? That when we drop support for it we'll remove it.

And somehow mention that this applies to all other datafeeds classes.

@michaelbaamonde
Copy link
Contributor Author

I'm going to break this up into a sequence of smaller PRs, so closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade Elasticsearch client to 8.0.0
2 participants