Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-32734 Add ElasticSearch metric sink configuration #19247

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

kenrowland
Copy link
Contributor

@kenrowland kenrowland commented Oct 29, 2024

Added configuration options for the ElasticSearch sink Updated metrics readme with information on adding the sink to the cluster Added example ElasticSearch configuraion yaml file

Signed-Off-By: Kenneth Rowland [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@kenrowland kenrowland requested a review from rpastrana October 29, 2024 19:23
Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32734

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Copy link
Member

@rpastrana rpastrana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland looks good, but I do have some comments for you to ponder. They're all meant to be constructive, let me know if should clarify any points. Thanks.

data types to be stored in the index in their native types. Without dynamic mapping, the ElasticSearch
default mapping does not properly map value to unsigned 64-bit integers.

To create an index with dynamic mapping, use the following object when creating the index:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this write up seems very useful.
Is the goal still to handle the creation of the indices programmatically in later versions of the sink?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no plans to create the index from the sink. If that was something we added, there are several considerations that must be taken into account. These are

  • If the cluster is using the same index for all metric reporting, creating and configuring the index must be coordinated somehow. If each component creates their own, there still needs to be coordination.
  • The sink would also need to add mapping templates in order to map values to the correct type. This could still use wildcard dynamic templates, or could be specific to the metrics registered for each metric framework instance
  • If the sink creates the index, we need to worry about life cycle management.

I felt like just telling the sink what index to use allows the policy of the ElasticSearch instance to remain there instead of pushing it down to the sink.

If we decide to push that responsibility to the sink, we can add it as an improvement addressing the concerns from above

<Environment>
<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial but changing the name to 'myelasticsink' would drive to point further and would follow the convention from the metrics name attribute above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'll change that.

<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
<settings period="30"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge of adding the configuration before the logic is higher potential for downstream restructure of configuration which can be very disruptive to end-users.

It's obviously hard to know what we don't know we'll need to support in later versions, but let's scrutinize each element of your config layout more than usual...
for instance, indexName, if we're planning on expanding the index creation, we might need a pattern rather than a name. with that in mind it might be worth supporting a structured element for the index, for now the only active attribute for the index element could be "name", later adding more attributes under the index element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index config changed accordingly to allow sub items.

# type - sink type (must be elastic for ElasticSearch support)
# name - name for the sink instance
# settings.elasticHost - url for the ElasticSearch instance
# settings.indexName - ElasticSearch index name where metrics are indexed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to illustrate what I meant above, this could be settings.index.name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. Change made. It also would allow keeping the index configuration separate in case we decide to push more policy into the sink.

@@ -45,11 +45,26 @@ ElasticMetricSink::ElasticMetricSink(const char *name, const IPropertyTree *pSet
ignoreZeroMetrics = pSettingsTree->getPropBool("@ignoreZeroMetrics", true);
pSettingsTree->getProp("@elasticHost", elasticHost);
pSettingsTree->getProp("@indexName", indexName);

// Initialize standard suffixes
countMetricSuffix.append("_count");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an advantage to use append rather than set? (I'm obviously assuming these suffix strings are empty before line 50)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are empty. Use of append is based on code review comments from a previous PR from others on the team.

- type: elastic
name: elasticsink
settings:
elasticHost: http://localhost:9200,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps too picky, but "Host" implies the dns entry only, no protocol, port, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I wanted to provide the full string to which the specific endpoint is appended. This makes it a bit easier than having to read multiple configuration values just to put them together internally. Any individual part can be changed in the string as easily as it could be in a separate config value.

However, thinking ahead, I can see where something like the port or other portion of the route could be a cluster wide parameter. For that reason, separating them into separate values makes sense.

Copy link
Contributor Author

@kenrowland kenrowland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pleas see comments

data types to be stored in the index in their native types. Without dynamic mapping, the ElasticSearch
default mapping does not properly map value to unsigned 64-bit integers.

To create an index with dynamic mapping, use the following object when creating the index:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no plans to create the index from the sink. If that was something we added, there are several considerations that must be taken into account. These are

  • If the cluster is using the same index for all metric reporting, creating and configuring the index must be coordinated somehow. If each component creates their own, there still needs to be coordination.
  • The sink would also need to add mapping templates in order to map values to the correct type. This could still use wildcard dynamic templates, or could be specific to the metrics registered for each metric framework instance
  • If the sink creates the index, we need to worry about life cycle management.

I felt like just telling the sink what index to use allows the policy of the ElasticSearch instance to remain there instead of pushing it down to the sink.

If we decide to push that responsibility to the sink, we can add it as an improvement addressing the concerns from above

<Environment>
<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'll change that.

- type: elastic
name: elasticsink
settings:
elasticHost: http://localhost:9200,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I wanted to provide the full string to which the specific endpoint is appended. This makes it a bit easier than having to read multiple configuration values just to put them together internally. Any individual part can be changed in the string as easily as it could be in a separate config value.

However, thinking ahead, I can see where something like the port or other portion of the route could be a cluster wide parameter. For that reason, separating them into separate values makes sense.

@@ -45,11 +45,26 @@ ElasticMetricSink::ElasticMetricSink(const char *name, const IPropertyTree *pSet
ignoreZeroMetrics = pSettingsTree->getPropBool("@ignoreZeroMetrics", true);
pSettingsTree->getProp("@elasticHost", elasticHost);
pSettingsTree->getProp("@indexName", indexName);

// Initialize standard suffixes
countMetricSuffix.append("_count");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are empty. Use of append is based on code review comments from a previous PR from others on the team.

# type - sink type (must be elastic for ElasticSearch support)
# name - name for the sink instance
# settings.elasticHost - url for the ElasticSearch instance
# settings.indexName - ElasticSearch index name where metrics are indexed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. Change made. It also would allow keeping the index configuration separate in case we decide to push more policy into the sink.

Copy link
Member

@rpastrana rpastrana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland a few comments/questions

@@ -0,0 +1,36 @@
#
# Defines a elastic sink for reporting metrics to an ElasticSearch instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial. Should be "Defines an..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


Owned<IPropertyTree> pHostConfigTree = pSettingsTree->getPropTree("host");

pHostConfigTree->getProp("@name", hostName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the exact iptree behavior, but what happens if pSettingsTree does not contain a "host" branch? If we're not guaranteed accessing pHostConfigTree->getProp is safe, let's protect this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed


if (!pHostConfigTree->getProp("@protocol", hostProtocol))
{
hostProtocol.append("https");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a reasonable default, but perhaps not the expected default. If this default is not clearly declared to the admin/user somewhere, let's add something to make sure it is.

}
else
{
WARNLOG("ElasticMetricSink: Host configuration is invalid");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit doesn't show me how the sink will behave if we fall into this case. I see there's a "configurationValid" variable, but what happens when it's set to false? Does the sink simply refuse to report metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't quite decided how to handle this case yet. Since this is a periodic sink, it may be appropriate to add a boolean return code to the prepareToStartCollecting method. That would prevent the base class from starting the collection thread. Or, maybe it just returns each time the doCollection method is returned. This PR was only concentrating on configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I think the question is, do we want to fail the load process and force admins to fix the misconfiguration immediately, or allow the components to load and report the error in logs. There's arguments for both, but if your sink cannot operate w/ the given misconfiguration we might consider error'ing out at load time.

Copy link
Contributor Author

@kenrowland kenrowland Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am of the belief that we do not fail the load process. We log the appropriate warning and just don't collect metrics. That support should be added when we actually start the collection process.

@kenrowland kenrowland requested a review from rpastrana November 5, 2024 17:22
@kenrowland
Copy link
Contributor Author

@ghalliday Please merge

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, and one problem with the suffixes.

* indexName - The name of the index to which metrics are reported. (required)
* countMetricSuffix - The suffix used to identify count metrics. (default: count)
* gaugeMetricSuffix - The suffix used to identify gauge metrics. (default: gauge)
* histogramMetricSuffix - The suffix used to identify histogram metrics. (default: histogram)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Does the elastic stack support mtls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. HPCC-32968 was opened to add security configuration to the sink. It provides for TLS and basic auth. As time progresses I suspect that we will add additional authentication methods to the sink.

- type: elastic
name: myelasticsink
settings:
countMetricSuffix: count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the comma here and next line are include by mistake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. removed

Owned<IPropertyTree> pHostConfigTree = pSettingsTree->getPropTree("host");
if (pHostConfigTree)
{
pHostConfigTree->getProp("@name", hostName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor future point: If the tree is guaranteed to stay active then more common is to use queryProp() and avoid cloning the strings. In this case the code is only executed once, so it doesn't matter.

const char * protocol = pHostConfigTree->queryProp("@protocol");
if (!protocol)
   protocol = "https";
   

histogramMetricSuffix.append("histogram");

// See if any are overridden
pSettingsTree->getProp("@countMetricSuffix", countMetricSuffix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code does not work. getProp() appends to the existing value, it does not replace it. I am not sure why this code uses a different pattern from the code above that gets the hostname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

"dynamic_templates": [
{
"hpcc_metric_count_to_unsigned_long": {
"match": "*_count",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause problems if other, non-HPCC, components are publishing values with these suffixes to elastic stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, however only when publishing to the same index with these dynamic mapping settings. The json object is meant to be representative of how you'd add dynamic mapping. I will update the description to mention that _count can be set to what makes sense for the index taking any additional mappings into account.

configurationValid = !elasticHostUrl.isEmpty() && !indexName.isEmpty();

// Initialize standard suffixes
countMetricSuffix.append("count");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it likely that anyone is going to want to change these? Under what situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HPCC-32909, one of the current sub task Jiras, is to read the current dynamic mapping from the index. That will likely change if these values are read from the configuration or not.

@kenrowland kenrowland requested a review from ghalliday November 18, 2024 16:52
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland please squash

}

if (indexName.isEmpty())
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: We do not normally use {} if an if statement only has a single statement - here and in several other places in this file. It makes it slightly harder to read.

Added configuration options for the ElasticSearch sink
Updated metrics readme with information on adding the sink to the cluster
Added example ElasticSearch configuraion yaml file

Signed-Off-By: Kenneth Rowland [email protected]
@kenrowland
Copy link
Contributor Author

@ghalliday Please merge

@ghalliday
Copy link
Member

@kenrowland please do not rebase when you squash - unless it is necessary. If you do both at the same time it makes it very hard to check if any other change was introduced as part of the squash.

@ghalliday ghalliday merged commit e92a2bc into hpcc-systems:master Nov 21, 2024
52 checks passed
Copy link

Jirabot Action Result:
Added fix version: 9.10.0
Workflow Transition: 'Resolve issue'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants