Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-31498 Kafka shared library not constructing/destructing properly #18438

Conversation

dcamper
Copy link
Contributor

@dcamper dcamper commented Mar 21, 2024

Use Singleton pattern instead of std::once.

Obscure problems arose with the destruction of a static std::once, when the plugin was loaded and unloaded multiple times in a single job.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Manual testing with a local Kafka instance, publishing and consuming messages.

@dcamper dcamper requested a review from ghalliday March 21, 2024 14:24
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcamper I don't think this is quite right.

KafkaPlugin::publisherCacheExpirer = new KafkaPlugin::PublisherCacheExpirerObj;
KafkaPlugin::publisherCacheExpirer->start();
queryPublisherCache();
queryPublisherCacheExpirer().start();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race condition - two threads could get here at the same time, and both pass the !isAlive() test.
I think the correct solution (using an auto pattern jake uses) is

 static PublisherCacheExpirerObj & queryPublisherCache()
 {
     auto initFunction = [] ()
    {
        PublisherCacheExpirerObj * publisher = new PublisherCacheExpirerObj;
        KafkaPlugin::publisherCacheExpirer = new KafkaPlugin::PublisherCacheExpirerObj;
        KafkaPlugin::publisherCacheExpirer->start();
        return publisher;
    };
    return *publisherCacheExpirer.query(initFunction);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can delete the calls to setupPublisherCache() since queryPublisherCache() will handle it for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the circular dependencies between PublisherCacheObj and PublisherCacheExpirerObj I opted to refactor: The expirer class (and its background thread) are now gone. Expiration is done after a new publisher object is created. This simplifies things a great deal and probably increases system performance slightly (one less thread to manage).

delete(KafkaPlugin::publisherCache);
KafkaPlugin::publisherCache = NULL;
}
KafkaPlugin::publisherCacheExpirer.destroy();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to call stop - simplest is in the destructor since nothing is derived from it.

@dcamper dcamper requested a review from ghalliday March 21, 2024 19:21
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks reasonable. One question.

Since the expiry is now handled by the main calling thread, if a publisher is being expired the call to get a publisher could block for pollTimeout before it returns (while it waits to join the thread). Is that going to cause any issues?

@dcamper
Copy link
Contributor Author

dcamper commented Mar 22, 2024

@ghalliday With the current value of poll timeout (1s) it should not be a problem. In practice, a given job will publish to only one Kafka topic on a single broker (and therefore there will be only one publisher object).

@dcamper
Copy link
Contributor Author

dcamper commented Mar 22, 2024

@ghalliday I am retargeting this PR to 9.4.x, as soon as my internal testing succeeds. I will need a final once-over review after I push those changes.

@dcamper dcamper force-pushed the hpcc-31498-kafka-plugin-singleton branch from 0ee7612 to ff743c8 Compare March 22, 2024 13:02
@dcamper dcamper changed the base branch from candidate-9.6.x to candidate-9.4.x March 22, 2024 13:03
@dcamper dcamper requested a review from ghalliday March 22, 2024 13:06
@dcamper
Copy link
Contributor Author

dcamper commented Mar 22, 2024

Retargeted to 9.4.x. Also removed an unnecessary check from within expire() method.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving assume testing goes as you expect.

@dcamper dcamper force-pushed the hpcc-31498-kafka-plugin-singleton branch from 90c27c0 to a601526 Compare March 22, 2024 13:33
@dcamper
Copy link
Contributor Author

dcamper commented Mar 22, 2024

Squashed in preparation for merge, but don't merge yet. I am awaiting customer verification.

@ghalliday
Copy link
Member

@dcamper can you also change the commit message so that it matches the normal convention. HPCC-XXXXX ...

@dcamper dcamper changed the title Kafka shared library not constructing/destructing properly HPCC-31498 Kafka shared library not constructing/destructing properly Mar 28, 2024
@dcamper dcamper force-pushed the hpcc-31498-kafka-plugin-singleton branch from a601526 to 11750e0 Compare March 28, 2024 20:50
@dcamper
Copy link
Contributor Author

dcamper commented Mar 28, 2024

@ghalliday My apologies! Commit message amended.

@dcamper
Copy link
Contributor Author

dcamper commented Apr 8, 2024

@ghalliday Please merge. Thanks!

@ghalliday ghalliday closed this Apr 9, 2024
@ghalliday ghalliday reopened this Apr 9, 2024
Copy link

github-actions bot commented Apr 9, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31498

Jirabot Action Result:
Workflow Transition: Merge Pending

@ghalliday ghalliday merged commit 188c6bd into hpcc-systems:candidate-9.4.x Apr 9, 2024
98 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants