Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-31990 Add timeout to DNS lookups for soapcalls #18755

Closed
wants to merge 2 commits into from

Conversation

mckellyln
Copy link
Contributor

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31990

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR
Assigning user: [email protected]

@mckellyln mckellyln force-pushed the dns_timeout branch 10 times, most recently from 0533bfb to a038362 Compare June 13, 2024 04:04
@mckellyln mckellyln requested a review from ghalliday June 13, 2024 04:04
@mckellyln
Copy link
Contributor Author

@ghalliday initial thoughts ?

system/jlib/jsocket.cpp Outdated Show resolved Hide resolved
@mckellyln
Copy link
Contributor Author

I think I'll change to using a threadpool and then no need for a reaper thread.

@mckellyln mckellyln force-pushed the dns_timeout branch 2 times, most recently from 8a006f8 to 9550d0f Compare June 17, 2024 12:53
return true;
}
{
CriticalBlock block(queryDNSCS);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could pthread_cancel() here to help stop thread asap but still would need to check for it / cleanup in the reaper thread.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mckellyln in general looks good. A few minor suggestions to clean up code and questions about timeouts. I didn't see any logic errors though.


class GetAddrInfoThread : public Thread
{
char name[256];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better to use a str::string or a StingAttr?

{
char name[256];
Semaphore semait;
std::atomic<bool> started;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly better to use initializers {false} rather than assign in the constructor

Semaphore semait;
std::atomic<bool> started;
std::atomic<bool> ended;
unsigned netaddr[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: an initializer = { 0, 0, 0, 0 } probably better than the memset


bool thrdHasEnded()
{
if (started)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: What does the extra check on started provide?

bool waitms(unsigned timeoutms, unsigned *resAddr)
{
bool ret = semait.wait(timeoutms);
if (ret && ended)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant? I think ended must be true if the semaphore has signalled.

std::atomic<bool> stopped;
Semaphore sem;

AddrInfoReaperThread() : stopped(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaner with false as the default? (and use an initializer)

join();
{
CriticalBlock block(queryDNSCS);
gaPtrList.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could mean there are threads still running if they have not yet failed. That could potentially cause a core at closedown, but I suspect the chance is very small.

while (iter != end)
{
GetAddrInfoThread *pItem = *iter;
if ( (pItem->thrdHasEnded()) && (pItem->join(20)) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance this could take a while and mean the crit sec is held for longer than we would hope. If so, they might need copying to another list, and joining/releasing outside of the critical section.

}
}
}
if (sem.wait(10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 seems quite low - otherwise it will be waking up 100 times a second. 1000 might be more appropriate.


static Owned<AddrInfoReaperThread> addrinforeaperthrd;

static bool queryDNSTimeout()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could combine with queryKeepAlive to initialise all socket related options.

@mckellyln
Copy link
Contributor Author

switching to a thread pool

@mckellyln mckellyln force-pushed the dns_timeout branch 2 times, most recently from d5bb43e to b3a1885 Compare July 18, 2024 13:17
@mckellyln
Copy link
Contributor Author

Closing this PR and will issue a new PR using a threadpool.

@mckellyln mckellyln closed this Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants