HPCC-31968 Increase ECLWatch upload buffer size to 1MB #18721

Merged

@asselitx asselitx commented May 31, 2024

Get a quick, reasonable fix deployed. The former 1K buffer size is a likely cause of slow uploads to cloud, and possibly of increased cost in some cases. A subsequent ticket will use the configured preferred size per landing zone.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Tested locally with instrumentation to confirm upload completes in fewer read/write cycles
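
As a rough illustration of why fewer read/write cycles are expected, the lower bound on the number of read calls is the upload size divided by the chunk size, rounded up. The sketch below is not taken from the PR; `minReadCalls` is a hypothetical helper, and real sockets may return short reads, so the actual count can be higher.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the PR): the minimum number of read calls
// needed to move totalBytes when each call returns at most chunkSize bytes.
// This is a ceiling division; short reads would only increase the count.
constexpr std::uint64_t minReadCalls(std::uint64_t totalBytes, std::uint64_t chunkSize)
{
    return chunkSize == 0 ? 0 : (totalBytes + chunkSize - 1) / chunkSize;
}

// Example: a 100 MiB upload with the old 1 KiB buffer versus a 1 MiB chunk.
constexpr std::uint64_t exampleUpload = 100ull * 1024 * 1024;
constexpr std::uint64_t oldCalls = minReadCalls(exampleUpload, 1024);      // 1 KiB chunks
constexpr std::uint64_t newCalls = minReadCalls(exampleUpload, 0x100000);  // 1 MiB chunks
```

Under these assumptions the old buffer needs at least 102400 calls and the new one at least 100.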

@asselitx asselitx requested a review from ghalliday May 31, 2024 17:03
@asselitx

@ghalliday let me know if someone else is better suited to review. Tim is out or I would have asked him.

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31968

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@asselitx asselitx force-pushed the slow-1mb-hpcc-31968 branch from fdcb624 to f33a58d Compare May 31, 2024 19:36

@ghalliday ghalliday left a comment

@asselitx see comments and suggestion. Please ask me if you have any questions.

@@ -2123,8 +2123,8 @@ int CHttpRequest::processHeaders(IMultiException *me)

     bool CHttpRequest::readContentToBuffer(MemoryBuffer& buffer, __int64& bytesNotRead)
     {
    -    char buf[1024 + 1];
    -    __int64 buflen = 1024;
    +    char buf[1048576 + 1];

Allocating 1MB on the stack could cause problems.

The function also has a couple of other problems:

  • It is unnecessarily memcpy-ing the data - which adds up if the file is large.
  • It is unnecessarily adding a null terminator onto the string that was read (which makes the code confusing).

Better is something like:

    constexpr size32_t readChunkSize = 0x100000;
    size32_t sizeToRead = bytesNotRead > readChunkSize ? readChunkSize: (size32_t)bytesNotRead;
    size32_t prevLen = buffer.length();
    char * target = (char *)buffer.reserve(sizeToRead);
    int readlen = m_bufferedsocket->read(target, sizeToRead);
    if(readlen <= 0)
    {
        if(readlen < 0)
            DBGLOG("Failed to read from socket");
        buffer.setLength(prevLen);
        return false;
    }

    buffer.setLength(prevLen + readlen);
    bytesNotRead -= readlen;
    return true;

(Untested). It reads the data directly into the buffer rather than copying it.
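
The reserve-then-trim pattern in the suggestion above can be tried in isolation. What follows is a minimal sketch, not the HPCC implementation: `std::vector<char>` stands in for `MemoryBuffer`, and `ReadFn`, `readChunkInto`, and `makeStringReader` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a socket read: copies up to maxLen bytes into dst
// and returns the number of bytes read (0 at end of data, negative on error).
using ReadFn = std::function<int(char *dst, std::size_t maxLen)>;

constexpr std::size_t readChunkSize = 0x100000; // 1 MiB, as in the suggestion

// Appends one chunk to buffer by reading straight into space grown at its end,
// so no temporary stack buffer and no extra copy are needed.
bool readChunkInto(std::vector<char> &buffer, std::int64_t &bytesNotRead, const ReadFn &readFn)
{
    assert(bytesNotRead >= 0);
    std::size_t sizeToRead = bytesNotRead > static_cast<std::int64_t>(readChunkSize)
                                 ? readChunkSize
                                 : static_cast<std::size_t>(bytesNotRead);
    std::size_t prevLen = buffer.size();
    buffer.resize(prevLen + sizeToRead);            // plays the role of reserve()
    int readLen = readFn(buffer.data() + prevLen, sizeToRead);
    if (readLen <= 0)
    {
        buffer.resize(prevLen);                     // roll back the unused space
        return false;
    }
    buffer.resize(prevLen + static_cast<std::size_t>(readLen)); // keep only what was read
    bytesNotRead -= readLen;
    return true;
}

// Test helper (hypothetical): serves the bytes of a string through ReadFn.
ReadFn makeStringReader(std::string data)
{
    std::size_t pos = 0;
    return [data = std::move(data), pos](char *dst, std::size_t maxLen) mutable -> int {
        std::size_t n = std::min(maxLen, data.size() - pos);
        if (n > 0)
            std::memcpy(dst, data.data() + pos, n);
        pos += n;
        return static_cast<int>(n);
    };
}
```

A caller would invoke `readChunkInto` in a loop until it returns false or `bytesNotRead` reaches zero.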

This makes sense, it is a much better solution. I'd completely missed the consideration of stack size and the reserve on the buffer is slick. I'm implementing this and testing.

In HttpTransport, update readContentToBuffer function to read up to 1MB
chunks directly into the MemoryBuffer rather than into a temporary stack
buffer.

Former 1K size likely cause of slow uploads to cloud, if not increased cost
in some cases. Subsequent ticket will use configured preferred size per
landing zone.

Signed-off-by: Terrence Asselin <[email protected]>
@asselitx asselitx force-pushed the slow-1mb-hpcc-31968 branch from f33a58d to 2f399ca Compare June 5, 2024 21:38
@asselitx asselitx requested a review from ghalliday June 5, 2024 21:38
    size32_t sizeToRead = bytesNotRead > readChunkSize ? readChunkSize: (size32_t)bytesNotRead;
    size32_t prevLen = buffer.length();

    // BufferedSocket::read buffer must be at least one larger than its maxlen argument

Well spotted (and that is a terrible interface!)
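
To illustrate the constraint noted in the code comment, here is a small sketch of reserving one byte more than the requested read length and then trimming it off. It assumes the extra byte is needed because the reader writes a terminator after the data; `readWithTerminator` and `appendChunk` are hypothetical stand-ins, not the real `BufferedSocket` or `MemoryBuffer` APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical reader with the awkward contract described above: it copies up
// to maxLen bytes and then writes '\0' at dst[n], so dst needs maxLen + 1 bytes.
int readWithTerminator(const std::string &src, std::size_t &pos, char *dst, std::size_t maxLen)
{
    std::size_t n = std::min(maxLen, src.size() - pos);
    if (n > 0)
        std::memcpy(dst, src.data() + pos, n);
    dst[n] = '\0';                                  // the byte the caller must leave room for
    pos += n;
    return static_cast<int>(n);
}

// Appends up to maxLen bytes to buffer, growing it by maxLen + 1 so that the
// terminator has somewhere to land, then trimming to the bytes actually read.
std::size_t appendChunk(std::vector<char> &buffer, const std::string &src,
                        std::size_t &pos, std::size_t maxLen)
{
    std::size_t prevLen = buffer.size();
    buffer.resize(prevLen + maxLen + 1);            // one extra byte for the terminator
    int n = readWithTerminator(src, pos, buffer.data() + prevLen, maxLen);
    assert(n >= 0 && static_cast<std::size_t>(n) <= maxLen);
    buffer.resize(prevLen + static_cast<std::size_t>(n)); // drop terminator and unused space
    return static_cast<std::size_t>(n);
}
```

Trimming after the read keeps the terminator out of the accumulated content, so the buffer holds only payload bytes.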

@ghalliday ghalliday merged commit fea585e into hpcc-systems:candidate-9.6.x Jun 7, 2024
51 of 52 checks passed