HPCC-31160 Fix slow CSV read of super with HEADINGs. #18300

Merged
28 changes: 19 additions & 9 deletions thorlcr/activities/csvread/thcsvrslave.cpp
@@ -249,17 +249,27 @@ class CCsvReadSlaveActivity : public CDiskReadSlaveActivityBase
         if (sentHeaderLines->testSet(subFile))
             return;
 
-        /* Before we can send state of headerLines of subfiles,
-         * need to have received any updates from previous worker.
-         * The previous worker will have sent updates as it progressed,
-         * and info. re. all files it is not dealing with (and all remaining if stopped) */
-        while (true)
+        /* NB: we are here because this worker has consumed all remaining header lines for this subfile.
+         * It must now inform the next worker so it can make progress on this subfile asap.
+         *
+         * The other subfile header line info. will be communicated as this worker makes progress
+         * through the subfiles, or when it stops (see sendRemainingHeaderLines()).
+         */
+
+        // JCSMORE: only left in for testing, should be removed (see HPCC-31160)
+        if (getOptBool("csvWaitAllSubs"))
         {
-            unsigned which = gotHeaderLines->scan(0, false);
-            if (which == subFiles) // all received
-                break;
-            getHeaderLines(which);
+            // This causes this worker to block until the previous worker has processed the headerlines for all subfiles.
+            // In effect, causing workers to process the csv read sequentially, massively slowing down throughput.
+            while (true)
+            {
+                unsigned which = gotHeaderLines->scan(0, false);
+                if (which >= subFiles) // all received
+                    break;
+                getHeaderLines(which);
+            }
         }
+
         bool someLeft=false;
         unsigned hL=0;
         for (; hL<subFiles; hL++)
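
Reviewer note: the toy model below is a rough sketch of why waiting for all subfiles serialises the workers. It is not HPCC code. `simulateMakespan` and its parameters are hypothetical, and the model assumes that every worker reads one slice of every subfile, that each slice costs one time unit, and that a worker can start a subfile only after the previous worker has published header-line state for it. With `waitAllSubs` set, a worker also refuses to publish anything until it has received state for every subfile, which is the behaviour now gated behind `csvWaitAllSubs`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only (hypothetical, not part of thcsvrslave.cpp).
// Returns the time at which the last worker finishes its last subfile.
unsigned simulateMakespan(unsigned workers, unsigned subFiles, bool waitAllSubs)
{
    assert(workers > 0 && subFiles > 0);

    // ready[s]: time at which the current worker has header state for subfile s.
    // The first worker has no predecessor, so everything is ready at time 0.
    std::vector<unsigned> ready(subFiles, 0);
    unsigned makespan = 0;

    for (unsigned w = 0; w < workers; w++)
    {
        // Time at which the previous worker has published state for every subfile.
        unsigned allReceived = *std::max_element(ready.begin(), ready.end());
        std::vector<unsigned> published(subFiles);
        unsigned clock = 0;

        for (unsigned s = 0; s < subFiles; s++)
        {
            // Wait for this subfile's header state, then read the slice (1 unit).
            clock = std::max(clock, ready[s]) + 1;
            // Old behaviour: block until state for all subfiles has arrived
            // before telling the next worker anything.
            if (waitAllSubs)
                clock = std::max(clock, allReceived);
            published[s] = clock; // next worker may now start subfile s
        }

        makespan = std::max(makespan, clock);
        ready = published;
    }
    return makespan;
}
```

Under these assumptions, 4 workers and 100 subfiles finish at time 103 when state is forwarded per subfile, and at time 397 when each worker waits for all subfiles first. The second figure grows with workers times subfiles, which matches the sequential behaviour described in the code comment.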