Sort Qual execs report by sqlId and nodeId #1436
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
LGTM. Thanks @amahussein !
Thanks @amahussein. A minor question.
StringUtils.reformatCSVString(info.stages.mkString(":")),
childrenExecsStr,
nodeIdsStr,
if (info.shouldRemove) booleanTrue else booleanFalse,
Question: Why do we require the booleanTrue and booleanFalse variables instead of directly using info.shouldRemove.toString?
Good question @parthosa!
I assume that info.shouldRemove.toString is going to create a new string each time it is called. We can test that by checking the address (reference identity) of the string returned from the call, in case Scala is optimized to return a string from an internal pool.
Using booleanTrue allocates the object only once and shares it with all other records.
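The check suggested above can be sketched as follows. This is a minimal, standalone illustration, not code from the PR: the object name `BooleanStringCheck` and its helper methods are hypothetical, and `booleanTrue`/`booleanFalse` are simplified stand-ins for the shared constants in the writer. It uses Scala's `eq`, which compares references rather than contents, to see whether repeated `toString` calls hand back the same `String` instance.

```scala
object BooleanStringCheck {
  // Hypothetical stand-ins for the shared constants discussed in the review.
  val booleanTrue: String = "true"
  val booleanFalse: String = "false"

  // What the PR does: pick one of two pre-allocated strings.
  def viaConstant(flag: Boolean): String =
    if (flag) booleanTrue else booleanFalse

  // The alternative raised in the question: call toString every time.
  def viaToString(flag: Boolean): String = flag.toString

  // `eq` is reference equality, so this reports whether two separate
  // toString calls return the very same String object.
  def sameInstanceAcrossCalls(flag: Boolean): Boolean =
    viaToString(flag) eq viaToString(flag)

  def main(args: Array[String]): Unit = {
    println(s"toString returns the same instance: ${sameInstanceAcrossCalls(true)}")
    println(s"toString instance is the shared constant: ${viaToString(true) eq booleanTrue}")
  }
}
```

If both lines print `true` on the target JVM, the two approaches allocate the same amount and the choice is mostly about readability. If not, the shared constants save one allocation per row.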
Fixes #1435
This pull request sorts the execs report by sqlId and nodeId.
The file is more readable and easier to troubleshoot when rows with the same sqlId are grouped next to each other.
This implementation is also more efficient because it avoids creating a list of tuples for each row. Instead, it creates a sequence of strings from the execInfo and converts it to a single string.
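As a rough illustration of the approach described above, the sketch below sorts records by (sqlId, nodeId) and builds each CSV line directly from a sequence of strings. It is not the code from `QualOutputWriter`: the `ExecRow` case class, its fields, the `toCsvLines` helper, and the comma delimiter are all assumptions made for the example.

```scala
object ExecReportSortSketch {
  // Hypothetical, simplified stand-in for the tool's exec info record.
  final case class ExecRow(sqlID: Long, nodeId: Int, exec: String, shouldRemove: Boolean)

  // Shared constants so each row reuses the same String instances.
  private val booleanTrue = "true"
  private val booleanFalse = "false"
  private val delimiter = "," // assumed delimiter for the sketch

  def toCsvLines(rows: Seq[ExecRow]): Seq[String] =
    rows
      // Tuple ordering compares sqlID first, then nodeId.
      .sortBy(r => (r.sqlID, r.nodeId))
      .map { r =>
        // Build the row as a plain sequence of strings and join it once,
        // rather than going through an intermediate list of tuples.
        Seq(
          r.sqlID.toString,
          r.nodeId.toString,
          r.exec,
          if (r.shouldRemove) booleanTrue else booleanFalse
        ).mkString(delimiter)
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ExecRow(2L, 1, "Project", shouldRemove = false),
      ExecRow(1L, 3, "Filter", shouldRemove = false),
      ExecRow(1L, 0, "WholeStageCodegen", shouldRemove = true)
    )
    toCsvLines(rows).foreach(println)
  }
}
```

Sorting on a tuple keeps the ordering rule in one place, so rows for the same SQL query end up adjacent and ordered by node within the query.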
This pull request includes several changes to the QualOutputWriter class and related methods to simplify and improve the code for writing execution reports. The most important changes include removing redundant methods, simplifying method signatures, and refactoring the logic for constructing CSV rows.

Simplification and refactoring:
core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/QualOutputWriter.scala: Removed the constructExecInfoBuffer and constructExecsInfo methods, and refactored the logic for constructing CSV rows directly within the writeExecReport method. This change simplifies the code by eliminating unnecessary intermediate methods.
core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/QualOutputWriter.scala: Simplified the writeExecReport method signature by removing the order parameter and updating the method implementation to directly construct and write CSV rows.

Method signature updates:
core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/Qualification.scala: Updated the call to writeExecReport in the Qualification class to match the new method signature without the order parameter.