Add Arguments for Distributed Mode in Qualification Tool CLI #1429
base: spark-rapids-tools-distributed-base
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
```diff
@@ -608,7 +609,7 @@ def populate_dependency_list() -> List[RuntimeDependency]:
         # check if the dependencies is defined in a config file
         config_obj = self.get_tools_config_obj()
         if config_obj is not None:
-            if config_obj.runtime.dependencies:
+            if config_obj.runtime and config_obj.runtime.dependencies:
```
Since the `runtime` field in the tools config has been made optional, we need to check that `config_obj.runtime` is not None.
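For context, a minimal sketch of why the extra guard is needed, assuming the config is modeled with Pydantic (the field and class names here are illustrative, not the repo's actual model):

```python
from typing import List, Optional
from pydantic import BaseModel

class RuntimeConfig(BaseModel):
    dependencies: Optional[List[str]] = None

class ToolsConfig(BaseModel):
    # 'runtime' is now optional, so it may be None on a parsed config
    runtime: Optional[RuntimeConfig] = None

config_obj = ToolsConfig()          # config file without a 'runtime' section
# config_obj.runtime.dependencies   # would raise AttributeError on NoneType
if config_obj.runtime and config_obj.runtime.dependencies:
    print(config_obj.runtime.dependencies)
```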
Regenerate the specification file for `--tools_config_file`, as we are introducing a new property `distributed_tools_config`.
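A minimal sketch of one way to regenerate such a spec from a Pydantic model (the model name, output path, and generation step are assumptions; the repo may ship its own generator script):

```python
import json
from pydantic import BaseModel

class ToolsConfig(BaseModel):
    """Hypothetical stand-in for the real tools-config model."""

# Pydantic v2; on v1 the equivalent is ToolsConfig.schema_json(indent=2)
with open('tools_config_schema.json', 'w', encoding='utf-8') as f:
    json.dump(ToolsConfig.model_json_schema(), f, indent=2)
```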
Sample config file that defines Spark properties for distributed mode.
Signed-off-by: Partho Sarthi <[email protected]>
user_tools/src/spark_rapids_tools/tools/qualification_stats_report.py
thanks @parthosa! LGTM, just a few quick questions.
Signed-off-by: Partho Sarthi <[email protected]>
Fixes #1430.
This PR adds the initial changes needed to support distributed execution in the Qualification Tool CLI. It adds arguments to enable distributed mode and sets the stage for future implementation PRs.
Note:
Changes Overview

- `RapidsJob`: Introduced two subclasses, `RapidsDistributedJob` and `RapidsLocalJob`, and a concrete class for the `OnPrem` platform.
- Introduced a `JarCmdArgs` class to encapsulate all arguments needed to construct the JAR command.
- Introduced a `DistributedToolsConfig` class, allowing configurations for distributed tools (like Spark properties) to be specified via the existing `--tools_config_file` option.

CMD:
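A plausible invocation, assuming the standard `spark_rapids` entry point; the `--distributed` flag spelling follows Python Fire's convention for the new `distributed` parameter, and the other flags shown are the usual qualification arguments, so the exact command may differ:

```bash
spark_rapids qualification \
  --platform onprem \
  --eventlogs /path/to/eventlogs \
  --tools_config_file distributed_config.yaml \
  --distributed
```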
Sample Config File:
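A hypothetical sketch of a tools config carrying Spark properties for distributed mode; the `distributed_tools_config` property name comes from this PR, but the nested schema shown here is an assumption:

```yaml
api_version: '1.0'
distributed_tools_config:        # new property introduced by this PR
  spark_properties:              # assumed layout for Spark properties
    - name: spark.executor.memory
      value: 20g
    - name: spark.executor.cores
      value: '6'
```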
Details:

- `user_tools/src/spark_rapids_pytools/cloud_api/onprem.py`: Added a new class `OnPremDistributedRapidsJob` and a method `create_distributed_submission_job` to support distributed RAPIDS jobs. [1] [2]
- `user_tools/src/spark_rapids_pytools/rapids/rapids_job.py`: Introduced the `RapidsDistributedJob` class and updated methods to handle distributed tool configurations (see the sketch after this list). [1] [2] [3] [4]
- `user_tools/src/spark_rapids_pytools/rapids/rapids_tool.py`: Added methods to get distributed tools configurations and submit distributed jobs. [1] [2]
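A minimal sketch of the class split described above; the class names are the ones this PR introduces, while the fields and the `run` method are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class JarCmdArgs:
    """Hypothetical shape: bundles everything needed to build the JAR command."""
    jvm_args: List[str] = field(default_factory=list)
    classpath: str = ''
    main_class: str = ''
    tool_args: List[str] = field(default_factory=list)

class RapidsJob:
    """Base job; subclasses decide how the built JAR command is executed."""
    def run(self, jar_args: JarCmdArgs) -> None:
        raise NotImplementedError

class RapidsLocalJob(RapidsJob):
    """Runs the tools JAR as a local subprocess (illustrative)."""
    def run(self, jar_args: JarCmdArgs) -> None:
        print('local run:', jar_args.main_class)

class RapidsDistributedJob(RapidsJob):
    """Submits the tools JAR to a Spark cluster (illustrative)."""
    def run(self, jar_args: JarCmdArgs) -> None:
        print('distributed submit:', jar_args.main_class)

class OnPremDistributedRapidsJob(RapidsDistributedJob):
    """Concrete distributed job for the OnPrem platform."""
```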
Enhancements to argument processing:

- `user_tools/src/spark_rapids_pytools/rapids/qualification.py`: Added methods to process distributed tools arguments. [1] [2]
- `user_tools/src/spark_rapids_tools/cmdli/argprocessor.py`: Updated `QualifyUserArgModel` and `build_tools_args` to include `distributed_tools_enabled`. [1] [2]

Platform class updates:
- `user_tools/src/spark_rapids_pytools/cloud_api/databricks_aws.py`, `databricks_azure.py`, `dataproc.py`, `dataproc_gke.py`, `emr.py`: Disabled pylint warnings for abstract methods. [1] [2] [3] [4] [5]

Other improvements:
- `user_tools/src/spark_rapids_pytools/rapids/qualification.py`: Added a check to ensure the DataFrame is not empty before accessing it.
- `user_tools/src/spark_rapids_tools/cmdli/tools_cli.py`: Added a new parameter `distributed` to the `qualification` function (sketched below).
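For illustration, a trimmed sketch of how the new `distributed` parameter might surface on the Fire-based `qualification` entry point; all other parameter names and defaults here are assumptions:

```python
class ToolsCLI:
    """Trimmed sketch of the CLI entry point (hypothetical signature)."""

    def qualification(self,
                      eventlogs: str = None,
                      platform: str = None,
                      tools_config_file: str = None,
                      distributed: bool = False):
        # When 'distributed' is True, the JAR command would be submitted as a
        # RapidsDistributedJob instead of running as a RapidsLocalJob.
        mode = 'distributed' if distributed else 'local'
        print(f'running qualification in {mode} mode')
```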