Releases: NVIDIA/spark-rapids-tools
v24.10.2
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.10.2/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.10.2/
Changes
User Tools
- Update models for latest tools code (#1448)
- More flexible regexes; fix default split function (#1443)
- Update models for latest code and dataset JSON (#1442)
- Add model for databricks-azure_photon and update combined model (#1427)
- Remove custom-speedup module from user-tools (#1425)
Core
- Count expressions per Exec in SQLPlanParser (#1449)
- Report all operators in the output file (#1444)
- Fix missing exec-to-stageId mapping in Qual tool (#1437)
- [BUG] Fix Profiler tool index out of bound exception when generating diagnostic metrics (#1439)
- Sort Qual execs report by sqlId and nodeId (#1436)
- Include expression parsers for HashAggregate and ObjectHashAggregate (#1432)
- [FEA] Add stage/task level diagnostic output for GPU slowness in Profiler tool (#1375)
- Reduce the log noise caused by core report summary (#1426)
- Trigger GC at the beginning of each benchmark iteration (#1424)
Miscellaneous
v24.10.1
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.10.1/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.10.1/
Changes
User Tools
- Add qualification support for Photon jobs in the Python Tool (#1409)
- Add qualx support for platform runtime variants (DB AWS) (#1417)
- Update models for latest emr, onprem eventlogs (#1410)
Core
- Adding EMR-specific tunings for shuffle manager and ignoring jar (#1419)
- Changing autotuner memory error to warning in comments (#1418)
- Add sparkRuntime property to capture runtime type in application_information (#1414)
- Refactor Exec Parsers - remove individual parser classes (#1396)
- Remove estimated GPU duration from qualification output (#1412)
v24.10.0
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.10.0/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.10.0/
Changes
User Tools
- [FEA] Allow users to specify custom Dependency jars (#1395)
- Reduce default memory allocation to the java process (#1407)
- Update error handling in python for parsing cluster information (#1394)
- user-tools should add xms argument to java cmd (#1391)
- Use environment variables to set thresholds in static yaml configurations (#1389)
- Use StorageLib to download dependencies (#1383)
- Remove total core second heuristic and filter apps only in top candidate view (#1376)
- Generate log files for Python Profiling cli (#1366)
- Update models for updated datasets and latest code (#1365)
- Isolate dataset for qualx plugin invocations (#1361)
- [FEA] Add total core seconds into top candidate view (#1342)
- Fix python tool picking up wrong JAR version in Fat wheel mode (#1357)
- [FOLLOWUP-1326] Set Spark version to 3.4.2 by default for onprem environment (#1358)
- Disable too-many-positional-arguments in pylintrc (#1353)
- Reduce console output tree level, exclude JAR tool output files and remove incorrect logging (#1340)
Core
- Add support for Photon-specific SQL Metrics (#1390)
- Add support for processing Photon event logs in Scala (#1338)
- Add Reflection to support custom Spark Implementation at Runtime (#1362)
- Improve AQE support by capturing SQLPlan versions (#1354)
- Add PartitionFilters and DataFilters to the dataSourceInfo table (#1346)
- Add support to ArrayJoin in Qualification tool (#1345)
Miscellaneous
v24.08.2
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.2/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.08.2/
Changes
User Tools
- Add end-to-end behavioural tests for the python CLI (#1313)
- Add documentation for qualx plugins (#1337)
- Allow spark dependency to be configured dynamically (#1326)
- Follow-up 1318: Fix QualX fallback with default speedup and duration columns (#1330)
- Updated models for EMR NDS-H dataset (#1331)
Core
- [FEA] Add total core seconds in Qualification core tool output (#1320)
- Add support to MaxBy and MinBy in Qualification tool (#1335)
- Add safeguards to prevent older attempts from generating metrics output in Scala Tool (#1324)
- Sync up DAYTIME and YEARMONTH fields with CSV plugin files (#1328)
Miscellaneous
- Update signoff usage [skip ci] (#1332)
v24.08.1
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.1/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.08.1/
Changes
User Tools
- [DOC] spark_rapids CLI help cmd still shows cost savings (#1317)
- Fix Qualification and Profiling tools CLI argument shorthands (#1312)
- Raise error for enum creation from invalid string values (#1300)
- Append HADOOP_CONF_DIR to the tools CLASSPATH execution cmd (#1308)
- Fix key error and cross-join error during qualx evaluate (#1298)
- Qual tool: Print more useful log messages when failures happen downloading dependencies (#1292)
- Fix --help text for custom_model_file option (#1285)
Core
- Remove legacy SpeedupFactor from core output files (#1318)
- Mark decimalsum as supported in Qualification tool (#1323)
- Mark SMJ as unsupported operator for corner cases in left join (#1309)
- Remove arguments and code related to the html-report (#1311)
- Handle SparkRapidsBuildInfoEvent in GPU event logs (#1203)
- Enable recursive search for event logs by default and optional --no-recursion flag (#1297)
- Qualification tool support filtering by a filesystem time range (#1299)
- Skip generating timeline for stages that do not have completion time (#1290)
- Save core tools logs to output log file (#1269)
- Qualification tool - Add option to filter by minimum event log size (#1291)
- Include exception message for unknown app status in core tool (#1281)
Miscellaneous
- Remove restricted google sheets link and outdated TCO section (#1289)
v24.08.0
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.08.0/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.08.0/
Changes
User Tools
- Remove calculation of gpu cluster recommendation from python tool when cluster argument is passed (#1278)
- Remove unused argument --target_platform in Python Tool (#1279)
- Qualification tool: Add output stats file for Execs(operators) (#1225)
- Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
- Remove speedup based recommendation column from qual_summary csv (#1268)
- Fix prediction CSV files for multiple qual directories (#1267)
- Clean up tools after removing CLI dependency (#1256)
- Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
- Remove CLI dependency in Dataproc _pull_gpu_hw_info implementation (#1245)
- Replace split_nds with split_train_val (#1252)
- Update xgboost models and metrics (#1244)
- Add footnotes for config recommendations and speedup category in top candidate view (#1243)
- [BUG] Update Dataproc instance catalog for n1 series GPU info (#1242)
- Improvements in Cluster Config Recommender (#1241)
- Improve console output from python tool for failed/gpu/photon event logs (#1235)
- [FEA] Generate and use instance description file for Databricks-Azure platform (#1232)
- Remove arguments related to cost-savings (#1230)
- Updated models for latest databricks-aws datasets (#1231)
- Refactor QualX for Linter and Test Compatibility (#1228)
- Generate summary metadata file and fix node recommendation in python (#1216)
- [FEA] Remove gcloud CLI dependency for Dataproc platform (#1223)
- Updated models for latest dataproc eventlogs (#1226)
- Remove estimation-model column from qualification summary (#1220)
- Add option to add features.csv files to training set (#1212)
- Disable cost saving functionality (#1218)
- [FEA] Remove CLI dependency for EMR and Databricks-AWS platforms in user tool (#1196)
- Fix some basic pylint errors in qualx code (#1210)
- Qual tool tuning rec based on CPU event log coherently recommend tunings and node setup and infer cluster from eventlog (#1188)
- Add shap command to internal CLI for debugging (#1197)
- Add internal CLI to generate instance descriptions for CSPs (#1137)
- [FEA] Support custom XGBoost model file via user tools CLI (#1184)
- Updated models for new training data (#1186)
- Add evaluate_summary command to internal CLI (#1185)
- [DOC] Fix broken link to qualX docs and update python prerequisites (#1180)
- Bump to certifi-2024.7.4 and urllib3-1.26.19 (#1173)
- Disable UI-HTML report by default in Qualification tool (#1168)
- Fix parsing App IDs inside metrics directory in QualX (#1167)
- Refactor Databricks-AWS Qual tool to cache and process pricing info from DB website (#1141)
- Add plugin mechanism for dataset-specific preprocessing in qualx (#1148)
- Unsupported op logic should read action column from qual's output (#1150)
- Update qualx readme for training (#1140)
- Disable pylint-unreachable code in tox.ini (#1145)
Core
- Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
- [TASK] Optimize the storage of accumulables in core tools (#1263)
- Sync GetJsonObject support with Rapids-Plugin (#1266)
- Do not create new StageInfo object (#1261)
- [FEA] Add support for map_from_arrays in qualification tools (#1248)
- Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
- Fix stage level metrics output csv file (#1251)
- Handle event logs with wildcards in status report generation (#1237)
- Fix duplicate records in DataSourceInfo report (#1227)
- Reduce memory footprint of stageInfo (#1222)
- Ensure UTF-8 encoding for reading non-english characters (#1211)
- Sync plugin support for hash-hive and shift operators (#1198)
- Sync-up the support of parse_url in qualification tool (#1195)
- Include status information for failed event logs in core tool (#1187)
- [FEA] Adding Benchmarking classes to evaluate core tools performance (#1169)
- [BUG] Fix handling of non-english characters in tools output files (#1189)
- [Bug] Fix java Qual tool handling of --platform argument (#1161)
- Add all stage metrics to tools output (#1151)
- Follow-up 1142: remove TODO line (#1146)
- Mark wholestageCodeGen as shouldRemove when child nodes are removed (#1142)
- [FEA] Display full failure messages in failed CSV files (#1135)
Miscellaneous
- Qualification tool: Add option to filter event logs for a maximum file system size (#1275)
- Qualification tool should print Kryo related recommendations (#1204)
- Fix header check script to exclude files (#1224)
- Update header check script for pre-commit hooks (#1219)
- Follow-up 1189: handle non-english characters in data-output.js (#1208)
- Update pre-commit hooks to check for headers and white-spaces (#1205)
- user-tools:Update --help for cluster argument (#1178)
- Support fine-tuning models (#1174)
v24.06.1
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.06.1/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.06.1/
Changes
User Tools
- Fix Python runtime error caused by numpy 2.0.0 release (#1130)
- Disable the spark_rapids bootstrap command (#1114)
Core
v24.06.0
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.06.0/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.06.0/
Changes
User Tools
- Add support to Python 3.12 (#1111)
- user-tools: Update log messages (#1110)
- Enable xgboost prediction model by default (#1108)
- Add support to Python 3.11 (#1105)
- Fix nan label issue in training (#1104)
- Fix qualx app metrics (#1102)
- clip appDuration to at least Duration (#1096)
- Fix missing assignment to savings_recommendations (#1098)
- Handle QualX behaviour when Qual Tool does not generate any outputs (#1095)
- Fix internal predict CLI and remove preprocessed argument (#1093)
- Update QualX to return default speedups and fix App Duration for incomplete apps (#1089)
- fix signature error from overlapping merges (#1084)
- sync w/ internal repo; update models (#1083)
- Reduce the maximum number of Java threads in CLI (#1082)
- Remove using Profiler metrics for QualX and Heuristics (#1080)
- Port QualX repo and add CLI for train (#1076)
- User tools fallback to default zone/region (#1054)
- Handle missing pricing info for user qual tool on Databricks platforms (#1053)
- Split job and stage level aggregated metrics into different files (#1050)
- Skip Cluster Inference when CSP CLIs are missing or not configured (#1035)
- Store Cluster Shape Recommendation in User Tools Qualification Output (#1005)
- Fix calculation of unsupported operators stage duration percentage (#1006)
- Update Databricks Azure qual tool to set env variable for ABFS paths (#1016)
- Add heuristics using stage spill metrics to skip apps (#1002)
- Fix failure in github workflow's pylint (#1015)
- Updating qual validation script to directly use top candidate view recommendation (#1001)
Core
- Fix typo in Profiler class using qual instead of prof (#1113)
- Fix missing appEndTime in raw_metrics folder (#1092)
- Sync tools with plugin newly supported operators (#1066)
- Fix java Qual tool Autotuner output when GPU device is missing (#1085)
- Update the Qual tool AutoTuner Heuristics against CPU event logs (#1069)
- Handling FileNotFound exception in AutoTuner (#1065)
- Handle metric names from legacy spark (#1052)
- Split job and stage level aggregated metrics into different files (#1050)
- Refactor ProfileResult classes to implement new interface design and add CSV output to Qual Tool (#1043)
- Hook up the auto tuner in the qualification tool (#1039)
- Profiler should identify the delta log ops and generate views for non-delta logs (#1031)
- Qualification tool - Handle cancelled jobs and stages better and don't skip the app (#1033)
- [FEA] Generate Status Report for Profiling Tool (#1012)
- Fix calculation of unsupported operators stage duration percentage (#1006)
- Fix potential problems and AQE updates in Qual tool (#1021)
- Sync supported operators with plugin changes and update default score (#1020)
- Refactor TaskEnd to be accessible by Q/P tools (#1000)
Miscellaneous
- Bump requests from 2.31.0 to 2.32.2 in /data_validation (#1077)
v24.04.0
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.04.0/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.04.0/
Changes
User Tools
- [FEA] Add CLI to run prediction on estimation_model (#961)
- Adding SHAP predict values as new output file (#982)
- Update docs for building to clarify to build in a virtual environment (#976)
Core
- [BUG] Catch Profiler error when app info is empty (#994)
- Get stages from sqlId for collecting info for output writer functions (#996)
- Account for joboverhead time in qualification tool estimation (#992)
- [Followup] Fix handling of clusterTags and SparkVersion in Q/P Tools (#993)
- Fix handling of clusterTags and SparkVersion in Q/P Tools (#991)
- Refactor AppBase to use common AppMetaData between Q/P tools (#983)
- Refactor Stage info code between Q/P tools (#971)
v24.02.4
Packages
- Maven Release: https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/24.02.4/
- PyPI Package: https://pypi.org/project/spark-rapids-user-tools/24.02.4/
Changes
User Tools
- Fix Hadoop Azure version to be compatible with Spark-3.5.0 (#975)
- Add speedup categories in qualification summary output (#958)
- Improve cluster node initialisation for CSPs (#964)
Core
- Remove databricks profiling recommendation for dynamicFilePruning (#972)
- Add AQEShuffleRead WriteFiles execs to the supportedOps and score files (#963)
- [FEA] Automate appending new operators to the platform score sheets (#954)
- Add support for InSubqueryExec Expression (#960)