Port Records and ModelConfigMeasurement classes #78

Merged
nv-braf merged 3 commits into optimize_subcommand_phase1 from port_records_and_mcm on Sep 16, 2024

nv-braf (Contributor) commented Sep 6, 2024

Ports the Record and ModelConfigMeasurement (MCM) classes into GenAI-Perf (GAP).

The Record class stores an individual metric (such as latency or throughput).
The MCM class stores all the perf metrics for a model configuration and provides methods to compare configurations, change objectives, and checkpoint its state.
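
To illustrate the split described above, here is a minimal sketch of how a per-metric record and a per-configuration measurement might fit together. It is not the code from this PR: the method names (`percent_gain_over`, `score_against`, `set_objectives`, `to_checkpoint`, `from_checkpoint`), the weighted-percent-gain comparison, and the JSON-friendly checkpoint layout are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Record:
    """One named metric value (hypothetical shape, not the GAP class)."""

    tag: str
    value: float
    higher_is_better: bool = True

    def percent_gain_over(self, other: "Record") -> float:
        """Signed percent improvement of self over other (positive = better)."""
        if self.tag != other.tag:
            raise ValueError(f"Cannot compare {self.tag!r} with {other.tag!r}")
        if other.value == 0:
            raise ValueError("Baseline value must be non-zero")
        gain = (self.value - other.value) / abs(other.value) * 100.0
        # For metrics like latency, a smaller value is the improvement.
        return gain if self.higher_is_better else -gain


class ModelConfigMeasurement:
    """All metrics for one model configuration plus weighted objectives."""

    def __init__(
        self,
        records: Dict[str, Record],
        objectives: Optional[Dict[str, float]] = None,
    ) -> None:
        self.records = dict(records)
        # Copy so callers cannot mutate our state through a shared dict.
        self.objectives = dict(objectives) if objectives else {}

    def set_objectives(self, objectives: Dict[str, float]) -> None:
        """Change which metrics drive comparisons, and with what weight."""
        self.objectives = dict(objectives)

    def score_against(self, other: "ModelConfigMeasurement") -> float:
        """Weighted average percent gain of self over other."""
        total_weight = sum(self.objectives.values())
        if total_weight <= 0:
            raise ValueError("At least one positively weighted objective is required")
        score = 0.0
        for tag, weight in self.objectives.items():
            score += weight * self.records[tag].percent_gain_over(other.records[tag])
        return score / total_weight

    def is_better_than(self, other: "ModelConfigMeasurement") -> bool:
        return self.score_against(other) > 0

    def to_checkpoint(self) -> Dict[str, Any]:
        """Return a JSON-serializable snapshot (assumed layout)."""
        return {
            "objectives": dict(self.objectives),
            "records": {tag: asdict(rec) for tag, rec in self.records.items()},
        }

    @classmethod
    def from_checkpoint(cls, data: Dict[str, Any]) -> "ModelConfigMeasurement":
        records = {tag: Record(**fields) for tag, fields in data["records"].items()}
        return cls(records, data["objectives"])


if __name__ == "__main__":
    candidate = ModelConfigMeasurement(
        {
            "throughput": Record("throughput", 120.0),
            "latency": Record("latency", 50.0, higher_is_better=False),
        },
        objectives={"throughput": 1.0},
    )
    baseline = ModelConfigMeasurement(
        {
            "throughput": Record("throughput", 100.0),
            "latency": Record("latency", 40.0, higher_is_better=False),
        },
        objectives={"throughput": 1.0},
    )
    print(candidate.is_better_than(baseline))  # judged on throughput
    candidate.set_objectives({"latency": 1.0})
    print(candidate.is_better_than(baseline))  # judged on latency
    restored = ModelConfigMeasurement.from_checkpoint(
        json.loads(json.dumps(candidate.to_checkpoint()))
    )
    print(restored.records == candidate.records)
```

In this sketch the same pair of measurements can rank differently once the objective changes from throughput to latency, which is the kind of behavior "change objectives" suggests. Keeping the checkpoint as plain dictionaries means it survives a round trip through `json` without custom encoders.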

nv-braf marked this pull request as ready for review September 11, 2024 16:47
nv-braf merged commit 73fd2af into optimize_subcommand_phase1 Sep 16, 2024
6 of 7 checks passed
nv-braf deleted the port_records_and_mcm branch September 16, 2024 20:58
nv-braf added a commit that referenced this pull request Sep 23, 2024
* Adding Records and MCM. Very basic unit tests passing.

* Fixes + all unit testing completed

* Adding missing record testing + missing record file
nv-braf added a commit that referenced this pull request Oct 1, 2024
pvijayakrish pushed a commit that referenced this pull request Oct 8, 2024
nv-braf added a commit that referenced this pull request Oct 31, 2024
nv-braf added a commit that referenced this pull request Nov 7, 2024
nv-braf added a commit that referenced this pull request Nov 18, 2024
nv-braf added a commit that referenced this pull request Nov 25, 2024
nv-braf added a commit that referenced this pull request Dec 9, 2024
nv-braf added a commit that referenced this pull request Dec 10, 2024
nv-braf added a commit that referenced this pull request Dec 11, 2024
* Add SearchParameters class (#76)

* Initial code done. Some unit testing in place

* All unit tests passing + pre-commit changes

* Fixing codeQL issue

* Fixing pytest issue

* Adding TypeAlias

* Removing python 3.8

* Changes based on pre-review w/ Elias

* Fixing codeQL issue

* Removing type ignore

* Fixing comment

* Port Records and ModelConfigMeasurement classes (#78)

* Adding Records and MCM. Very basic unit tests passing.

* Fixes + all unit testing completed

* Adding missing record testing + missing record file

* Port Run Config Measurement (#91)

* Initial changes, basic unit tests passing

* Adding support for making the objective a telemetry metric

* Calculation logic + unit testing added

* Constraint logic in place. All unit tests passing

* Fix codeQL issues.

* Removing accidental negation

* Create Optuna Objective Generator Class (#96)

* Created type top-level file

* Added logic and testing for search space methods

* Added logic and unit testing for generating objectives

* Fixing codeQL issues

* Adding early termination logic

* Fixing logger and adding debug methods

* Adding end-to-end generator testing

* Create Sweep Objective Generator (#104)

* Creating sweep based objective generator

* Refactoring and cleaning up type aliases

* Fixing codeQL issues

* Fixing generator count test

* Changing get_list to assert versus return an empty list

* Create PA Config Class (#110)

* Initialization of class complete

* Refactoring set options method

* Added CLI string method

* Adding representation method

* Fixing codeQL issues

* Changing asserts to use ValueError

* Removing comment

* Fixes based on CR

* Removing try-except

* Add Analyze to Search Parameters (#117)

* Differentiating btw PA and GAP runtime parameters

* Adding GAP options to config command

* Adding GAP option to optimize

* Adding logic for analyze to search parameters

* Fixing codeQL issues

* Creating enum for subcommand

* Create GenAI-Perf Config Class (#119)

* Adding GAP option to optimize

* Fixing codeQL issues

* Adding config for genai-perf

* Fixing codeQL issues

* Create RunConfig class (#123)

* Adding GAP option to optimize

* Fixing codeQL issues

* Adding RunConfig class along w/ missing checkpoint support to config classes

* Create Results class (#132)

* Adding GAP option to optimize

* Fixing codeQL issues

* Results class initial coding w/ testing

* Minor refactor

* Fixing issue in RCM testing

* Fixing codeQL issues

* Create checkpoint class (#134)

* initial changes

* Adding GAP option to optimize

* Fixing codeQL issues

* Results class initial coding w/ testing

* Fixing issue in RCM testing

* Fixing codeQL issues

* Checkpoint class creation

* Fixing codeQL issues

* fixing codeql issue

* Removing checkpoint file

* Removing checkpoint file

* Fixing json to properly format checkpoint file

* Minor typing cleanup

* Adding records for ISL/OSL and testing this in checkpoint creation

* Changing method name

* Changing read/write checkpoint method names

* Turn statistics into GAP Records (#166)

* Changing record names to match GAP and adding some missing type checking

* Fixing other unit tests

* Updating time to first token records

* Updating inter token latency records

* Updating output token throughput record

* Adding output token throughput per request records

* Adding output sequence length (OSL) records

* Adding Input sequence length (ISL) records

* Removing non-GAP records

* Adding telemetry records

* Fixing unit testing

* Adding request goodput record

* Adding method to create records from statistics

* Added very basic unit testing

* Remove demo file (accidental commit)

* Fix codeql error

* Fixing merge issue

* Fixes/Changes needed during testing Analyze subcommand (#177)

* Fixes found during borecleaning

* Fixing codeql issues

* Add support for the current CLI to PA Config Generator (#182)

* Added support for the CLI to PA config generator

* Fixing codeQL issues

* Removing redundant extra_args check

* Add Analyze Subcommand (#186)

* Adding CLI options for analyze along with the subcommand. Updates to underlying classes to support using the CLI.

* Fixing codeQL issues

* Actually raise the exception

* Update help comment

* Refactoring subcommands

* Fixing codeql issues and other small changes from PR

* Refactoring run method to be common btw profile and analyze subcommands

* Fixing codeql issues

* Fixing codeql

* Adds support for skipping profiling if the Result is found in the checkpoint (#191)

* Add support for skipping profiling if the results are found in the checkpoint

* Fix codeql issue

* Removing mutable default in RCM

* Changes based on PR

* fixing codeql issue

* Capture Telemetry Records in checkpoint (#197)

* Updated GPU Records to match what TelemetryRecords is doing. Added method to convert TelemetryDict into Records

* Fix codeQL issues

* Fixing unit test failures

* Fixing file names to match tags

* Fixing merge conflict

* Analyze: Print CSV reports (#207)

* Initial changes for printing report in analyze

* Refactoring

* Fixing codeql issues

* Removing csv/checkpoint

* Fixing merge conflicts. Updating num_prompts -> num_dataset_entries

* Fixing codeql and mutable default issues.

* Fixing remaining codeql issue

* Adding new GAP options to ignore list in PA config generator