Adds support for skipping profiling if the Result is found in the checkpoint #191
Why does this PR contain changes made from previous PRs?
Sorry about that! Every Monday I pull from main and rebase the dev branch (analyze_subcommand_phase1) so that conflicts are minimal when we eventually merge. Unfortunately, this means I also need to rebase any branches I have up for review, which I sometimes forget to do!
LGTM!
…ckpoint (#191)
* Add support for skipping profiling if the results are found in the checkpoint
* Fix codeql issue
* Removing mutable default in RCM
* Changes based on PR
* fixing codeql issue
* Add SearchParameters class (#76)
* Initial code done. Some unit testing in place
* All unit tests passing + pre-commit changes
* Fixing codeQL issue
* Fixing pytest issue
* Adding TypeAlias
* Removing python 3.8
* Changes based on pre-review w/ Elias
* Fixing codeQL issue
* Removing type ignore
* Fixing comment
* Port Records and ModelConfigMeasurement classes (#78)
* Adding Records and MCM. Very basic unit tests passing.
* Fixes + all unit testing completed
* Adding missing record testing + missing record file
* Port Run Config Measurement (#91)
* Initial changes, basic unit tests passing
* Adding support for making the objective a telemetry metric
* Calculation logic + unit testing added
* Constraint logic in place. All unit tests passing
* Fix codeQL issues.
* Removing accidental negation
* Create Optuna Objective Generator Class (#96)
* Created type top-level file
* Added logic and testing for search space methods
* Added logic and unit testing for generating objectives
* Fixing codeQL issues
* Adding early termination logic
* Fixing logger and adding debug methods
* Adding end-to-end generator testing
* Create Sweep Objective Generator (#104)
* Creating sweep based objective generator
* Refactoring and cleaning up type aliases
* Fixing codeQL issues
* Fixing generator count test
* Changing get_list to assert versus return an empty list
* Create PA Config Class (#110)
* Initialization of class complete
* Refactoring set options method
* Added CLI string method
* Adding representation method
* Fixing codeQL issues
* Changing asserts to use ValueError
* Removing comment
* Fixes based on CR
* Removing try-except
* Add Analyze to Search Parameters (#117)
* Differentiating btw PA and GAP runtime parameters
* Adding GAP options to config command
* Adding GAP option to optimize
* Adding logic for analyze to search parameters
* Fixing codeQL issues
* Creating enum for subcommand
* Create GenAI-Perf Config Class (#119)
* Adding GAP option to optimize
* Fixing codeQL issues
* Adding config for genai-perf
* Fixing codeQL issues
* Create RunConfig class (#123)
* Adding GAP option to optimize
* Fixing codeQL issues
* Adding RunConfig class along w/ missing checkpoint support to config classes
* Create Results class (#132)
* Adding GAP option to optimize
* Fixing codeQL issues
* Results class initial coding w/ testing
* Minor refactor
* Fixing issue in RCM testing
* Fixing codeQL issues
* Create checkpoint class (#134)
* initial changes
* Adding GAP option to optimize
* Fixing codeQL issues
* Results class initial coding w/ testing
* Fixing issue in RCM testing
* Fixing codeQL issues
* Checkpoint class creation
* Fixing codeQL issues
* fixing codeql issue
* Removing checkpoint file
* Removing checkpoint file
* Fixing json to properly format checkpoint file
* Minor typing cleanup
* Adding records for ISL/OSL and testing this in checkpoint creation
* Changing method name
* Changing read/write checkpoint method names
* Turn statistics into GAP Records (#166)
* Changing record names to match GAP and adding some missing type checking
* Fixing other unit tests
* Updating time to first token records
* Updating inter token latency records
* Updating output token throughput record
* Adding output token throughput per request records
* Adding output sequence length (OSL) records
* Adding Input sequence length (ISL) records
* Removing non-GAP records
* Adding telemetry records
* Fixing unit testing
* Adding request goodput record
* Adding method to create records from statistics
* Added very basic unit testing
* Remove demo file (accidental commit)
* Fix codeql error
* Fixing merge issue
* Fixes/Changes needed during testing Analyze subcommand (#177)
* Fixes found during borecleaning
* Fixing codeql issues
* Add support for the current CLI to PA Config Generator (#182)
* Added support for the CLI to PA config generator
* Fixing codeQL issues
* Removing redundant extra_args check
* Add Analyze Subcommand (#186)
* Adding CLI options for analyze along with the subcommand. Updates to underlying classes to support using the CLI.
* Fixing codeQL issues
* Actually raise the exception
* Update help comment
* Refactoring subcommands
* Fixing codeql issues and other small changes from PR
* Refactoring run method to be common btw profile and analyze subcommands
* Fixing codeql issues
* Fixing codeql
* Adds support for skipping profiling if the Result is found in the checkpoint (#191)
* Add support for skipping profiling if the results are found in the checkpoint
* Fix codeql issue
* Removing mutable default in RCM
* Changes based on PR
* fixing codeql issue
* Capture Telemetry Records in checkpoint (#197)
* Updated GPU Records to match what TelemetryRecords is doing. Added method to convert TelemetryDict into Records
* Fix codeQL issues
* Fixing unit test failures
* Fixing file names to match tags
* Fixing merge conflict
* Analyze: Print CSV reports (#207)
* Initial changes for printing report in analyze
* Refactoring
* Fixing codeql issues
* Removing csv/checkpoint
* Fixing merge conflicts. Updating num_prompts -> num_dataset_entries
* Fixing codeql and mutable default issues.
* Fixing remaining codeql issue
* Adding new GAP options to ignore list in PA config generator
This PR adds the code that lets us skip profiling a configuration that was already checkpointed in a previous run.
It adds the missing method that creates a representation of the GAP config. The combination of the GAP and PA representations is then used to determine whether a config was already profiled.
If it was, we skip it; otherwise we create a new, unique configuration name by incrementing the numeric suffix of the highest configuration currently stored in the Results.
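For readers who want to see the flow described above, here is a minimal sketch of it. This is not the code in this PR: the `Results` class, the `run_config_` prefix, the `|` separator, the starting suffix of 0, and the `profile_or_skip` helper are all hypothetical names and choices made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


def combined_representation(gap_repr: str, pa_repr: str) -> str:
    """Join the GAP and PA representations into one lookup key (assumed format)."""
    return f"{gap_repr}|{pa_repr}"


@dataclass
class Results:
    """Hypothetical stand-in for checkpointed results: name -> (representation, measurement)."""

    # default_factory avoids sharing one mutable dict between instances
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def find(self, representation: str) -> Optional[str]:
        """Return the name of a stored config with this representation, if any."""
        for name, (stored_repr, _) in self.entries.items():
            if stored_repr == representation:
                return name
        return None

    def next_name(self, prefix: str = "run_config_") -> str:
        """Build a unique name by incrementing the highest stored numeric suffix."""
        highest = -1  # assumption: numbering starts at 0 when nothing is stored
        pattern = re.compile(rf"{re.escape(prefix)}(\d+)")
        for name in self.entries:
            match = pattern.fullmatch(name)
            if match:
                highest = max(highest, int(match.group(1)))
        return f"{prefix}{highest + 1}"


def profile_or_skip(
    results: Results,
    gap_repr: str,
    pa_repr: str,
    profile: Callable[[], float],
) -> Tuple[str, bool]:
    """Return (config name, skipped); only call profile() for unseen configs."""
    representation = combined_representation(gap_repr, pa_repr)
    existing = results.find(representation)
    if existing is not None:
        return existing, True  # already checkpointed, so skip profiling
    name = results.next_name()
    results.entries[name] = (representation, profile())
    return name, False
```

In this sketch, calling `profile_or_skip` twice with the same pair of representations runs the profiling callback only once; a different PA representation gets the next free name.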