
Pull in upstream changes from argonne-lcf @ main #12

Merged
saforem2 merged 9 commits into saforem2:main on Dec 25, 2024

Conversation

@saforem2 (Owner) commented Dec 25, 2024

Summary by Sourcery

Update the optimizer code and configuration scripts.

Enhancements:

  • Reorganize and streamline the optimizer selection logic in the get_megatron_optimizer function.

Tests:

  • Add assertions to ensure the correct optimizer is selected.

sourcery-ai bot commented Dec 25, 2024

Reviewer's Guide by Sourcery

This pull request integrates upstream changes from the argonne-lcf repository into the main branch. The primary focus is on enhancing optimizer selection and refining the weight conversion process for Hugging Face models.

Class diagram for updated optimizer hierarchy

classDiagram
    class Optimizer {
        <<interface>>
        +step()
    }
    class Adam
    class AdamW
    class AdamW8bit
    class GaLoreAdamW
    class GaLoreAdamW8bit
    class Adafactor
    class GaLoreAdafactor
    class SGD
    class Lamb
    class Shampoo

    Optimizer <|-- Adam
    Optimizer <|-- AdamW
    Optimizer <|-- AdamW8bit
    Optimizer <|-- GaLoreAdamW
    Optimizer <|-- GaLoreAdamW8bit
    Optimizer <|-- Adafactor
    Optimizer <|-- GaLoreAdafactor
    Optimizer <|-- SGD
    Optimizer <|-- Lamb
    Optimizer <|-- Shampoo

    note for Adam "Base Adam optimizer"
    note for AdamW "Adam with weight decay"
    note for AdamW8bit "8-bit AdamW for memory efficiency"
    note for GaLoreAdamW "Low-rank AdamW variant"
    note for Lamb "Layer-wise Adaptive Moments"
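The hierarchy above boils down to duck typing: every optimizer, whatever its update rule, exposes the same `step()` method. A minimal, self-contained sketch of that shared interface (the toy `SGD` here updates plain floats, not tensors, and is purely illustrative):

```python
class Optimizer:
    """Stand-in for the shared interface in the diagram above."""
    def step(self):
        raise NotImplementedError


class SGD(Optimizer):
    """Toy SGD update: value <- value - lr * grad, on plain floats."""
    def __init__(self, params, lr):
        self.params = params
        self.lr = lr

    def step(self):
        for p in self.params:
            p["value"] -= self.lr * p["grad"]


params = [{"value": 1.0, "grad": 0.5}]
SGD(params, lr=0.1).step()
# each parameter value has decreased by lr * grad
```

Because callers only ever invoke `step()`, any of the concrete classes in the diagram can be swapped in without changing the training loop.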

File-Level Changes

Introduced more flexible optimizer choices.
  • Added support for the Adam8bit optimizer.
  • Added support for the schedule-free AdamW optimizer.
  • Added support for the GaLoreAdamW optimizer.
  • Added support for the GaLoreAdamW8bit optimizer.
  • Added support for the GaLoreAdamW8bitPerLayer optimizer.
  • Added support for the IPEX fused Lamb optimizer.
  • Added support for the Shampoo optimizer.
  • Added support for the schedule-free SGD optimizer.
  • Added support for the SophiaG optimizer.
  • Reorganized optimizer selection logic to accommodate the new options.
  • Updated command-line arguments to reflect new optimizer choices.
  • Added conditional import for schedule free optimizers to avoid unnecessary dependencies.
  • Added conditional import for GaLore optimizers to avoid unnecessary dependencies.
  • Added conditional import for IPEX optimizers to avoid unnecessary dependencies.
  • Added conditional import for Shampoo optimizer to avoid unnecessary dependencies.
  • Added conditional import for SophiaG optimizer to avoid unnecessary dependencies.
  • Added conditional import for Apex optimizers to avoid unnecessary dependencies.
  • Added conditional import for DeepSpeed fused optimizers to avoid unnecessary dependencies.
  • Added conditional import for bitsandbytes optimizers to avoid unnecessary dependencies.
  • Added conditional import for transformers optimizers to avoid unnecessary dependencies.
  • Added assertion to ensure optimizer is initialized.
  • Added support for CPU offloading for the Adam optimizer.
  • Added support for the DeepSpeed fused Adam optimizer.
  • Added support for the DeepSpeed 1-bit Adam optimizer.
megatron/optimizer/__init__.py
megatron/arguments.py
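The conditional-import pattern listed above can be sketched as follows. This is an illustration only, not the actual `get_megatron_optimizer`: `galore_torch` is the real module name of the optional GaLore dependency, but the registry, function name, and error message are hypothetical.

```python
# Optional dependencies are imported lazily, so installing e.g.
# galore-torch is only required when a GaLore optimizer is requested.
try:
    from galore_torch import GaLoreAdamW  # optional dependency
    HAS_GALORE = True
except ImportError:
    GaLoreAdamW = None
    HAS_GALORE = False


def resolve_optimizer(name):
    """Illustrative dispatch: map an --optimizer name to a class,
    failing loudly when the backing package is not installed."""
    registry = {}
    if HAS_GALORE:
        registry["galoreadamw"] = GaLoreAdamW
    # ...other optional families (IPEX, Shampoo, bitsandbytes, ...)
    # would register themselves the same way.
    cls = registry.get(name)
    if cls is None:
        raise RuntimeError(
            f"optimizer {name!r} is unavailable; is its package installed?"
        )
    assert cls is not None  # mirrors the new is-initialized assertion
    return cls
```

The same try/except guard repeats per optional family (schedule-free, IPEX, Shampoo, SophiaG, Apex, DeepSpeed, bitsandbytes, transformers), so a bare install pulls in none of them.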
Improved weight conversion script for Hugging Face models.
  • Added logging for Hugging Face weight loading process.
  • Added rank setup and logging initialization.
  • Added assertion to check vocabulary size during embedding refactor.
  • Added breakpoint for debugging vocabulary size mismatch.
  • Added handling for padded vocabulary size.
  • Added more detailed logging for weight mapping information.
  • Added assertion to ensure decoder pattern match is successful.
  • Added support for Sophia machine.
  • Updated data path for Polaris and Sophia machines.
  • Added LR, LR decay style, and LR warmup fraction as environment variables.
  • Added timing string as an optional argument.
  • Added gradient accumulation steps for Sophia machine.
  • Added support for no flash attention argument.
  • Added data flags as an optional argument.
  • Added tokenizer flags as an optional argument.
  • Added DS config path.
  • Added printing of DS config details.
  • Added weight sum calculation.
  • Added data file list stem.
  • Added data cache path.
  • Added data flags printing.
tools/hf2megads_weight_converter.py
ALCF/helpers.sh
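The padded-vocabulary handling mentioned above can be sketched with plain lists. This is illustrative only: the real converter in tools/hf2megads_weight_converter.py operates on PyTorch tensors, and the function name here is hypothetical.

```python
def pad_embedding_rows(rows, padded_vocab_size):
    """Pad a [vocab, hidden] embedding matrix out to Megatron's padded
    vocab size, asserting first that the source vocabulary fits."""
    vocab_size = len(rows)
    assert vocab_size <= padded_vocab_size, (
        f"HF vocab ({vocab_size}) exceeds padded vocab ({padded_vocab_size})"
    )
    hidden = len(rows[0])
    padding = [[0.0] * hidden for _ in range(padded_vocab_size - vocab_size)]
    return rows + padding


padded = pad_embedding_rows([[0.1, 0.2], [0.3, 0.4]], padded_vocab_size=4)
# padded has 4 rows; the last 2 are zero vectors
```

The up-front assertion is the kind of vocabulary-size check the change list describes: a mismatch surfaces immediately instead of as a silent shape error deep in the conversion.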


sourcery-ai bot left a comment


Hey @saforem2 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • There is some commented out code in the optimizer section that should be removed rather than left as comments (e.g. the duplicate GaLoreAdamW section)
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@saforem2 saforem2 merged commit 18debdb into saforem2:main Dec 25, 2024
1 check passed