Add INT4 quant/de-quant kernels #620

Open
wants to merge 8 commits into base: main_perf

Commits on Jul 16, 2024

  1. Add Perf Kernels

    This is a combination of 2 commits.
    
    Add Perf Kernels
    
    Add Perf Kernels
    
    This is a combination of 6 commits.
    
    add perf-kernels
    
    fix formatting issues
    
    fix unused variables and other bugs
    
    fix other issues
    
    remove scripts
    
    save
    
    check changes
    
    format
    
    save
    
    save
    
    try
    
    pre-commit check
    
    save
    micmelesse committed Jul 16, 2024
    2d2dbe1
  2. skip backward (#586)

    micmelesse committed Jul 16, 2024
    17575ea
  3. Change all block pointers to tensor pointers (#585)

    Block pointers are intended for NVIDIA TMAs. They are also useful for regular loads, but they are not well supported.
    
    Also cleaned up some code I came across along the way and updated the comment at the top.
    vgokhale authored and micmelesse committed Jul 16, 2024
    a3d784a
  4. Add support for bshd layout (#587)

    Add support for layouts commonly used by users.
    
    Add an option for the varlen / thd layout to specify equal context lengths for all batches, which is also often used by users.
    vgokhale authored and micmelesse committed Jul 16, 2024
    aa6685a
  5. Post-Merge CI (#612)

    * remove on push for Integration Tests
    
    * rename
    
    * add post merge test
    
    * save
    
    * dtype params
    
    * skip bad config
    
    * fix more stuff
    micmelesse committed Jul 16, 2024
    dbe1173

Commits on Jul 18, 2024

  1. Increase CI timeout (#615)

    vgokhale authored Jul 18, 2024
    23ba546

Commits on Jul 19, 2024

  1. Couple of FA optimizations (#608)

    Set SM scale multiplication to a constexpr. Minor asm improvement.
    
    Changed the acc scaling that accounts for the softmax division into a
    multiplication by the reciprocal. ~10% perf improvement.
    
    ---------
    
    Co-authored-by: Michael Melesse <[email protected]>
    vgokhale and micmelesse authored Jul 19, 2024
    df4c4d3
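
The reciprocal change described in the "Couple of FA optimizations" commit can be illustrated outside Triton. The following is a minimal pure-Python sketch, not the kernel code from this PR: the function names and the list-of-rows representation of `acc` and the softmax row sums are assumptions made for illustration. It shows the general idea of computing one reciprocal per row and multiplying, instead of dividing every element.

```python
import math


def normalize_by_division(acc, row_sum):
    """Baseline: divide every accumulator element by its row's softmax sum."""
    return [[v / s for v in row] for row, s in zip(acc, row_sum)]


def normalize_by_reciprocal(acc, row_sum):
    """Variant: compute one reciprocal per row, then multiply each element."""
    out = []
    for row, s in zip(acc, row_sum):
        inv = 1.0 / s  # one division per row instead of one per element
        out.append([v * inv for v in row])
    return out


# Hypothetical toy data: two rows of accumulated values and their softmax sums.
acc = [[1.0, 2.0], [3.0, 6.0]]
row_sum = [2.0, 3.0]

by_div = normalize_by_division(acc, row_sum)
by_rcp = normalize_by_reciprocal(acc, row_sum)

# The two results agree up to floating-point rounding.
results_match = all(
    math.isclose(a, b, rel_tol=1e-12)
    for row_a, row_b in zip(by_div, by_rcp)
    for a, b in zip(row_a, row_b)
)
```

The two variants are mathematically equivalent but can differ in the last bits of the result, so comparisons should use a tolerance rather than exact equality.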

Commits on Jul 29, 2024

  1. Add INT4 quant/de-quant kernels

    Rahul Batra committed Jul 29, 2024
    71c5845
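
The commit list does not describe the quantization scheme the INT4 kernels use. As a rough orientation only, the sketch below shows one common approach in plain Python: symmetric per-tensor quantization to signed 4-bit values, with two values packed per byte. The scheme, the function names, and the low-nibble-first packing order are all assumptions and may not match the kernels added in this PR.

```python
def quantize_int4(values):
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7].

    Assumed scheme for illustration; the PR's kernels may differ.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale


def pack_int4(q):
    """Pack pairs of signed 4-bit values into bytes, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed, count):
    """Unpack bytes into signed 4-bit values, dropping any padding."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


def dequantize_int4(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical input data for a round trip.
values = [0.7, -0.35, 0.0, 0.1, -0.7]
q, scale = quantize_int4(values)
packed = pack_int4(q)
restored = dequantize_int4(unpack_int4(packed, len(values)), scale)
```

With this scheme, each restored value differs from its input by at most half a quantization step (`scale / 2`), and the packed form uses one byte per two values.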