
Bump bitsandbytes from 0.38.1 to 0.41.3.post2 #53

Merged
merged 1 commit into master from dependabot/pip/bitsandbytes-0.41.3.post2 on Dec 25, 2023

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Dec 25, 2023

Bumps bitsandbytes from 0.38.1 to 0.41.3.post2.

Release notes

Sourced from bitsandbytes's releases.

Bug and CUDA fixes + performance

Release 0.41.0 features an overhaul of the CUDA_SETUP routine. We trust PyTorch to find the proper CUDA binaries and use those. If you use a CUDA version that differs from PyTorch's, you can now control which binary is loaded for bitsandbytes by setting the BNB_CUDA_VERSION variable. See the custom CUDA guide for more information.
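A minimal sketch of the override (the CUDA version string "122" and setting it from Python rather than the shell are illustrative assumptions; see the custom CUDA guide for the exact format):

```python
import os

# Assumed example: ask bitsandbytes to load its CUDA 12.2 binary even if
# PyTorch was built against a different CUDA version. The variable must be
# set before the first import of bitsandbytes.
os.environ["BNB_CUDA_VERSION"] = "122"

import bitsandbytes as bnb  # CUDA setup reads BNB_CUDA_VERSION here
```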

Besides that, this release features a wide range of bug fixes, CUDA 11.8 support for Ada and Hopper GPUs, and an update for 4-bit inference performance.

Previous 4-bit inference kernels were optimized for RTX 4090 and Ampere A40 GPUs, but performance was poor on A100 GPUs, which are common. In this release, A100 performance is improved (40%) and is now faster than 16-bit inference, while RTX 4090 and A40 performance is slightly lower (10%).

This leads to the following approximate speedups over 16-bit (BF16):

  • RTX 4090: 3.8x
  • RTX 3090 / A40: 3.1x
  • A100: 1.5x
  • RTX 6000: 1.3x
  • RTX 2080 Ti: 1.1x

0.41.0

Features:

  • Added precompiled CUDA 11.8 binaries to support H100 GPUs without compilation #571
  • CUDA SETUP no longer looks for libcuda and libcudart and instead relies on the PyTorch CUDA libraries. To manually override this behavior, see: how_to_use_nonpytorch_cuda.md. Thank you @rapsealk

Bug fixes:

  • Fixed a bug where the default type of absmax was undefined, which led to errors if the default type was different from torch.float32. #553
  • Fixed a missing scipy dependency in requirements.txt. #544
  • Fixed a bug, where a view operation could cause an error in 8-bit layers.
  • Fixed a bug where CPU-only bitsandbytes would fail during import. #593 Thank you @bilelomrani
  • Fixed a bug where a non-existent LD_LIBRARY_PATH variable led to a failure in python -m bitsandbytes #588
  • Removed outdated get_cuda_lib_handle calls that led to errors. #595 Thank you @ihsanturk
  • Fixed bug where read-permission was assumed for a file. #497
  • Fixed a bug where prefetchAsync led to errors on GPUs that support unified memory but not prefetching (Maxwell, SM52). #470 #451 #453 #477 Thank you @jllllll and @stoperro

Documentation:

  • Improved documentation for GPUs that do not support 8-bit matmul. #529
  • Added description and pointers for the NF4 data type. #543

User experience:

  • Improved handling of the default compute_dtype for Linear4bit layers, so that compute_dtype = input_dtype if the input data type is stable enough (float32 or bfloat16, but not float16); see the sketch below.
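A hedged sketch of this behavior with bnb.nn.Linear4bit (the layer sizes and explicit compute_dtype are illustrative; 4-bit layers require a CUDA device):

```python
import torch
import bitsandbytes as bnb

# Passing compute_dtype explicitly overrides the input-dtype-based default
# described above; the 4-bit weights are dequantized and multiplied in bfloat16.
layer = bnb.nn.Linear4bit(4096, 4096, compute_dtype=torch.bfloat16).to("cuda")

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
y = layer(x)  # weights stored in 4-bit, matmul computed in bfloat16
```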

Performance:

  • Improved 4-bit inference performance for A100 GPUs. This slightly degraded performance for A40/RTX 3090 and RTX 4090 GPUs.

Deprecated:

  • 8-bit quantization and optimizers that do not use blockwise quantization will be removed in 0.42.0. All blockwise methods will remain fully supported.

4-bit Inference

Efficient 4-bit Inference (NF4, FP4)

This release adds efficient inference routines for batch size 1. The expected speedups vs 16-bit precision (fp16/bf16) for matrix multiplications with an inner product dimension of at least 4096 (LLaMA 7B) are:

  • 2.2x for Turing (T4, RTX 2080, etc.)
  • 3.4x for Ampere (A100, A40, RTX 3090, etc.)
  • 4.0x for Ada/Hopper (H100, L40, RTX 4090, etc.)
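As a minimal sketch of these routines via the functional API (the tensor shape is illustrative, and a CUDA tensor is assumed to be required):

```python
import torch
import bitsandbytes.functional as F

w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

# Quantize to NF4: returns the packed 4-bit tensor plus a quantization state
# holding the statistics needed for dequantization.
w4, state = F.quantize_4bit(w, quant_type="nf4")

# Dequantize back to 16-bit to inspect the quantization error.
w_restored = F.dequantize_4bit(w4, state)
print((w - w_restored).abs().mean())
```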

... (truncated)

Changelog

Sourced from bitsandbytes's changelog.

0.0.21

  • Ampere and RTX 30 series GPUs are now compatible with the library.

0.0.22:

  • Fixed a bug where a reset_parameters() call on the StableEmbedding would raise an error in older PyTorch versions (from 1.7.0).

0.0.23:

Bugs:

  • Unified quantization API: each quantization function now returns Q, S, where Q is the quantized tensor and S is the quantization state, which may hold absolute max values, a quantization map, or more. For dequantization, all functions now accept the inputs Q, S, so that Q is dequantized with the quantization state S (see the sketch at the end of this section).
  • Fixed an issue where the CUDA 11.1 binary was not compiled with the right headers.

API changes:

  • Block-wise quantization for optimizers is now enabled by default.

Features:

  • Block-wise quantization routines now support CPU Tensors.
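A minimal sketch of this Q, S API using the blockwise routines, which per the feature note above also accept CPU tensors:

```python
import torch
import bitsandbytes.functional as F

a = torch.randn(1024)  # CPU tensor; blockwise routines support CPU per above

# Quantization returns Q (the quantized tensor) and S (the quantization
# state: absolute max values, quantization map, etc.).
q, s = F.quantize_blockwise(a)

# Dequantization accepts the same Q, S pair.
a_restored = F.dequantize_blockwise(q, s)
print((a - a_restored).abs().max())  # small blockwise quantization error
```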

0.0.24:

  • Fixed a bug where a float/half conversion led to a compilation error for CUDA 11.1 on Turing GPUs.
  • Removed the Apex dependency for bnb LAMB.

0.0.25:

Features:

  • Added skip_zeros for block-wise and 32-bit optimizers. This ensures correct updates for sparse gradients and sparse models; see the sketch after this list.
  • Added support for Kepler GPUs. (#4)
  • Added Analysis Adam to track 8-bit vs 32-bit quantization errors over time.
  • Made compilation more user-friendly.
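A hedged sketch of skip_zeros (the flag name comes from the changelog entry above, but whether the keyword is exposed directly on this optimizer constructor is an assumption and may vary across versions):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(512, 512).cuda()

# Assumed signature: skip_zeros makes zero-valued gradient entries leave the
# optimizer state untouched, giving correct updates for sparse gradients.
opt = bnb.optim.Adam(model.parameters(), lr=1e-3, optim_bits=32, skip_zeros=True)
```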

Bug fixes:

  • fixed "undefined symbol: __fatbinwrap_38" error for P100 GPUs on CUDA 10.1 (#5)

Docs:

  • Added docs with instructions to compile from source.

0.26.0:

Features:

  • Added Adagrad (without grad clipping) as a 32-bit and 8-bit block-wise optimizer.
  • Added AdamW (a copy of Adam with a default weight decay of 1e-2). #10
  • Introduced ModuleConfig overrides which can seamlessly be used at module initialization time.
  • Added the bnb.nn.Embedding layer, which runs at 32-bit but without the layernorm. This works well if you need to fine-tune pretrained models that do not have an embedding layer norm. #19 See the sketch after this list.
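To illustrate the last two additions, a minimal sketch combining bnb.nn.Embedding with the bnb AdamW optimizer (model shape and hyperparameters are illustrative):

```python
import torch
import bitsandbytes as bnb

class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # bnb.nn.Embedding: a plain 32-bit embedding without the layernorm,
        # suited to fine-tuning pretrained models that lack an embedding norm.
        self.emb = bnb.nn.Embedding(32000, 768)
        self.head = torch.nn.Linear(768, 32000)

    def forward(self, ids):
        return self.head(self.emb(ids))

model = TinyLM().cuda()
# bnb AdamW: Adam with decoupled weight decay (default 1e-2 per the note above).
opt = bnb.optim.AdamW(model.parameters(), lr=1e-4)
```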

Bug fixes:

  • Fixed a bug where weight decay was incorrectly applied to 32-bit Adam. #13

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) from 0.38.1 to 0.41.3.post2.
- [Release notes](https://github.com/TimDettmers/bitsandbytes/releases)
- [Changelog](https://github.com/TimDettmers/bitsandbytes/blob/main/CHANGELOG.md)
- [Commits](https://github.com/TimDettmers/bitsandbytes/commits)

---
updated-dependencies:
- dependency-name: bitsandbytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies and python labels Dec 25, 2023
@kyegomez kyegomez merged commit 918b212 into master Dec 25, 2023
7 of 32 checks passed
@dependabot dependabot bot deleted the dependabot/pip/bitsandbytes-0.41.3.post2 branch December 25, 2023 21:50
kyegomez added a commit that referenced this pull request Sep 3, 2024