Half precision support #1257

Open
wants to merge 62 commits into
base: develop

Conversation

yhmtsai
Member

@yhmtsai yhmtsai commented Jan 12, 2023

This PR adds half precision (fp16) support to Ginkgo.
Some related notes on compiler and platform support:

  • GCC 12+ supports _Float16
  • Clang 15+ supports _Float16
  • CUDA 10+ supports 16-bit atomicAdd on compute capability >= 7.0 (atomicCAS has the same requirement, although the version is only mentioned for atomicAdd)
  • HIP does not support 16-bit atomics
  • SYCL (I am not sure)
  • C++23 supports float16 and bfloat16
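
The compiler notes above suggest a compile-time check for a native 16-bit floating-point type. The following is only a sketch of such a check, not what this PR implements: `native_half` and `has_native_half` are hypothetical names, and the fallback struct merely stands in for a software type such as `gko::half`. It assumes the C++23 macro `__STDCPP_FLOAT16_T__` and the `__FLT16_MAX__` macro that GCC and Clang predefine when `_Float16` is available on the target.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__has_include)
#if __has_include(<stdfloat>)
#include <stdfloat>  // C++23 header declaring std::float16_t when available
#endif
#endif

#if defined(__STDCPP_FLOAT16_T__)
// C++23 extended floating-point type
using native_half = std::float16_t;
constexpr bool has_native_half = true;
#elif defined(__FLT16_MAX__)
// compiler extension type (GCC 12+ / Clang 15+ on supported targets)
using native_half = _Float16;
constexpr bool has_native_half = true;
#else
// hypothetical placeholder: only stores the 16 bits, provides no arithmetic
struct native_half {
    std::uint16_t bits;
};
constexpr bool has_native_half = false;
#endif

static_assert(sizeof(native_half) == 2, "a half value occupies 16 bits");
```

A build could branch on `has_native_half` to decide whether a software half type is needed at all.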

CUDA 9.2 __half does not provide operator +=. (I think it could be added with additional operator overloads outside the class.)
ROCm 4.0 cannot convert __half to double. (I do not think this can be added from outside?)
I will disable half support for these two versions.
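
As a sketch of the "operator overload outside" idea: C++ allows compound assignment operators to be non-member functions, so `+=` can be supplied next to a type that lacks it. `vendor_half` below is a hypothetical stand-in (it stores a `float` purely to keep the example runnable), not the real `__half`. Conversion operators, by contrast, must be member functions, which is consistent with the doubt about retrofitting the `__half` to `double` conversion.

```cpp
#include <cassert>

// Hypothetical stand-in for a vendor type that offers operator+ but no
// compound assignment. It stores a float only to keep the sketch simple.
struct vendor_half {
    float value;
};

inline vendor_half operator+(vendor_half a, vendor_half b)
{
    return vendor_half{a.value + b.value};
}

// Compound assignment may be a non-member function, so it can be added
// outside the vendor header without modifying the type itself.
inline vendor_half& operator+=(vendor_half& lhs, const vendor_half& rhs)
{
    lhs = lhs + rhs;
    return lhs;
}

// Small helper that exercises the added overload.
inline float accumulate_example()
{
    vendor_half sum{1.5f};
    sum += vendor_half{2.0f};
    return sum.value;
}
```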

There are two additional commits: one fixes oneAPI6 (#1251) and the other contains multigrid experiments.
They will be cleaned up afterwards.

The following PR is related to half:

Discussion:
Do we still need to use the vendor's float2half in gko::half?
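
For context on this question, a conversion can be written portably without vendor intrinsics. The sketch below is not Ginkgo's implementation: `float_to_half_bits` is a hypothetical name, and the code assumes `float` is IEEE 754 binary32. It rounds to nearest with ties to even and returns the raw binary16 bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts an IEEE 754 binary32 value to binary16 bits (round to nearest even).
inline std::uint16_t float_to_half_bits(float f)
{
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof(x));  // read the float's bit pattern
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t mag = x & 0x7FFFFFFFu;

    if (mag > 0x7F800000u) {  // NaN: return a quiet NaN
        return static_cast<std::uint16_t>(sign | 0x7E00u);
    }
    if (mag >= 0x47800000u) {  // |f| >= 65536 or infinity: overflow to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    std::uint32_t half;
    std::uint32_t rem;
    std::uint32_t halfway;
    if (mag >= 0x38800000u) {  // result is a normal half (|f| >= 2^-14)
        half = (mag - 0x38000000u) >> 13;  // rebias exponent, drop 13 bits
        rem = mag & 0x1FFFu;
        halfway = 0x1000u;
    } else {  // result is subnormal or zero
        const std::uint32_t shift = 126u - (mag >> 23);
        if (shift > 24u) {
            return static_cast<std::uint16_t>(sign);  // rounds to signed zero
        }
        const std::uint32_t mant = (mag & 0x007FFFFFu) | 0x00800000u;
        half = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }
    // Round to nearest, ties to even; a carry may correctly reach infinity.
    if (rem > halfway || (rem == halfway && (half & 1u))) {
        ++half;
    }
    return static_cast<std::uint16_t>(sign | half);
}
```

Whether such a software path is preferable to a vendor intrinsic is exactly the open question; the sketch only shows that the fallback is self-contained.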

Some fixes have been extracted from this PR into #1253.
TODO:

  • everything compiles and is runnable
  • change IDR so that half is not used when running with curand/hiprand; it throws an error instead
  • only test double <-> float, or wait for thrust<__half>
    • currently testing on double and float, but a few double tests with half fail (I have created the testing/benchmark next_precision chain and also added some TODOs at the failing tests)
  • thrust<__half> operator
    • the stacking issue needs to be resolved
  • MPI custom type/op for __half (it can be communicated, but the reduction will be wrong)
  • Csr strategy logic when the arch does not support 16-bit atomics
  • Disable half when it is not needed? (with GINKGO_ENABLE_HALF)
  • Use the SYCL half instead of gko::half when using dpcpp
  • clean up code (some changes might not be needed under the current half design)
  • Move gko::half to the public interface
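
One way the MPI reduction mentioned in the list can go wrong, assuming the half values travel as raw 16-bit integers and are combined with an integer sum: the bit patterns get added, not the values. The sketch below illustrates this without calling MPI; `half_bits_to_float` and `reduce_as_integer` are hypothetical helpers, and IEEE 754 binary16 encoding is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decodes IEEE 754 binary16 bits into a float (exact: every half fits a float).
inline float half_bits_to_float(std::uint16_t h)
{
    const int sign = (h & 0x8000u) ? -1 : 1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    if (exponent == 0x1F) {  // infinity or NaN
        return mantissa ? std::nanf("") : sign * INFINITY;
    }
    if (exponent == 0) {  // zero or subnormal: mantissa * 2^-24
        return sign * std::ldexp(static_cast<float>(mantissa), -24);
    }
    // normal: (1024 + mantissa) * 2^(exponent - 15 - 10)
    return sign *
           std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
}

// What an integer sum reduction would do to the raw 16-bit payloads.
inline std::uint16_t reduce_as_integer(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>(a + b);
}
```

With 1.0 encoded as `0x3C00`, the integer sum of two such payloads is `0x7800`, which decodes to 32768 rather than 2 (`0x4000`). A custom datatype and reduction operation (for example via `MPI_Type_contiguous` and `MPI_Op_create`) would instead decode, add, and re-encode.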

Closes #73

@yhmtsai yhmtsai added the 1:ST:WIP This PR is a work in progress. Not ready for review. label Jan 12, 2023
@ginkgo-bot ginkgo-bot added mod:all This touches all Ginkgo modules. reg:benchmarking This is related to benchmarking. reg:build This is related to the build system. reg:example This is related to the examples. reg:testing This is related to testing. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats type:multigrid This is related to multigrid type:preconditioner This is related to the preconditioners type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria labels Jan 12, 2023
@MarcelKoch
Member

I would suggest putting all the TODOs into separate PRs. Having it compile and run in the first PR is IMO already enough. You could create a GitHub project to track the remaining TODOs.

@sonarcloud

sonarcloud bot commented Mar 25, 2023

SonarCloud Quality Gate failed.

Bugs: 2 (rating D)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 137 (rating A)

Coverage: 17.3%
Duplication: 1.6%

@yhmtsai yhmtsai force-pushed the half branch 2 times, most recently from ade5e58 to 8a6d5fb Compare June 18, 2023 22:35
@yhmtsai yhmtsai added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Jun 19, 2023
yhmtsai and others added 28 commits October 23, 2024 13:41
Note: the issue is that numeric_limits<half>::infinity returns float instead of half. Changing that might be a better solution.
- use Csr in residual norm for half apply support
- use a higher tolerance for mc64 due to the half range
- some examples cannot finish in half precision for mc64
- skip some tests in half due to the half range
- fix the half limit value
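
The note about the limits returning `float` can be illustrated with a `std::numeric_limits` specialization whose members return the half type itself. This is a sketch under assumptions, not the code in this PR: `half_t` is a hypothetical 16-bit storage type, only a few members are listed, and the constants are the IEEE 754 binary16 bit patterns (largest finite value 65504, which is the "half range" the commits above refer to).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical 16-bit storage type standing in for a software half.
struct half_t {
    std::uint16_t bits;
};

namespace std {
// Only the members needed for the sketch are provided.
template <>
class numeric_limits<half_t> {
public:
    static constexpr bool is_specialized = true;
    static constexpr bool has_infinity = true;
    static constexpr int digits = 11;  // 10 stored mantissa bits + implicit bit
    // Each function returns half_t itself, not float.
    static constexpr half_t max() noexcept { return half_t{0x7BFF}; }  // 65504
    static constexpr half_t min() noexcept { return half_t{0x0400}; }  // 2^-14
    static constexpr half_t epsilon() noexcept
    {
        return half_t{0x1400};  // 2^-10
    }
    static constexpr half_t infinity() noexcept { return half_t{0x7C00}; }
};
}  // namespace std

static_assert(
    std::is_same<decltype(std::numeric_limits<half_t>::infinity()),
                 half_t>::value,
    "infinity() should yield the half type, not float");
```

Returning the type itself avoids silent promotion to `float` in generic code that compares against `numeric_limits<T>::infinity()` or `max()`.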

Co-authored-by: Marcel Koch <[email protected]>
Co-authored-by: Thomas Grützmacher <[email protected]>
Labels
1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test mod:all This touches all Ginkgo modules. reg:benchmarking This is related to benchmarking. reg:build This is related to the build system. reg:example This is related to the examples. reg:testing This is related to testing. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats type:multigrid This is related to multigrid type:preconditioner This is related to the preconditioners type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria
Development

Successfully merging this pull request may close these issues.

Verifying Ginkgo with custom value types
Full support for custom datatypes
5 participants