Commit
* Optimize log/rsqrt/cosh for rvv with ulp.
* add rsqrt precision version for x86
* change reduce test order for icache
* add some strategy for performance test
* remove the ugly code just now
* Add ulp ctest case for acos and optimize acos for riscv64.
* to avoid timeouts
* reduce repeat num for test
* change test mode for reduce
* add roofline for x86 reduce
* remove typo in test
* Add ulp test case for asin and optimize it with rvv.
* adjust matmul benchmark shape
* revise the value for pi as ulp
* update for rsqrt as ulp
* Optimize cos for rvv and update roofline.
* add acos for ulp x86
* update opt for asin
* add cos for x86 about ulp
* update roofline for matmul
* Optimize sin for rvv and update roofline.
* change to std sqrt for x86 as rsqrt ulp
* change the target for ulp test
* Update ctest cases for unary.
  - Add acosh, but there is something wrong, debug it later.
  - Add sqrt, x86 use _mm256_sqrt_ps.
  - Modify ulp ref from ortki to c/c++ math library.
  - cos/sin ulp test for x86/riscv64 only.
* change ortki sin to std::sin for ulp
* add clamp for ntt support
* add max min scalar vector version
* Add ctest and rvv roofline for reduce.
* change useless file
* add opt for max and min
* Add all ctest for reduce.
* Add reduce sum/max/min and optimize mean.
* add opt for pack m&n
* Modify ctest and add benchmark test for clamp.
* add special template for binary
* Apply code-format changes
* Add clamp into primitive_ops to optimize rvv.
* add unary special version template
* Apply code-format changes
* add clamp roofline for x86
* add ntt profiler func
* change style for Info
* add more info for profiler
* support markdown style
* update roofline info for reduce@x86
* add unroll attribute for gcc&clang
* change reality for compiler
* support more kernels
* unroll loop by hand
* add unroll num for x86 and riscv
* unroll loop
* update unroll to support 2 inputs
* change reality for loop unroll
* add support for unary
* Add gcc 14.2.0 and vlen config support for rvv.
* open unroll function for x86
* update roofline for x86
* update unary roofline considering ldst
* update roofline for x86
* revise bug for 2dvector
* Apply code-format changes
* revise change for special template
* Apply code-format changes
* Update rvv roofline for binary/unary/matmul.
* update roofline for x86 as fma
* adjust unroll num as special situation
* revise typo
* better performance for x86
* Modify floor_mod and fix matmul outer_product for rvv.
* Apply code-format changes
* adjust x86 unroll by case
* more readable code
* Update to riscv64 gcc 14.2.0
* Use latest rvv impl of exp in stackvm.
* update roofline info for x86 pack k
* revise for compiler opt problem
* add volatile for matmul output
* Fix performance regression of both binary and unary for rvv.
* Optimize x86 inner_product
* change for pack k benchmark
* [ntt.x86] Remove unroll for outer_product & mma
* Apply code-format changes
* fallback to check if there is wrong
* [ntt.x86] Reorder mma from m,k to k,m
* recover reduce
* try ubuntu 22.04
* opt for pack K matmul
* opt for pack MK and K
* Apply code-format changes
* remove useless code
* Revert "try ubuntu 22.04"
  This reverts commit ee71469.
* add tmate session.
* some opt for matmul
* Revert "add tmate session."
  This reverts commit 801b8e0.
* Try to disable loop unroll to fix reduce abort.
* Apply code-format changes
* Remove redundant fp16 code.

---------

Co-authored-by: guodongliang <[email protected]>
Co-authored-by: uranus0515 <[email protected]>
Co-authored-by: zhangyang2057 <[email protected]>
Co-authored-by: sunnycase <[email protected]>
Co-authored-by: sunnycase <[email protected]>
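Many of the items in this commit refer to ULP-based accuracy tests, where kernel outputs are compared against a reference from the C/C++ math library (for example `std::sin`). The following is a minimal C++ sketch of how a ULP distance between two `float` values can be measured; the helper names `ordered_bits` and `ulp_distance` are hypothetical and are not taken from the ntt test code, and NaN inputs are rejected by an assertion rather than handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a float's bits as an integer key that grows
// monotonically with the float's value (+0.0f and -0.0f both map to 0).
inline std::int64_t ordered_bits(float x) {
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);  // bit copy; avoids strict-aliasing issues
    // Non-negative floats already order correctly as integers. Negative floats
    // are sign-magnitude, so mirror them below zero: -0.0f -> 0, -denorm_min -> -1.
    return i < 0 ? static_cast<std::int64_t>(INT32_MIN) - i
                 : static_cast<std::int64_t>(i);
}

// Hypothetical helper: number of representable floats separating a and b.
// Inputs must not be NaN; +/-infinity counts as one step beyond the largest
// finite float of the same sign.
inline std::int64_t ulp_distance(float a, float b) {
    assert(!std::isnan(a) && !std::isnan(b));
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

In a test, `ulp_distance(kernel_result, std::sin(x))` would count how many representable floats lie between a (hypothetical) kernel output and the library reference, and the test would then assert that this count stays within a chosen tolerance.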