
Feat/profiling #137

Merged: 5 commits, Sep 20, 2024


nathanielsimard (Member):

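Two levels of profiling for now, selected through the `CUBECL_DEBUG_OPTION` environment variable (`profile` or `profile-full`). In both examples, `CUBECL_DEBUG_LOG=stdout` sends the report to standard output.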

Basic

Command:

```
CUBECL_DEBUG_OPTION=profile CUBECL_DEBUG_LOG=stdout cargo bench --bench matmul --features cuda-jit
```

Output for 5 kernel executions; each line shows the measured time followed by the kernel name:

```
| 7.438437ms | CmmaKernel
| 6.712298ms | CmmaKernel
| 6.683767ms | CmmaKernel
| 7.825637ms | CmmaKernel
| 7.125788ms | CmmaKernel
```
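The basic report is simple enough to post-process. A minimal sketch, assuming every line has exactly the `| <duration>ms | <kernel name>` shape shown above; `parse_line`, `summarize`, and `Summary` are hypothetical helpers, not part of cubecl:

```rust
use std::collections::BTreeMap;

/// Per-kernel timing summary (hypothetical helper type, not a cubecl API).
#[derive(Debug)]
struct Summary {
    count: usize,
    min_ms: f64,
    max_ms: f64,
    mean_ms: f64,
}

/// Parses one `| 7.438437ms | CmmaKernel` line into (milliseconds, kernel name).
/// Returns None for any line that does not match this assumed format.
fn parse_line(line: &str) -> Option<(f64, &str)> {
    // Split on '|', trim whitespace, and skip the empty field before the first '|'.
    let mut fields = line.split('|').map(str::trim).filter(|f| !f.is_empty());
    let duration = fields.next()?.strip_suffix("ms")?.parse::<f64>().ok()?;
    let name = fields.next()?;
    Some((duration, name))
}

/// Groups samples by kernel name and computes count, min, max, and mean.
fn summarize(log: &str) -> BTreeMap<String, Summary> {
    let mut samples: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for (ms, name) in log.lines().filter_map(|l| parse_line(l)) {
        samples.entry(name.to_string()).or_default().push(ms);
    }
    samples
        .into_iter()
        .map(|(name, v)| {
            // Every entry holds at least one sample, so the division is safe.
            let min_ms = v.iter().copied().fold(f64::INFINITY, f64::min);
            let max_ms = v.iter().copied().fold(f64::NEG_INFINITY, f64::max);
            let mean_ms = v.iter().sum::<f64>() / v.len() as f64;
            (name, Summary { count: v.len(), min_ms, max_ms, mean_ms })
        })
        .collect()
}

fn main() {
    // Basic-level output copied from the PR description.
    let log = "\
| 7.438437ms | CmmaKernel
| 6.712298ms | CmmaKernel
| 6.683767ms | CmmaKernel
| 7.825637ms | CmmaKernel
| 7.125788ms | CmmaKernel";
    for (name, s) in summarize(log) {
        println!(
            "{name}: n={} min={:.3}ms max={:.3}ms mean={:.3}ms",
            s.count, s.min_ms, s.max_ms, s.mean_ms
        );
    }
}
```

For the five launches above this prints `CmmaKernel: n=5 min=6.684ms max=7.826ms mean=7.157ms`. Grouping by name keeps the sketch usable when a benchmark launches several different kernels.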

Full

Command:

```
CUBECL_DEBUG_OPTION=profile-full CUBECL_DEBUG_LOG=stdout cargo bench --bench matmul --features cuda-jit
```

Output for 1 kernel execution. In addition to the time, it prints the fully qualified kernel type with its generic parameters, the `KernelSettings` and `ComptimeCmmaInfo` values the kernel was compiled with, and the launch `CubeCount`:

```
| 7.079846ms | cubecl_linalg::matmul::cmma::base::cmma_kernel::CmmaKernel<half::binary16::f16, half::binary16::f16, cubecl_cuda::runtime::CudaRuntime>: (
    KernelSettings {
        mappings: [],
        vectorization_global: None,
        vectorization_partial: [
            Input {
                pos: 0,
                vectorization: Some (
                    8,
                ),
            },
            Input {
                pos: 1,
                vectorization: Some (
                    8,
                ),
            },
            Output {
                pos: 0,
                vectorization: Some (
                    8,
                ),
            },
        ],
        cube_dim: CubeDim {
            x: 32,
            y: 8,
            z: 1,
        },
        reading_strategy: [],
    },
    ComptimeCmmaInfo {
        block_size_m: 128,
        block_size_k: 16,
        block_size_n: 128,
        tile_size: 16,
        check_m_bounds: false,
        check_k_bounds: false,
        check_n_bounds: false,
        unroll: false,
        coop_dim: 32,
        num_coops: 8,
        num_accumulators: 8,
        write_out_strategy: 1,
        cube_dispatch_strategy: 1,
        compute_loop_order_strategy: 0,
        reuse_lhs_fragment: false,
    },
) CubeCount (16, 16, 8)
```
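The full report carries enough information to sanity-check the launch shape. A small sketch of that arithmetic using the values printed above, assuming `CubeCount (16, 16, 8)` lists the number of cubes along x, y, and z, and that each coop is one group of `coop_dim` units; `LaunchGeometry` is a hypothetical struct, not a cubecl type:

```rust
/// Launch shape copied from the profile-full output (hypothetical struct).
struct LaunchGeometry {
    cube_count: (u32, u32, u32), // assumed: cubes along x, y, z
    cube_dim: (u32, u32, u32),   // units per cube along x, y, z
}

impl LaunchGeometry {
    /// Total number of cubes in the launch.
    fn cubes(&self) -> u32 {
        let (x, y, z) = self.cube_count;
        x * y * z
    }

    /// Number of units (threads) in one cube.
    fn units_per_cube(&self) -> u32 {
        let (x, y, z) = self.cube_dim;
        x * y * z
    }

    /// Total units launched; widened to u64 to avoid overflow on large launches.
    fn total_units(&self) -> u64 {
        self.cubes() as u64 * self.units_per_cube() as u64
    }
}

fn main() {
    let g = LaunchGeometry { cube_count: (16, 16, 8), cube_dim: (32, 8, 1) };

    // ComptimeCmmaInfo values from the same log.
    let (coop_dim, num_coops) = (32u32, 8u32);
    let (block_m, block_n, tile) = (128u32, 128u32, 16u32);

    // Assumption: the coops exactly fill one cube (32 * 8 == 32 * 8 * 1).
    assert_eq!(coop_dim * num_coops, g.units_per_cube());

    // Output tiles in one block_m x block_n block, split evenly across coops.
    let tiles_per_block = (block_m / tile) * (block_n / tile);
    let tiles_per_coop = tiles_per_block / num_coops;

    println!("cubes = {}", g.cubes());
    println!("units per cube = {}", g.units_per_cube());
    println!("total units = {}", g.total_units());
    println!("tiles per block = {tiles_per_block}, per coop = {tiles_per_coop}");
}
```

This gives 2048 cubes of 256 units each (524288 units in total), and 64 output tiles per block, or 8 per coop. That last figure matches `num_accumulators: 8` in the log; reading it as one accumulator per tile is an interpretation, not something the PR states.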

nathanielsimard merged commit 16a79fc into main on Sep 20, 2024 (5 checks passed).
nathanielsimard deleted the feat/profiling branch on September 20, 2024 at 17:00.
wingertge added a commit to wingertge/cubecl that referenced this pull request on Sep 20, 2024.