
Assuming other work-items progress without having a WG barrier in prm/core/memory/memfence/basic/global_u16/st_memfence_screl_wave__ld_memfence_scacq_wave/1_4x1x1_1x1x1 #15

Open
pjaaskel opened this issue Feb 8, 2016 · 0 comments


pjaaskel commented Feb 8, 2016

The test prm/core/memory/memfence/basic/global_u16/st_memfence_screl_wave__ld_memfence_scacq_wave/1_4x1x1_1x1x1 assumes that work-items make forward progress relative to one another even though the kernel contains no barrier. AFAIU this is undefined behavior, and a deadlock is possible: the work-items spinning on &global_flag may never let "wave id" 3 run and perform the store.

module &sample:1:0:$base:$large:$near;
prog global_u16 &global_var = 0;
prog global_u64 &global_flag;

prog kernel &test_kernel(
    kernarg_u64 %output,
    kernarg_u64 %input)
{
    ld_kernarg_align(8)_u64 $d0, [%input];
    workitemflatabsid_u64   $d1;
    mad_u64 $d2, $d1, 2, $d0;
    ld_global_align(2)_u16  $s1, [$d2];
    workitemflatabsid_u64   $d3;
    div_u64 $d3, $d3, WAVESIZE;
    cmp_ne_b1_u64   $c0, $d3, 3;

    ; All but "wave id" 3 skip the store
    cbr_b1  $c0, @skip_store;

    ; Wave id 3 should write the 1 to global_var here to ...
    st_global_align(2)_u16  $s1, [&global_var];
    mov_b64 $d4, 1;
    memfence_screl_wave;
    atomicnoret_st_global_rlx_system_b64    [&global_flag], $d4;
    br  @skip_memfence;

@skip_store:
    atomic_ld_global_rlx_system_b64 $d5, [&global_flag];
    cmp_ne_b1_u64   $c0, $d5, 1;

    ; ... release the other work-items spinning in this loop.
    cbr_b1  $c0, @skip_store;
    memfence_scacq_wave;

@skip_memfence:
    ld_global_align(2)_u16  $s0, [&global_var];
    ld_kernarg_align(8)_u64 $d6, [%output];
    mad_u64 $d7, $d1, 2, $d6;
    st_global_align(2)_u16  $s0, [$d7];
};

Adding an fbarrier that is reached both in the spin loop and on the "wave id 3" path should make the test case well-defined.
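
To illustrate the hazard outside HSAIL, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the "waves" are generators that yield once per step, and the names `writer`, `spinner`, `run_to_completion`, `round_robin` and `simulate` are hypothetical helpers, not part of any HSA runtime or of the conformance suite. It compares a scheduler that runs each wave to completion (which seems permitted when no forward progress is guaranteed) with a fair round-robin one.

```python
# Toy model of the test's control flow. This is NOT HSAIL and makes no claim
# about how a real HSA implementation schedules waves.

def writer(mem):
    """Models "wave id" 3: store the data, then set the flag."""
    mem["global_var"] = 1       # st_global ... [&global_var]
    yield
    mem["global_flag"] = 1      # atomic store issued after the release fence
    yield

def spinner(mem):
    """Models the other waves: spin on the flag, then read the data."""
    while mem["global_flag"] != 1:      # atomic_ld + cbr loop
        yield
    mem["seen"] = mem["global_var"]     # load issued after the acquire fence
    yield

def run_to_completion(waves, budget):
    """Unfair scheduler: run each wave until it finishes before the next one."""
    steps = 0
    for wave in waves:
        for _ in wave:
            steps += 1
            if steps >= budget:
                return False    # step budget exhausted: treated as a hang
    return True

def round_robin(waves, budget):
    """Fair scheduler: every live wave gets one step per turn."""
    live = list(waves)
    steps = 0
    while live:
        for wave in list(live):
            try:
                next(wave)
            except StopIteration:
                live.remove(wave)
            steps += 1
            if steps >= budget:
                return False
    return True

def simulate(scheduler, budget=1000):
    """Return True if all waves finish within the step budget."""
    mem = {"global_var": 0, "global_flag": 0}
    # Waves 0..2 spin and wave 3 writes, mirroring the layout of the test.
    waves = [spinner(mem), spinner(mem), spinner(mem), writer(mem)]
    return scheduler(waves, budget)

if __name__ == "__main__":
    print("run to completion finishes:", simulate(run_to_completion))
    print("round robin finishes:", simulate(round_robin))
```

In this model the run-to-completion schedule never reaches the writer, because the first spinner never gives up its turn, while the fair schedule terminates. The test only passes on implementations that behave like the second scheduler.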
