
BUG: assign of subfunctions does not work as expected before op2.par_loop when run in parallel #3150

Closed
lrtfm opened this issue Oct 11, 2023 · 8 comments · Fixed by OP2/PyOP2#710

lrtfm (Contributor) commented Oct 11, 2023

Describe the bug
The data of a Function from a MixedFunctionSpace does not seem to update properly when assign is used on its subfunctions and the script is run in parallel.

Steps to Reproduce
Consider the following code:

# debug_subfunctions.py
#
# Date: 2020-10-15
from firedrake import *
from firedrake.petsc import PETSc
import numpy as np


def make_matrix(mesh):
    '''Make a matrix according to the normal vector at each node of the vector space.

    Given a scalar function space V1, a vector function space V2, and a normal vector n defined on V2,
    return a matrix M in the mixed space V = V1*V2 such that the V1*V1 block is the identity matrix
    and the V2*V2 block looks like

        0 0 0
        0 0 0
        a b c
             ...
             ...
             ...
                 0 0 0
                 0 0 0
                 d e f

    i.e., in each node's 3x3 block, the third row is assigned the values of the normal vector n at that node.

    '''

    V1 = FunctionSpace(mesh, "CG", 1)
    V2 = VectorFunctionSpace(mesh, "CG", 2)
    V = V1*V2

    # make a template matrix
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(u[0], v[0])*dx + inner(u[1], v[1])*dx
    T = assemble(a).M

    # Let n.subfunctions[1] = x/|x| be the outward normal vector
    n = Function(V)
    x = SpatialCoordinate(mesh)
    _n = Function(V2).interpolate(x/sqrt(x**2))

    # If we use assign or just copy the data, i.e.,
    #     n.subfunctions[1].assign(_n)
    # or
    #     n.subfunctions[1].dat.data[:] = _n.dat.data[:]
    # then the result is not correct when run in parallel.
    # For example, when run with `mpiexec -n 2`,
    # the last several rows are all zeros on the second process.
    # However, interpolate works correctly:
    #     n.subfunctions[1].interpolate(_n)
    n.subfunctions[1].assign(_n)

    kernel = op2.Kernel('''
    void set_matrix(double M[{N}][{N}], double *n){{
        int s = 0;
        for(int i = 0; i < 3; ++i){{
            M[i][i] = 1;
        }}
        for(int i = 3; i < {N}; ++i){{
            s = (i - 3)%3;
            if (s == 2) {{
                for (int j = 0; j < 3; j++) {{
                    M[i][i-s+j] = n[i-s+j];
                }}
            }}
        }}
    }}
    '''.format(N=21), 'set_matrix')

    M = op2.Mat(T.sparsity)
    maps = T.sparsity.maps[0][0]
    op2.par_loop(kernel, maps.iterset, M(op2.WRITE, (maps, maps)), n.dat(op2.READ, maps))
    M.assemble()

    return M


def print_row_l2_norm(M):
    '''Print the l2 norm of each row of M'''

    rank, size = COMM_WORLD.rank, COMM_WORLD.size
    s, e = M.handle.getOwnershipRange()
    ret = np.zeros(e - s)
    for i in range(s, e):
        row = M.handle.getRow(i)
        ret[i - s] = np.sqrt(np.sum(row[1]**2))

    PETSc.Sys.Print('l2 norm of each row of B: sqrt(sum(row[i]^2))')
    PETSc.Sys.syncPrint('  rank %d:' % rank)
    n_per_line = 10
    for i in range(((e-s) + n_per_line - 1)//n_per_line):
        _s, _e = i*n_per_line, min((i+1)*n_per_line, e-s)
        PETSc.Sys.syncPrint(f'    {s + _s:3d} - {s + _e:3d}', ret[_s:_e])
    PETSc.Sys.syncFlush()


mesh = IcosahedralSphereMesh(radius=1.0)
M = make_matrix(mesh)
print_row_l2_norm(M)

The output of command

mpiexec -n 2 python debug_subfunctions.py

is

l2 norm of each row of B: sqrt(sum(row[i]^2))
  rank 0:
      0 -  10 [1. 1. 1. 0. 0. 1. 0. 0. 1. 0.]
     10 -  20 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     20 -  30 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     30 -  40 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     40 -  48 [0. 1. 0. 0. 1. 0. 0. 1.]
  rank 1:
     48 -  58 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
     58 -  68 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     68 -  78 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     78 -  88 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     88 -  98 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     98 - 108 [1. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
    108 - 118 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    118 - 128 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    128 - 138 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Expected behavior
The output should be as follows (this is the output obtained by replacing assign with interpolate):

l2 norm of each row of B: sqrt(sum(row[i]^2))
  rank 0:
      0 -  10 [1. 1. 1. 0. 0. 1. 0. 0. 1. 0.]
     10 -  20 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     20 -  30 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     30 -  40 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     40 -  48 [0. 1. 0. 0. 1. 0. 0. 1.]
  rank 1:
     48 -  58 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
     58 -  68 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     68 -  78 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     78 -  88 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     88 -  98 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     98 - 108 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
    108 - 118 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
    118 - 128 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
    128 - 138 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]

Environment:

  • OS: macOS
  • Python version: Python 3.11.6
  • Output of firedrake-status
Firedrake Configuration:
   package_manager: True
   minimal_petsc: False
   mpicc: None
   mpicxx: None
   mpif90: None
   mpiexec: None
   disable_ssh: True
   honour_petsc_dir: False
   with_parmetis: False
   slepc: False
   packages: []
   honour_pythonpath: False
   opencascade: False
   tinyasm: False
   torch: False
   petsc_int_type: int32
   cache_dir: /Users/zzyang/opt/firedrake/firedrake-real-int32-debug/.cache
   complex: False
   remove_build_files: False
   with_blas: download
   netgen: False
Additions:
   None
Environment:
   PYTHONPATH: None
   PETSC_ARCH: None
   PETSC_DIR: None
Status of components:
---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |f493084   |False     |
|PyOP2               |master                        |da14715c  |False     |
|fdutils             |main                          |83201c3   |False     |
|fiat                |master                        |e440a06   |False     |
|firedrake           |master                        |ad774b883 |True      |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |dbe226b   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |37b8d52976|False     |
|pyadjoint           |master                        |cefba3f   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |47c1324   |False     |
|ufl                 |master                        |cf66c4b3  |False     |
---------------------------------------------------------------------------
lrtfm added the bug label Oct 11, 2023
dham (Member) commented Oct 11, 2023

What does n.dat.data_ro look like in each case?

lrtfm (Contributor, Author) commented Oct 11, 2023

The output of print(n.dat.data_ro) is the same in both cases (assign and interpolate):

(array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([[ 0.30901699, -0.80901699,  0.5       ],
       [-0.30901699, -0.80901699,  0.5       ],
       [ 0.        , -0.52573111,  0.85065081],
       [ 0.5       , -0.30901699,  0.80901699],
       [-0.5       , -0.30901699,  0.80901699],
       [-0.80901699, -0.5       ,  0.30901699],
       [-0.85065081,  0.        ,  0.52573111],
       [ 0.5       ,  0.30901699,  0.80901699],
       [ 0.        ,  0.        ,  1.        ],
       [ 0.        ,  0.52573111,  0.85065081],
       [-0.5       ,  0.30901699,  0.80901699],
       [-1.        ,  0.        ,  0.        ],
       [ 0.30901699,  0.80901699,  0.5       ],
       [-0.80901699,  0.5       ,  0.30901699],
       [-0.30901699,  0.80901699,  0.5       ],
       [ 0.        , -1.        ,  0.        ],
       [ 0.52573111, -0.85065081,  0.        ],
       [-0.52573111, -0.85065081,  0.        ],
       [ 0.80901699, -0.5       ,  0.30901699],
       [ 0.85065081,  0.        ,  0.52573111],
       [-0.80901699, -0.5       , -0.30901699],
       [-0.85065081,  0.        , -0.52573111],
       [ 0.80901699,  0.5       ,  0.30901699],
       [ 0.52573111,  0.85065081,  0.        ],
       [-0.52573111,  0.85065081,  0.        ],
       [-0.80901699,  0.5       , -0.30901699],
       [ 0.        ,  1.        ,  0.        ]]))
(array([0., 0., 0.]), array([[ 0.30901699,  0.80901699, -0.5       ],
       [-0.30901699,  0.80901699, -0.5       ],
       [ 0.        ,  0.52573111, -0.85065081],
       [-0.5       ,  0.30901699, -0.80901699],
       [ 0.80901699,  0.5       , -0.30901699],
       [ 0.5       ,  0.30901699, -0.80901699],
       [ 0.85065081,  0.        , -0.52573111],
       [ 0.5       , -0.30901699, -0.80901699],
       [ 0.        ,  0.        , -1.        ],
       [ 0.        , -0.52573111, -0.85065081],
       [-0.5       , -0.30901699, -0.80901699],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.30901699, -0.80901699, -0.5       ],
       [ 0.80901699, -0.5       , -0.30901699],
       [-0.30901699, -0.80901699, -0.5       ]]))

connorjward (Contributor) commented Oct 11, 2023

What about n.subfunctions[1].dat.halo_valid in both cases? This could be an issue with some of the tricks I used in assign to avoid halo exchanges. I may not have given full enough consideration to halo validity for mixed dats.
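For readers unfamiliar with PyOP2's lazy halo exchanges, the failure mode hinted at above can be illustrated with a toy model (hypothetical class and function names, not the real PyOP2 API): each dat carries a halo-validity flag, writers mark the halo dirty instead of communicating immediately, and any reader that fails to check the flag sees stale ghost values.

```python
# Toy model of lazy halo exchange (hypothetical names, not the real
# PyOP2 API).  A dat owns some entries and mirrors remote "halo"
# entries; writes invalidate the halo instead of communicating.

class ToyDat:
    def __init__(self, owned, halo):
        self.owned = list(owned)   # locally owned values
        self.halo = list(halo)     # cached copies of remote values
        self.halo_valid = True

    def assign(self, values):
        # Like an assign that avoids communication: update owned
        # data only and mark the cached halo as dirty.
        self.owned = list(values)
        self.halo_valid = False

    def exchange_halo(self, fresh_remote):
        # Stand-in for the MPI halo exchange.
        self.halo = list(fresh_remote)
        self.halo_valid = True


def par_loop_read(dat, fresh_remote):
    # A correct loop checks halo validity before reading ghost values.
    if not dat.halo_valid:
        dat.exchange_halo(fresh_remote)
    return dat.owned + dat.halo


def buggy_par_loop_read(dat):
    # A loop that skips the check reads stale halo values -- the
    # symptom reported in this issue (zero rows on the second rank).
    return dat.owned + dat.halo


d = ToyDat(owned=[1.0], halo=[0.0])   # remote value not yet received
d.assign([2.0])                        # remote rank also writes 3.0
print(buggy_par_loop_read(d))          # stale halo: [2.0, 0.0]
print(par_loop_read(d, [3.0]))         # after exchange: [2.0, 3.0]
```

This is only a sketch of the bookkeeping; the real flag is `dat.halo_valid` and the exchange is performed inside the parloop machinery.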

ksagiyam (Contributor) commented Oct 11, 2023

I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?
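The dispatch problem ksagiyam points at can be sketched with a minimal model (hypothetical classes; the real logic lives in pyop2/parloop.py): if the pre-loop halo-exchange step matches only the flat dat argument type, a mixed argument falls through the dispatch and its component halos are never exchanged.

```python
# Minimal sketch of type-based halo-exchange dispatch (hypothetical
# classes, not the real PyOP2 code).

class DatArg:
    def __init__(self, name):
        self.name = name
        self.exchanged = False


class MixedDatArg:
    def __init__(self, components):
        self.components = components  # list of DatArg


def exchange_halos_buggy(args):
    # Matches only DatArg, so a MixedDatArg is silently skipped and
    # its component halos stay stale.
    for arg in args:
        if isinstance(arg, DatArg):
            arg.exchanged = True


def exchange_halos_fixed(args):
    # Also unpacks MixedDatArg into its component dats.
    for arg in args:
        if isinstance(arg, DatArg):
            arg.exchanged = True
        elif isinstance(arg, MixedDatArg):
            for sub in arg.components:
                sub.exchanged = True


mixed = MixedDatArg([DatArg("scalar"), DatArg("vector")])
exchange_halos_buggy([mixed])
print([c.exchanged for c in mixed.components])  # [False, False]
exchange_halos_fixed([mixed])
print([c.exchanged for c in mixed.components])  # [True, True]
```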

lrtfm (Contributor, Author) commented Oct 11, 2023

> What about n.subfunctions[1].dat.halo_valid in both cases? This could be an issue with some of the tricks I used in assign to avoid halo exchanges. I may not have given full enough consideration to halo validity for mixed dats.

The values are all False in both cases.

connorjward (Contributor) commented

> I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?

Great spot! This does look like it could be responsible.

connorjward (Contributor) commented Oct 16, 2023

> I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?
>
> Great spot! This does look like it could be responsible.

It turns out that this is a little trickier to change than I thought it would be. Would it be OK to mark this issue as "wontfix"? The issue appears to stem from the fact that MixedDat and Dat have slightly different implementations; in pyop3 I make no distinction between the two, resulting in a single code path, so the issue should go away.

On second thought, I don't think that would be the right thing to do. We should at least find out why this is happening (and perhaps submit an xfailing test).

connorjward (Contributor) commented

This is fixed by OP2/PyOP2#710
