
BUG: assign of subfunctions does not work as expected before op2.par_loop when run in parallel #3150

Closed
lrtfm opened this issue Oct 11, 2023 · 8 comments · Fixed by OP2/PyOP2#710

lrtfm (Contributor) commented Oct 11, 2023

Describe the bug
The data of a Function from a MixedFunctionSpace does not seem to update properly when assign is used on its subfunctions and the script is run in parallel.

Steps to Reproduce
Consider the following code:

# debug_subfunctions.py
#
# Date: 2020-10-15
from firedrake import *
from firedrake.petsc import PETSc
import numpy as np


def make_matrix(mesh):
    '''Make a matrix according to the normal vector at each node of the vector space.

    Given a scalar function space V1, a vector function space V2, and a normal vector n defined on V2,
    return a matrix M in the mixed space V = V1*V2 such that the V1*V1 block is the identity matrix
    and the V2*V2 block looks like

        0 0 0
        0 0 0
        a b c
             ...
             ...
             ...
                 0 0 0
                 0 0 0
                 d e f

    i.e., in each node's 3x3 block, the third row is assigned the values of the normal vector n at that node.

    '''

    V1 = FunctionSpace(mesh, "CG", 1)
    V2 = VectorFunctionSpace(mesh, "CG", 2)
    V = V1*V2

    # make a template matrix
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(u[0], v[0])*dx + inner(u[1], v[1])*dx
    T = assemble(a).M

    # Let n.subfunctions[1] = x/|x| be the outward normal vector
    n = Function(V)
    x = SpatialCoordinate(mesh)
    _n = Function(V2).interpolate(x/sqrt(x**2))

    # If we use assign or just copy the data, i.e.,
    #     n.subfunctions[1].assign(_n)
    # or
    #     n.subfunctions[1].dat.data[:] = _n.dat.data[:]
    # then the result is not correct when run in parallel.
    # For example, when run with `mpiexec -n 2`,
    # the last several rows are all zeros on the second process.
    # However, interpolate works correctly:
    #     n.subfunctions[1].interpolate(_n)
    n.subfunctions[1].assign(_n)

    kernel = op2.Kernel('''
    void set_matrix(double M[{N}][{N}], double *n){{
        int s = 0;
        for(int i = 0; i < 3; ++i){{
            M[i][i] = 1;
        }}
        for(int i = 3; i < {N}; ++i){{
            s = (i - 3)%3;
            if (s == 2) {{
                for (int j = 0; j < 3; j++) {{
                    M[i][i-s+j] = n[i-s+j];
                }}
            }}
        }}
    }}
    '''.format(N=21), 'set_matrix')

    M = op2.Mat(T.sparsity)
    maps = T.sparsity.maps[0][0]
    op2.par_loop(kernel, maps.iterset, M(op2.WRITE, (maps, maps)), n.dat(op2.READ, maps))
    M.assemble()

    return M


def print_row_l2_norm(M):
    '''Print the l2 norm of each row of M'''

    rank, size = COMM_WORLD.rank, COMM_WORLD.size
    s, e = M.handle.getOwnershipRange()
    ret = np.zeros(e - s)
    for i in range(s, e):
        row = M.handle.getRow(i)
        ret[i - s] = np.sqrt(np.sum(row[1]**2))

    PETSc.Sys.Print('l2 norm of each row of B: sqrt(sum(row[i]^2))')
    PETSc.Sys.syncPrint('  rank %d:' % rank)
    n_per_line = 10
    for i in range(((e-s) + n_per_line - 1)//n_per_line):
        _s, _e = i*n_per_line, min((i+1)*n_per_line, e-s)
        PETSc.Sys.syncPrint(f'    {s + _s:3d} - {s + _e:3d}', ret[_s:_e])
    PETSc.Sys.syncFlush()


mesh = IcosahedralSphereMesh(radius=1.0)
M = make_matrix(mesh)
print_row_l2_norm(M)

The output of command

mpiexec -n 2 python debug_subfunctions.py

is

l2 norm of each row of B: sqrt(sum(row[i]^2))
  rank 0:
      0 -  10 [1. 1. 1. 0. 0. 1. 0. 0. 1. 0.]
     10 -  20 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     20 -  30 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     30 -  40 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     40 -  48 [0. 1. 0. 0. 1. 0. 0. 1.]
  rank 1:
     48 -  58 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
     58 -  68 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     68 -  78 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     78 -  88 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     88 -  98 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     98 - 108 [1. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
    108 - 118 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    118 - 128 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
    128 - 138 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Expected behavior
The output should be as follows (this is the output obtained by replacing assign with interpolate):

l2 norm of each row of B: sqrt(sum(row[i]^2))
  rank 0:
      0 -  10 [1. 1. 1. 0. 0. 1. 0. 0. 1. 0.]
     10 -  20 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     20 -  30 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     30 -  40 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     40 -  48 [0. 1. 0. 0. 1. 0. 0. 1.]
  rank 1:
     48 -  58 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
     58 -  68 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     68 -  78 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
     78 -  88 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
     88 -  98 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
     98 - 108 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
    108 - 118 [0. 0. 1. 0. 0. 1. 0. 0. 1. 0.]
    118 - 128 [0. 1. 0. 0. 1. 0. 0. 1. 0. 0.]
    128 - 138 [1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]

Environment:

  • OS: macOS
  • Python version: Python 3.11.6
  • Output of firedrake-status
Firedrake Configuration:
   package_manager: True
   minimal_petsc: False
   mpicc: None
   mpicxx: None
   mpif90: None
   mpiexec: None
   disable_ssh: True
   honour_petsc_dir: False
   with_parmetis: False
   slepc: False
   packages: []
   honour_pythonpath: False
   opencascade: False
   tinyasm: False
   torch: False
   petsc_int_type: int32
   cache_dir: /Users/zzyang/opt/firedrake/firedrake-real-int32-debug/.cache
   complex: False
   remove_build_files: False
   with_blas: download
   netgen: False
Additions:
   None
Environment:
   PYTHONPATH: None
   PETSC_ARCH: None
   PETSC_DIR: None
Status of components:
---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |f493084   |False     |
|PyOP2               |master                        |da14715c  |False     |
|fdutils             |main                          |83201c3   |False     |
|fiat                |master                        |e440a06   |False     |
|firedrake           |master                        |ad774b883 |True      |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |dbe226b   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |37b8d52976|False     |
|pyadjoint           |master                        |cefba3f   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |47c1324   |False     |
|ufl                 |master                        |cf66c4b3  |False     |
---------------------------------------------------------------------------
lrtfm added the bug label Oct 11, 2023
dham (Member) commented Oct 11, 2023

What does n.dat.data_ro look like in each case?

lrtfm (Contributor, Author) commented Oct 11, 2023

The output of print(n.dat.data_ro) is the same in both cases (assign and interpolate):

(array([0., 0., 0., 0., 0., 0., 0., 0., 0.]), array([[ 0.30901699, -0.80901699,  0.5       ],
       [-0.30901699, -0.80901699,  0.5       ],
       [ 0.        , -0.52573111,  0.85065081],
       [ 0.5       , -0.30901699,  0.80901699],
       [-0.5       , -0.30901699,  0.80901699],
       [-0.80901699, -0.5       ,  0.30901699],
       [-0.85065081,  0.        ,  0.52573111],
       [ 0.5       ,  0.30901699,  0.80901699],
       [ 0.        ,  0.        ,  1.        ],
       [ 0.        ,  0.52573111,  0.85065081],
       [-0.5       ,  0.30901699,  0.80901699],
       [-1.        ,  0.        ,  0.        ],
       [ 0.30901699,  0.80901699,  0.5       ],
       [-0.80901699,  0.5       ,  0.30901699],
       [-0.30901699,  0.80901699,  0.5       ],
       [ 0.        , -1.        ,  0.        ],
       [ 0.52573111, -0.85065081,  0.        ],
       [-0.52573111, -0.85065081,  0.        ],
       [ 0.80901699, -0.5       ,  0.30901699],
       [ 0.85065081,  0.        ,  0.52573111],
       [-0.80901699, -0.5       , -0.30901699],
       [-0.85065081,  0.        , -0.52573111],
       [ 0.80901699,  0.5       ,  0.30901699],
       [ 0.52573111,  0.85065081,  0.        ],
       [-0.52573111,  0.85065081,  0.        ],
       [-0.80901699,  0.5       , -0.30901699],
       [ 0.        ,  1.        ,  0.        ]]))
(array([0., 0., 0.]), array([[ 0.30901699,  0.80901699, -0.5       ],
       [-0.30901699,  0.80901699, -0.5       ],
       [ 0.        ,  0.52573111, -0.85065081],
       [-0.5       ,  0.30901699, -0.80901699],
       [ 0.80901699,  0.5       , -0.30901699],
       [ 0.5       ,  0.30901699, -0.80901699],
       [ 0.85065081,  0.        , -0.52573111],
       [ 0.5       , -0.30901699, -0.80901699],
       [ 0.        ,  0.        , -1.        ],
       [ 0.        , -0.52573111, -0.85065081],
       [-0.5       , -0.30901699, -0.80901699],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.30901699, -0.80901699, -0.5       ],
       [ 0.80901699, -0.5       , -0.30901699],
       [-0.30901699, -0.80901699, -0.5       ]]))

connorjward (Contributor) commented Oct 11, 2023

What about n.subfunctions[1].dat.halo_valid in both cases? This could be an issue with some of the tricks I used in assign to avoid halo exchanges. I may not have given full enough consideration to halo validity for mixed dats.
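For readers unfamiliar with PyOP2's lazy halo exchanges, the failure mode hinted at above can be illustrated with a toy model (hypothetical class and function names, not the real PyOP2 API): each dat carries a halo-validity flag, writers mark the halo dirty instead of communicating immediately, and any reader that fails to check the flag sees stale ghost values.

```python
# Toy model of lazy halo exchange (hypothetical names, not the real
# PyOP2 API).  A dat owns some entries and mirrors remote "halo"
# entries; writes invalidate the halo instead of communicating.

class ToyDat:
    def __init__(self, owned, halo):
        self.owned = list(owned)   # locally owned values
        self.halo = list(halo)     # cached copies of remote values
        self.halo_valid = True

    def assign(self, values):
        # Like an assign that avoids communication: update owned
        # data only and mark the cached halo as dirty.
        self.owned = list(values)
        self.halo_valid = False

    def exchange_halo(self, fresh_remote):
        # Stand-in for the MPI halo exchange.
        self.halo = list(fresh_remote)
        self.halo_valid = True


def par_loop_read(dat, fresh_remote):
    # A correct loop checks halo validity before reading ghost values.
    if not dat.halo_valid:
        dat.exchange_halo(fresh_remote)
    return dat.owned + dat.halo


def buggy_par_loop_read(dat):
    # A loop that skips the check reads stale halo values -- the
    # symptom reported in this issue (zero rows on the second rank).
    return dat.owned + dat.halo


d = ToyDat(owned=[1.0], halo=[0.0])   # remote value not yet received
d.assign([2.0])                        # remote rank also writes 3.0
print(buggy_par_loop_read(d))          # stale halo: [2.0, 0.0]
print(par_loop_read(d, [3.0]))         # after exchange: [2.0, 3.0]
```

This is only a sketch of the bookkeeping; the real flag is `dat.halo_valid` and the exchange is performed inside the parloop machinery.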

ksagiyam (Contributor) commented Oct 11, 2023

I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?
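The dispatch problem ksagiyam points at can be sketched with a minimal model (hypothetical classes; the real logic lives in pyop2/parloop.py): if the pre-loop halo-exchange step matches only the flat dat argument type, a mixed argument falls through the dispatch and its component halos are never exchanged.

```python
# Minimal sketch of type-based halo-exchange dispatch (hypothetical
# classes, not the real PyOP2 code).

class DatArg:
    def __init__(self, name):
        self.name = name
        self.exchanged = False


class MixedDatArg:
    def __init__(self, components):
        self.components = components  # list of DatArg


def exchange_halos_buggy(args):
    # Matches only DatArg, so a MixedDatArg is silently skipped and
    # its component halos stay stale.
    for arg in args:
        if isinstance(arg, DatArg):
            arg.exchanged = True


def exchange_halos_fixed(args):
    # Also unpacks MixedDatArg into its component dats.
    for arg in args:
        if isinstance(arg, DatArg):
            arg.exchanged = True
        elif isinstance(arg, MixedDatArg):
            for sub in arg.components:
                sub.exchanged = True


mixed = MixedDatArg([DatArg("scalar"), DatArg("vector")])
exchange_halos_buggy([mixed])
print([c.exchanged for c in mixed.components])  # [False, False]
exchange_halos_fixed([mixed])
print([c.exchanged for c in mixed.components])  # [True, True]
```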

lrtfm (Contributor, Author) commented Oct 11, 2023

> What about n.subfunctions[1].dat.halo_valid in both cases? This could be an issue with some of the tricks I used in assign to avoid halo exchanges. I may not have given full enough consideration to halo validity for mixed dats.

The values are all False in both cases.

connorjward (Contributor) commented

> I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?

Great spot! This does look like it could be responsible.

connorjward (Contributor) commented Oct 16, 2023

> I think this feature is undertested, as Firedrake assembles block-by-block. For instance, do we not need MixedDatKernelArg here https://github.com/OP2/PyOP2/blob/da14715cca2d2174d6f5554444e84f1a5d92cdf9/pyop2/parloop.py#L317 to have halo exchanges happen?
>
> Great spot! This does look like it could be responsible.

It turns out that this is a little trickier to change than I thought it would be. Would it be OK to mark this issue as "wontfix"? The issue appears to stem from the fact that MixedDat and Dat have slightly different implementations; in pyop3 I make no distinction between the two, resulting in a single code path, so the issue should go away.

On second thought, I don't think that would be the right thing to do. We should at least find out why this is happening (and perhaps submit an xfailing test).

connorjward (Contributor) commented

This is fixed by OP2/PyOP2#710
