New ptest failure in tst_rec_vars #72

Closed
ArchangeGabriel opened this issue Jul 21, 2021 · 5 comments

@ArchangeGabriel

Short log:

*** TESTING C   tst_rec_vars for record variables to NetCDF4 file  ------ Error at line 78 of tst_rec_vars.c: expect dim X len 4 but got 3
Error at line 78 of tst_rec_vars.c: expect dim X len 4 but got 3
fail with 2 mismatches

The most important change is the upgrade of netCDF to 4.8.0, but I need to retry with 4.7.4 to verify that it is actually the culprit.

@wkliao
Member

wkliao commented Jul 21, 2021

I believe this is caused by a NetCDF 4.8.0 bug. Below is the same test program
rewritten with the NetCDF API; it reproduces the same error when compiled
against 4.8.0. Note that the variable is created and set to collective access
mode, on the assumption that this mode guarantees metadata consistency across
all MPI processes.

% cat tst_nc4.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

/* on any NetCDF error, report it and jump to the cleanup label in main() */
#define CHECK_ERR { \
    if (err != NC_NOERR) { \
        printf("Error at line=%d: %s\n", __LINE__, nc_strerror(err)); \
        goto err_out; \
    } \
}

int main(int argc, char** argv) {
    int err=NC_NOERR, rank, nprocs;
    int ncid, cmode, varid, dimid, buf;
    size_t start[1], count[1], nrecs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = rank;

    if (rank == 0)
        printf("\nNetCDF library version is: %s\n\n", nc_inq_libvers());

    cmode = NC_CLOBBER | NC_NETCDF4;
    err = nc_create_par("testfile.nc", cmode, MPI_COMM_WORLD, MPI_INFO_NULL,
                        &ncid); CHECK_ERR

    err = nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid); CHECK_ERR
    err = nc_def_var(ncid, "var", NC_INT, 1, &dimid, &varid); CHECK_ERR
    err = nc_var_par_access(ncid, varid, NC_COLLECTIVE); CHECK_ERR
    err = nc_enddef(ncid); CHECK_ERR

    start[0] = rank;  /* rank r writes the single record at index r */
    count[0] = 1;
    err = nc_put_vara_int(ncid, varid, start, count, &buf); CHECK_ERR
    err = nc_inq_dimlen(ncid, dimid, &nrecs); CHECK_ERR

    if (nrecs != (size_t)nprocs) {  /* every rank should see nprocs records */
        printf("Rank %d error at line %d of file %s:\n",rank,__LINE__,__FILE__);
        printf("\tafter writing start=%zu count=%zu\n", start[0], count[0]);
        printf("\texpecting number of records = %d but got %zu\n",
               nprocs, nrecs);
    }
    err = nc_close(ncid); CHECK_ERR

err_out:
    MPI_Finalize();
    return (err < 0);
}

Compile and run commands I used.

% mpicc -g -O0 tst_nc4.c -o tst_nc4 \
      -I $HOME/NetCDF/4.8.0/include \
      -L $HOME/NetCDF/4.8.0/lib -lnetcdf

% mpiexec -n 4 ./tst_nc4

NetCDF library version is: 4.8.1 of Oct  4 2021 16:24:17 $

Rank 0 error at line 40 of file tst_nc4.c:
	after writing start=0 count=1
	expecting number of records = 4 but got 1
Rank 1 error at line 40 of file tst_nc4.c:
	after writing start=1 count=1
	expecting number of records = 4 but got 2
Rank 2 error at line 40 of file tst_nc4.c:
	after writing start=2 count=1
	expecting number of records = 4 but got 3
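The printed lengths follow a simple pattern: each failing rank reports its own start + count (rank 0 sees 1, rank 1 sees 2, rank 2 sees 3) rather than the maximum over all ranks, which is what a consistent view of the record dimension would give. The plain-C sketch below (no MPI or NetCDF; the function names `expected_global_len` and `local_only_len` are hypothetical) computes both quantities for the reproducer's layout, where rank r writes one record at index r. It only illustrates the arithmetic behind the output and is not a diagnosis of the library internals.

```c
#include <stddef.h>

/* Hypothetical helper: the length the unlimited dimension should have once
   every rank's write is visible, i.e. the maximum of start + count over all
   ranks. */
size_t expected_global_len(const size_t *start, const size_t *count, int nprocs)
{
    size_t len = 0;
    for (int i = 0; i < nprocs; i++) {
        size_t end = start[i] + count[i];
        if (end > len)
            len = end;
    }
    return len;
}

/* Hypothetical helper: the length a rank would report if it only saw its
   own extension of the record dimension, i.e. its local start + count. */
size_t local_only_len(size_t start, size_t count)
{
    return start + count;
}
```

With start = {0, 1, 2, 3} and count = {1, 1, 1, 1}, `expected_global_len` gives 4, the value the test expects, while `local_only_len` gives 1, 2 and 3 for ranks 0 to 2, matching the values printed above.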

@ArchangeGabriel
Author

Opened Unidata/netcdf-c#2038 to get the netCDF devs' opinion on this. ;)

@wkliao
Member

wkliao commented Sep 12, 2021

I suggest not enabling the NetCDF-4 feature when building PnetCDF for the Arch Linux release.

archlinux-github pushed a commit to archlinux/svntogit-community that referenced this issue Oct 1, 2021
See Parallel-NetCDF/PnetCDF#72

git-svn-id: file:///srv/repos/svn-community/svn@1026599 9fca08f4-af9d-4005-b8df-a31f2cc04f65
@ArchangeGabriel
Author

As GitHub auto-linked, that’s what I’ve done for now. ;)

wkliao added a commit that referenced this issue Feb 20, 2022
wkliao added a commit that referenced this issue Feb 20, 2022
These two NetCDF versions failed tst_rec_vars.c when run in parallel.
See issue #72
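A test suite can avoid flagging this known failure by checking the NetCDF-C version before running the parallel record-variable test. The sketch below is hypothetical and is not the code from the commit above: it assumes the two affected versions are the 4.8.0 and 4.8.1 releases discussed in this thread, and it takes the version as plain integers, which could for example come from the `NC_VERSION_MAJOR`, `NC_VERSION_MINOR` and `NC_VERSION_PATCH` macros in `netcdf_meta.h`.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if the given NetCDF-C version is one the
   thread found to fail tst_rec_vars in parallel.
   ASSUMPTION: only the 4.8 series (4.8.0 and 4.8.1) is affected; 4.9 carries
   the fix, and 4.7.4 is treated as unaffected because it was never confirmed
   to fail. */
bool netcdf_rec_len_bug_expected(int major, int minor, int patch)
{
    (void)patch; /* every 4.8.x release is treated as affected */
    return major == 4 && minor == 8;
}
```

Ignoring the patch level keeps the check simple, since 4.8.0 and 4.8.1 are the only releases in that series.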
@ArchangeGabriel
Author

Fixed in NetCDF 4.9. :)
