H5Fget_access_plist does not return a valid faplid #15

Open
yzanhua opened this issue Sep 16, 2022 · 7 comments

Comments

@yzanhua

yzanhua commented Sep 16, 2022

Summary

When using Cache VOL and Async VOL together, H5Fget_access_plist does not appear to return a valid faplid.
The returned ID is non-negative, but it does not appear to be a property list.

Error Details

% echo $HDF5_VOL_CONNECTOR 
cache_ext config=cache_1.cfg;under_vol=512;under_info={under_vol=0;under_info={}}

% mpirun -n 1 ./test
HDF5-DIAG: Error detected in HDF5 (1.13.3-1) MPI-process 0:
  #000: ../../hdf5-dev/src/H5Pfapl.c line 1487 in H5Pget_driver(): can't get driver
    major: Property lists
    minor: Can't get value
  #001: ../../hdf5-dev/src/H5Pfapl.c line 1444 in H5P_peek_driver(): not a file access property list
    major: Property lists
    minor: Inappropriate type
  #002: ../../hdf5-dev/src/H5Pint.c line 4067 in H5P_isa_class(): not a property list
    major: Invalid arguments to routine
    minor: Inappropriate type
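
For readers less familiar with stacked connectors: the HDF5_VOL_CONNECTOR value above nests one under_vol inside another, which is how Cache VOL ends up on top of a second connector. The sketch below only pulls out the under_vol numbers in order, outermost first. `list_under_vols` is a hypothetical helper written for this note, not an HDF5 API, and reading 512 as Async VOL and 0 as the native connector is an assumption based on the "Cache+Async" description in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of HDF5): collect the numeric value of every
 * "under_vol=<n>" entry in an HDF5_VOL_CONNECTOR string, outermost first.
 * Stores at most `max` values in `ids` and returns how many were stored. */
size_t list_under_vols(const char *spec, long *ids, size_t max) {
  static const char key[] = "under_vol=";
  const char *p = spec;
  size_t n = 0;

  while (n < max && (p = strstr(p, key)) != NULL) {
    char *end;
    long v;

    p += sizeof(key) - 1;      /* skip past "under_vol=" */
    v = strtol(p, &end, 10);
    if (end != p)              /* a number actually followed the key */
      ids[n++] = v;
    p = end;                   /* continue scanning after the number */
  }
  return n;
}
```

Applied to the string shown above, this yields two entries, 512 followed by 0, which is consistent with a three-level stack: cache_ext, then connector 512, then connector 0.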

Test Program

#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10

#define CHECK_ERR(A) {if (A < 0) { printf("Error at line %d: code %d\n", __LINE__, A); }}

int main(int argc, char **argv) {
  herr_t err = 0;

  int mpi_required;
  const char *file_name = "test.h5";

  hid_t fid = -1;          // File ID
  hid_t faplid = -1;       // File Access Property List
  hid_t plist_id = -1;
  hid_t faplid2 = -1;

  // init MPI
  err = MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_required);
  CHECK_ERR(err);

  // create file
  faplid = H5Pcreate(H5P_FILE_ACCESS);
  CHECK_ERR(faplid);
  H5Pset_fapl_mpio(faplid, MPI_COMM_WORLD, MPI_INFO_NULL);
  fid = H5Fcreate(file_name, H5F_ACC_TRUNC, H5P_DEFAULT, faplid);
  CHECK_ERR(fid);

  // get faplid
  faplid2 = H5Fget_access_plist (fid);
  CHECK_ERR (faplid2);
  plist_id = H5Pget_driver (faplid2);  // Error occurs here

  if (fid >= 0)
    H5Fclose(fid);
  if (faplid >= 0)
    H5Pclose(faplid);

  MPI_Finalize();

  return 0;
}

Library Versions (commit numbers)

  1. HDF5 develop branch: HDFGroup/hdf5@b559857
  2. Argobots main branch: pmodels/argobots@dce6e72
  3. AsyncVol develop branch: HDFGroup/vol-async@0a92d23
  4. Cache Vol develop branch: f453900
@wkliao
Contributor

wkliao commented Jan 2, 2023

Will this issue be addressed soon?

@zhenghh04
Collaborator

Hi @wkliao @yzanhua, this is an issue in the HDF5 library. I encountered it when I was running E3SM-IO, and I had to comment out H5Fget_access_plist in the code to make it run. I mentioned it to Neil before. Maybe we should report this to HDF5?

@wkliao
Contributor

wkliao commented Jan 3, 2023

I am not sure whether this is an HDF5 issue.
@yzanhua tested the small program he provided in this issue using the following configurations.
It failed only when using the Cache+Async VOLs.

Cache+Async VOL: fail
Cache VOL only: success
Passthrough VOL only: success
Log VOL only: success

using:
HDF5: 1.13.3
Cache VOL: master branch
Async VOL: v1.4

@yzanhua
Author

yzanhua commented Jan 4, 2023

It also fails when using Async VOL only. It seems that Async VOL (rather than Cache VOL) is not handling the faplid correctly.

@zhenghh04
Collaborator

Yes, it is with Async + HDF5.
@houjun, have you encountered this issue before?

@houjun
Collaborator

houjun commented Jan 4, 2023

Yes, I remember it being related to the future ID when async is used. I'll take another look and check with the HDF people.

@yzanhua
Author

yzanhua commented Jan 4, 2023

The provided test program failed in H5Pget_driver (the line where the invalid faplid2 is first used). I also tried other H5Pget_xxxx functions in place of H5Pget_driver to see whether the program still fails. The results might be helpful for debugging.

H5Pget_driver_info, H5Pget_fapl_mpio, and H5Pget_fapl_core fail with the same error messages, complaining about "not a property list".

However, H5Pget_fclose_degree and H5Pget_evict_on_close run without a problem.
