Darshan is reporting negative values for MPIIO_BYTES_WRITTEN (seen on Frontier) #957

Open
lukebroskop opened this issue Aug 23, 2023 · 0 comments

lukebroskop commented Aug 23, 2023

When writing a 4.3 TB file to the Orion filesystem attached to Frontier, Darshan reports negative values for MPIIO_BYTES_WRITTEN on most ranks.
e.g.

MPI-IO	0	7703102304952938401	MPIIO_BYTES_WRITTEN	-32574	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	1	7703102304952938401	MPIIO_BYTES_WRITTEN	-32726	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	2	7703102304952938401	MPIIO_BYTES_WRITTEN	-64988	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	3	7703102304952938401	MPIIO_BYTES_WRITTEN	-32646	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	4	7703102304952938401	MPIIO_BYTES_WRITTEN	-32494	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	5	7703102304952938401	MPIIO_BYTES_WRITTEN	-32438	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	6	7703102304952938401	MPIIO_BYTES_WRITTEN	-65532	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
MPI-IO	8	7703102304952938401	MPIIO_BYTES_WRITTEN	-32766	/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d	/lustre/orion	lustre
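
To check a full log for the same symptom, the parsed counter lines can be filtered programmatically. The following is a minimal sketch, assuming the lines are tab-separated with the eight fields shown above (module, rank, record id, counter, value, file name, mount point, filesystem type). The names `CounterRow` and `negative_counters` are hypothetical helpers introduced for illustration, not part of Darshan.

```python
from typing import Iterable, List, NamedTuple


class CounterRow(NamedTuple):
    """One counter line, using the field order seen in the report above."""
    module: str
    rank: int
    record_id: int
    counter: str
    value: int
    file_name: str
    mount_point: str
    fs_type: str


def negative_counters(lines: Iterable[str],
                      counter: str = "MPIIO_BYTES_WRITTEN") -> List[CounterRow]:
    """Return the rows for `counter` whose value is negative.

    Assumption: fields are tab-separated and there are exactly eight of them.
    Blank lines, '#' comment lines, and lines with a different shape are skipped.
    """
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Check the counter name before converting the value, so lines for
        # other counters (which may not hold integers) are never converted.
        if len(fields) != 8 or fields[3] != counter:
            continue
        row = CounterRow(fields[0], int(fields[1]), int(fields[2]), fields[3],
                         int(fields[4]), fields[5], fields[6], fields[7])
        if row.value < 0:
            rows.append(row)
    return rows


# Two of the lines reported above, used as sample input.
SAMPLE = [
    "MPI-IO\t0\t7703102304952938401\tMPIIO_BYTES_WRITTEN\t-32574\t"
    "/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d\t/lustre/orion\tlustre",
    "MPI-IO\t8\t7703102304952938401\tMPIIO_BYTES_WRITTEN\t-32766\t"
    "/lustre/orion/ven114/scratch/lukebr/TAMM/tensor3d\t/lustre/orion\tlustre",
]

for row in negative_counters(SAMPLE):
    print(row.rank, row.value)
```

The same function accepts any iterable of lines, so it could be given an open file holding the saved parser output instead of `SAMPLE`.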

reproducer:

The following build and test procedure works on Frontier:

  1. Get the code

git clone https://github.com/NWChemEx-Project/TAMM.git

  2. Set up the environment
ml rocm/5.5.1
ml cce/16.0.0
ml cray-mpich/8.1.26
ml cray-libsci/23.05.1.4
ml craype/2.7.21
ml cray-hdf5-parallel/1.12.2.3
ml cmake
  3. Build
cd TAMM
mkdir build && cd build
export TAMM_INSTALL_PATH=<your install path>
CC=cc CXX=CC FC=ftn cmake -DCMAKE_INSTALL_PREFIX=$TAMM_INSTALL_PATH -DUSE_HIP=ON -DROCM_ROOT=$ROCM_PATH -DGPU_ARCH=gfx90a -DGCCROOT=/opt/gcc/12.2.0/snos -DBLAS_INT4=ON -DHDF5_ROOT=$HDF5_ROOT ..
make -j20

The make step should take about 3-4 minutes.

  4. Test. From your build directory, run:
srun -A <your project account> -n800 -N50 -qdebug -B 1:7:2 --hint=multithread --ntasks-per-node=16 -t10:00 build/TAMM_Tests_External-prefix/src/TAMM_Tests_External-build/Test_IO 3643

The Darshan output is written to:
/lustre/orion/darshan/frontier/YR/M/D/${USER}_*_id${JOBID}*
