Getting failure to stat directory for stdout #5

Open
toomanycats opened this issue Aug 19, 2024 · 3 comments

Comments

@toomanycats

I'm tracking down a very obscure error where about 30% of submitted jobs go into the Eqw state.
The error is always the same:

error reason          1:      08/19/2024 14:10:31 [1730373583:9092]: can't stat() "/grid_test" as stdout_path: Permission denied KRB5CCNAME=none uid=xxx gid=xxx 101 600  xxx  xxx xxx

We thought this was caused by our brand-new storage appliance. However, even with the permissions set wide open, the behavior doesn't change. I've captured NFS traffic and have been analyzing it in Wireshark, but I don't see any FSSTAT failing.

I'm wondering: does the SGE daemon create the stdout and stderr files in the SGE root directory, with the client then copying them out?

Any ideas are appreciated.

@grisu48
Owner

grisu48 commented Aug 20, 2024

First of all, this could be an error caused by a MAC (mandatory access control) solution. Can you check whether AppArmor or SELinux is the culprit, i.e. disable either one of them and see if the error disappears?

@toomanycats
Author

That's a good idea, but it didn't help. I set SELinux to permissive mode, rebooted, and received the same error.
This new storage is a cluster, so I was hoping that might work.

What do you think about this function: sge_filecmp() in source/libs/uti/sge_io.c, line 166?

/****** uti/io/sge_filecmp() **************************************************
*  NAME
*     sge_filecmp() -- Compare two files
*
*  SYNOPSIS
*     int sge_filecmp(const char *name0, const char *name1)
*
*  FUNCTION
*     Compare two files. They are equal if:
*        - both of them have the same name
*        - if a stat() succeeds for both files and
*          i-node/device-id are equal

@grisu48
Owner

grisu48 commented Aug 21, 2024

Not sure, but given that the error message explicitly says "Permission denied", I would assume the problem lies somewhere in the file system permissions.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants