-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy build log and artifacts to a permanent location after failures #4601
base: develop
Are you sure you want to change the base?
Conversation
b306ccd
to
092fcd0
Compare
I am not sure what is the best way to select the build directory so that I can move it to a more permanent location. That is at the moment I am recreating the location of the build path and then copy the directory to the destination path:
|
50d99c3
to
86fe081
Compare
@gkaf89 The For, for example, for So, I think you could create a subdirectory in the permanent storage location that uses the name of the easyconfig file (to keep it simple), and copy the contents of |
Yeah, the thing to copy should be |
1274a2b
to
b1a9da8
Compare
@gkaf89 If you need any help with this, do let us know! |
b1a9da8
to
6bc53e6
Compare
@boegel The commit is ready. I won't have enough time to familiarize myself with the test framework for the The commit can be tested by modifying the configuration options of some easyconfig that uses the system toolchain to cause a failure. For instance I added the option
in |
@gkaf89 There's a problem with the tests, looks like |
The failure is caused because the target location for permanent storage is the same as the source location. The steps I am following to resolve the issue:
|
b34ff11
to
5cfdfd1
Compare
@boegel Some edge cases where uncovered by the tests. The latest commit resolves the issue. I leave it up you if you prefer to move it to version 5. I am not familiar enough with the tests to test the PR extensively. |
@gkaf89 As briefly discussed during the conf call today, it would be good if you could add a test (or enhance an existing one, like Do let us know if you need any help with that! |
e7437a1
to
65f0294
Compare
a299a49
to
0d61be8
Compare
0d61be8
to
97fc225
Compare
Some tests fail because a warning is printed:
I want the compilation to fail, is there a way to silence the warning? |
8ea0754
to
70cefe2
Compare
Using |
0eb8e61
to
3e2421e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some suggestions inline
This is a path where file from the build_log_path can be stored permanently | ||
:param ec: dict-like value that provides values for %(name)s and %(version)s template values | ||
""" | ||
error_log_path = ConfigurationVariables()['errorlogpath'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a good idea to have a variable with the same name as a function. Maybe rename the function to get_....
or default_...
?
""" | ||
Return the default error log path | ||
|
||
This is a path where file from the build_log_path can be stored permanently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a path where file from the build_log_path can be stored permanently | |
This is a path where files from the build_log_path can be stored permanently |
if ec is None: | ||
ec = {} | ||
|
||
name, version = ec.get('name', '%(name)s'), ec.get('version', '%(version)s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you want %(name)s
in the folder name?
Return the default error log path | ||
|
||
This is a path where file from the build_log_path can be stored permanently | ||
:param ec: dict-like value that provides values for %(name)s and %(version)s template values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't match what is used. It expects name
and version
keys, not templates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the %(name)s
and %(version)s
template strings as default values. I have seen this use in other cases, log_file_format
function on the same file (they indeed appear in file paths). It seems like a nice way to inform the user about a missing value without a hard failure.
Should we avoid using template names as default values or explain their use in a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Isn't trivial to understand for me what is meant by that. I'd change the documentation to
:param ec: dict-like value with at least the keys 'name' and 'version'
I would not use a default and just fail hard if the keys do not exist because that is what is documented: The function expects a dict with those keys. If you don't provide one it is the fault of the caller. If we do it right in easybuild the keys will always exist, so no need for fallbacks. Or am I missing anything?
|
||
path = base_path | ||
inc_no = 1 | ||
while os.path.exists(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rather use filetools.create_unused_dir
? This avoids duplicating the logic and a possible conflict when another eb instance runs in parallel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need just the directory name to copy a directory to a new location, not to create a directory. The filetools.create_unused_dir
couples closely the 2 functions, creating the directory name and creating the directory.
I will try to decouple the 2 functions and extract a function for creating a file name in a different commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That "coupling" is intentional: If you don't create the directory you have a race condition for process running in parallel defeating the purpose of this function. Why don't you want that directory created? Can't you just fill it later using the created directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You couple through the file system, got it.
@@ -228,6 +233,9 @@ def test_toy_build(self, extra_args=None, ec_file=None, tmpdir=None, verify=True | |||
msg = "Pattern %s found in full test report: %s" % (regex.pattern, test_report_txt) | |||
self.assertTrue(regex.search(test_report_txt), msg) | |||
|
|||
if check_errorlog is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this better done outside? Doesn't need to be in this function, does it?
@@ -514,3 +516,15 @@ def find_full_path(base_path, trim=(lambda x: x)): | |||
break | |||
|
|||
return full_path | |||
|
|||
|
|||
class TempDirectory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is tempfile.TemporaryDirectory
which additionally allows using in a context manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice catch! Using context manager.
@@ -0,0 +1,33 @@ | |||
name = 'toy' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we usually do instead of adding yet another test easyconfig is read_file
it and write with modifications. See self.contents
and self.eb_file
…failures The files can be build in some selected build path (--buildpath), and the logs of successful compilation are then concentrated to some other location for permanent storage (--logfile-format). Logs of failed builds remain in the build path location so that they can be inspected. However, this setup is problematic when building software in HPC jobs. Quite often in HPC systems the build path is set to some fast storage local to the node, like NVME raid mounted on `/tmp` or `/dev/shm` (as suggested in the documentation: https://docs.easybuild.io/configuration/#buildpath). The node storage is often wiped out after the end of a job, so the log files and the artifacts are no longer available after the termination of the job. This commit adds an option (--errorlogpath)to accumulate errors in some more permanent location, so that the can be easily inspected after a failed build.
Create tests for: - the `errorlogpath` option.
- There does not seem to be a field storing the path to the builddir of an easyblock relative to the base build path. In this refactored version the relative builddir is extracted from the full path and the base build path using the `os.path.relpath` function. - During the copying of the files, the operation may fail, for instance due to the lack of space in the target location or insufficient rights. Report the copying of the artifacts after the copy operations.
The function moves logs and artifacts of failed build in a special location for permanent storage.
The base builddir path is used to construct the builddir by - pre-pending the asboloute build path, and - adding a numerical suffix to ensure uniqueness.
The log messages mention both the temporary log file created in the build directory, and the path where the file is copied for permanent storage. This commits makes a distinction between the two path in the log messages.
- In testing multiple failures can occur in quick succession resulting in the same time stamp, and as a result in the same base error log path. Extent the path stamp with an increasing number (naive O(n^2) algorithm used at the moment, should be sufficient). - In case the user provides the same error log path as the build directory log path, add a check to prevent copying the files to prevent errors in the copying functions.
The toy test file is modified with a patch to fail during compilation. The tests verify that: - the source directory is copied to the error log path, - the log files are copied to the error log path, and - a warning for the compilation failure is reported in stdout.
0c5bf04
to
fe16762
Compare
The files can be build in some selected build path (--buildpath), and the logs of successful compilation are then concentrated to some other location for permanent storage (--logfile-format). Logs of failed builds remain in the build path location so that they can be inspected.
However, this setup is problematic when building software in HPC jobs. Quite often in HPC systems the build path is set to some fast storage local to the node, like NVME raid mounted on
/tmp
or/dev/shm
(as suggested in the documentation: https://docs.easybuild.io/configuration/#buildpath). The node storage is often wiped out after the end of a job, so the log files and the artifacts are no longer available after the termination of the job.This commit adds an option to accumulate errors in some more permanent location, so they can be easily inspected after a failed build.