Copy build log and artifacts to a permanent location after failures #4601

gkaf89 · 2024-08-05T13:10:19Z

The files can be built in a selected build path (--buildpath), and the logs of successful builds are then collected in a separate location for permanent storage (--logfile-format). Logs of failed builds remain in the build path so that they can be inspected.

However, this setup is problematic when building software in HPC jobs. On HPC systems the build path is often set to fast storage local to the node, such as an NVMe RAID mounted on /tmp, or /dev/shm (as suggested in the documentation: https://docs.easybuild.io/configuration/#buildpath). The node-local storage is often wiped at the end of a job, so the log files and build artifacts are no longer available once the job has terminated.

This commit adds an option (--errorlogpath) to collect the logs and artifacts of failed builds in a more permanent location, so that they can be easily inspected after a failed build.

gkaf89 · 2024-08-05T13:31:15Z

I am not sure what the best way is to determine the build directory so that I can copy it to a more permanent location. At the moment I am reconstructing the build path and then copying the directory to the destination path:

source_build_path = os.path.join(buildpath, name, version, toolchain)
dest_build_path = os.path.join(err_log_path, name, version, toolchain)
copy_dir(source_build_path, dest_build_path)

Is there some variable holding the build path, or even the relative build path (i.e. os.path.join(name, version, toolchain))?
Should we extract this functionality to a module?

boegel · 2024-08-14T09:57:20Z

@gkaf89 The builddir variable that is set in each easyblock instance holds the path to the build directory for that particular easyconfig.
You can determine the relative path via the build_path() function that is available from easybuild.tools.config, which should report the top directory that corresponds to the buildpath EasyBuild configuration option (see also https://docs.easybuild.io/configuration/#buildpath).

So, for example, for example-1.2.3-GCC-12.3.0.eb, the builddir path would be something like /tmp/myuser/easybuild/build/example/1.2.3/GCC-12.3.0/, with buildpath set to /tmp/myuser/easybuild/build.
Note that the actual build directory in which the compilation is being done would be one level deeper, corresponding to the unpacked source tarball, so something like /tmp/myuser/easybuild/build/example/1.2.3/GCC-12.3.0/example-1.2.3/.

So, I think you could create a subdirectory in the permanent storage location that uses the name of the easyconfig file (to keep it simple), and copy the contents of builddir in there.
You do somehow want to make sure that the target path is unique though, because you could have multiple builds ongoing on different nodes that would all copy to the same permanent location in the end...

akesandgren · 2024-08-14T10:27:58Z

Yeah, builddir is the thing to copy, into a path under the permanent storage location that is built from the path of builddir relative to buildpath. Just make sure to remove old remnants of that first :-)
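A minimal sketch of that suggestion, using only the Python standard library rather than EasyBuild's own file helpers. The function name persist_build_dir is hypothetical, and the sketch assumes that builddir is located somewhere under buildpath:

```python
import os
import shutil
import tempfile


def persist_build_dir(builddir, buildpath, permanent_root):
    """Copy builddir under permanent_root, keeping its path relative to buildpath.

    Hypothetical helper for illustration only; not part of EasyBuild.
    """
    # e.g. 'example/1.2.3/GCC-12.3.0' for the example paths mentioned above
    rel_builddir = os.path.relpath(builddir, start=buildpath)
    if rel_builddir == os.pardir or rel_builddir.startswith(os.pardir + os.sep):
        raise ValueError("%s is not located under %s" % (builddir, buildpath))

    target = os.path.join(permanent_root, rel_builddir)
    # remove remnants of an earlier copy before copying again
    if os.path.exists(target):
        shutil.rmtree(target)
    # copytree creates missing parent directories of the target itself
    shutil.copytree(builddir, target)
    return target
```

This sketch does not make the target path unique, so it does not address concurrent builds copying to the same location.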

boegel · 2024-08-27T18:31:17Z

@gkaf89 If you need any help with this, do let us know!

gkaf89 · 2024-09-08T23:38:15Z

@boegel The commit is ready. I won't have enough time to familiarize myself with the test framework for the EasyBlockTest class to prepare a test before the next release.

The commit can be tested by modifying the configuration options of some easyconfig that uses the system toolchain to cause a failure. For instance I added the option

configopts = '--some-invalid-option'

in zlib-1.3.1.eb. As a result, the temporary log file in the build directory and the extracted source code are copied to a permanent location.

easybuild/framework/easyblock.py

boegel · 2024-09-11T06:56:10Z

@gkaf89 There's a problem with the tests, looks like test_toy_build was broken by the changes being made here?
See for example https://github.com/easybuilders/easybuild-framework/actions/runs/10805964835/job/29973948603

gkaf89 · 2024-09-11T08:05:23Z

The failure occurs because the target location for permanent storage is the same as the source location. The steps I am following to resolve the issue:

- add a source/destination check to avoid a hard failure, and
- find out why the source and the destination path end up with the same value in the test.
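The first step could look roughly like the sketch below. The helper name same_location is hypothetical and this is not necessarily the check that ended up in the PR; it only illustrates comparing resolved paths, so that symlinks or trailing slashes do not hide the fact that both locations are identical:

```python
import os


def same_location(source, destination):
    """Return True if both paths refer to the same location (hypothetical helper)."""
    if os.path.exists(source) and os.path.exists(destination):
        # samefile compares what the paths resolve to, so symlinks are followed
        return os.path.samefile(source, destination)
    # at least one path does not exist yet: compare the resolved absolute paths
    return os.path.realpath(source) == os.path.realpath(destination)
```

The copy step would then be skipped (with a message in the log) whenever same_location(source, destination) is true.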

gkaf89 · 2024-09-11T10:31:27Z

@boegel Some edge cases were uncovered by the tests. The latest commit resolves the issue.

I leave it up to you whether to move this to version 5. I am not familiar enough with the tests to test the PR extensively.

boegel · 2024-12-04T13:35:22Z

@gkaf89 As briefly discussed during the conf call today, it would be good if you could add a test (or enhance an existing one, like test_toy_broken) to verify that the added functionality works as intended (and keeps working).

Do let us know if you need any help with that!

gkaf89 · 2025-01-07T17:57:59Z

Some tests fail because a warning is printed:

WARNING: Command ' gcc toy.c -o toy ' failed, but we'll ignore it...

I want the compilation to fail, is there a way to silence the warning?

gkaf89 · 2025-01-08T13:19:43Z

> I want the compilation to fail, is there a way to silence the warning?

I am now using run_test_toy_build_with_output instead of test_toy_build to catch and ignore all output. I am not sure whether run_test_toy_build_with_output is being used as intended here.

Flamefire

I added some suggestions inline

Flamefire · 2025-01-09T13:03:15Z

easybuild/tools/config.py

+    This is a path where file from the build_log_path can be stored permanently
+    :param ec: dict-like value that provides values for %(name)s and %(version)s template values
+    """
+    error_log_path = ConfigurationVariables()['errorlogpath']


Not a good idea to have a variable with the same name as a function. Maybe rename the function to get_.... or default_...?

Flamefire · 2025-01-09T13:03:20Z

easybuild/tools/config.py

+    """
+    Return the default error log path
+
+    This is a path where file from the build_log_path can be stored permanently


Suggested change

-    This is a path where file from the build_log_path can be stored permanently
+    This is a path where files from the build_log_path can be stored permanently

Flamefire · 2025-01-09T13:04:19Z

easybuild/tools/config.py

+    if ec is None:
+        ec = {}
+
+    name, version = ec.get('name', '%(name)s'), ec.get('version', '%(version)s')


Why would you want %(name)s in the folder name?

Flamefire · 2025-01-09T13:05:04Z

easybuild/tools/config.py

+    Return the default error log path
+
+    This is a path where file from the build_log_path can be stored permanently
+    :param ec: dict-like value that provides values for %(name)s and %(version)s template values


Doesn't match what is used. It expects name and version keys, not templates

gkaf89 · 2025-01-10

I used the %(name)s and %(version)s template strings as default values. I have seen this used elsewhere, for instance in the log_file_format function in the same file (the templates do indeed appear in file paths). It seems like a nice way to inform the user about a missing value without a hard failure.

Should we avoid using template names as default values, or explain their use in a better way?

Flamefire · 2025-01-10

Ah, I see. It wasn't obvious to me what was meant by that. I'd change the documentation to

:param ec: dict-like value with at least the keys 'name' and 'version'

I would not use a default and just fail hard if the keys do not exist, because that is what is documented: the function expects a dict with those keys. If you don't provide one, it is the fault of the caller. If we do it right in EasyBuild the keys will always exist, so there is no need for fallbacks. Or am I missing anything?
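The difference between the two approaches, as a small sketch. Both function names and the path layout are hypothetical and do not reproduce the code in the PR:

```python
import os


def error_log_dir_lenient(base, ec=None):
    """Fall back to the literal template strings when a key is missing."""
    ec = ec or {}
    name = ec.get('name', '%(name)s')
    version = ec.get('version', '%(version)s')
    return os.path.join(base, name, version)


def error_log_dir_strict(base, ec):
    """Require the 'name' and 'version' keys; a missing key raises KeyError."""
    return os.path.join(base, ec['name'], ec['version'])
```

With an empty dict the lenient variant silently yields a directory literally named %(name)s, while the strict variant surfaces the caller's mistake immediately as a KeyError.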

Flamefire · 2025-01-09T13:07:00Z

easybuild/tools/config.py

+
+    path = base_path
+    inc_no = 1
+    while os.path.exists(path):


Maybe rather use filetools.create_unused_dir? This avoids duplicating the logic and a possible conflict when another eb instance runs in parallel

gkaf89 · 2025-01-10

I only need the directory name in order to copy a directory to a new location, not to create a directory. filetools.create_unused_dir closely couples two operations: generating the directory name and creating the directory.

I will try to decouple the two operations and extract a function that only generates the name in a different commit.

Flamefire · 2025-01-10

That "coupling" is intentional: if you don't create the directory, you have a race condition between processes running in parallel, which defeats the purpose of this function. Why don't you want that directory created? Can't you just fill the created directory later?

gkaf89 · 2025-01-10

You couple through the file system, got it.
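To illustrate the race condition: checking os.path.exists first and creating the directory later leaves a window in which another process can pick the same name, whereas attempting the creation and retrying on failure lets the file system arbitrate. A minimal sketch with a hypothetical helper name; this is not the implementation of filetools.create_unused_dir:

```python
import os


def create_unique_dir(base_path, max_tries=1000):
    """Create and return a directory that did not exist before (hypothetical helper).

    os.mkdir either creates the directory or raises FileExistsError, so the
    name is claimed in the same step in which it is found to be unused.
    """
    os.makedirs(os.path.dirname(os.path.abspath(base_path)), exist_ok=True)
    for number in range(max_tries):
        # first try the plain path, then add an increasing numerical suffix
        candidate = base_path if number == 0 else '%s_%d' % (base_path, number)
        try:
            os.mkdir(candidate)
        except FileExistsError:
            continue
        return candidate
    raise RuntimeError("No unused directory found for %s" % base_path)
```

The caller can then fill the returned directory afterwards, for example with shutil.copytree(source, target, dirs_exist_ok=True) on Python 3.8 or later.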

test/framework/easyconfigs/test_ecs/t/toy/toy-0.0-test.eb

Flamefire · 2025-01-09T13:10:25Z

test/framework/toy_build.py

@@ -228,6 +233,9 @@ def test_toy_build(self, extra_args=None, ec_file=None, tmpdir=None, verify=True
                msg = "Pattern %s found in full test report: %s" % (regex.pattern, test_report_txt)
                self.assertTrue(regex.search(test_report_txt), msg)

+        if check_errorlog is not None:


Isn't this better done outside? Doesn't need to be in this function, does it?

Flamefire · 2025-01-09T13:13:19Z

test/framework/utilities.py

@@ -514,3 +516,15 @@ def find_full_path(base_path, trim=(lambda x: x)):
            break

    return full_path
+
+
+class TempDirectory:


There already is tempfile.TemporaryDirectory, which can additionally be used as a context manager

gkaf89 · 2025-01-10

Thanks, nice catch! I am using the context manager now.
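For reference, a short sketch of how tempfile.TemporaryDirectory behaves as a context manager. The file name and contents are placeholders, not taken from the PR:

```python
import os
import tempfile

# the directory and its contents are removed when the with block exits,
# also when the body raises an exception
with tempfile.TemporaryDirectory() as tmpdir:
    log_file = os.path.join(tmpdir, 'build.log')
    with open(log_file, 'w') as handle:
        handle.write('build failed\n')
    exists_inside = os.path.isfile(log_file)

exists_after = os.path.exists(tmpdir)
```

Here exists_inside is true while exists_after is false, which is the cleanup behaviour a hand-written TempDirectory class would otherwise have to reimplement.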

Flamefire · 2025-01-09T13:16:27Z

test/framework/easyconfigs/test_ecs/t/toy/toy-0.0-buggy.eb

@@ -0,0 +1,33 @@
+name = 'toy'


What we usually do, instead of adding yet another test easyconfig, is read an existing one with read_file and write it out again with modifications. See self.contents and self.eb_file
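The pattern could be sketched as follows. To keep the example self-contained it uses plain file I/O instead of the test framework's helpers, and the function name, file names and easyconfig contents are made up for illustration:

```python
import os
import tempfile


def write_modified_copy(original_path, target_path, extra_lines):
    """Read an easyconfig, append extra parameter lines, and write the result elsewhere.

    Hypothetical helper; the real tests would rely on the framework's own utilities.
    """
    with open(original_path) as handle:
        contents = handle.read()
    contents += '\n' + '\n'.join(extra_lines) + '\n'
    with open(target_path, 'w') as handle:
        handle.write(contents)
    return contents


with tempfile.TemporaryDirectory() as tmpdir:
    original = os.path.join(tmpdir, 'toy-0.0.eb')
    with open(original, 'w') as handle:
        # stand-in for an existing test easyconfig
        handle.write("name = 'toy'\nversion = '0.0'\n")
    broken = os.path.join(tmpdir, 'toy-0.0-broken.eb')
    # make the build fail on purpose with an invalid configure option
    result = write_modified_copy(original, broken, ["configopts = '--some-invalid-option'"])
```

This keeps the deliberately broken variant local to the test instead of adding a permanent file to the test easyconfigs directory.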

- There does not seem to be a field storing the path to the builddir of an easyblock relative to the base build path. In this refactored version the relative builddir is derived from the full path and the base build path using the `os.path.relpath` function.
- While the files are being copied, the operation may fail, for instance due to lack of space in the target location or insufficient permissions. For that reason, the copying of the artifacts is reported after the copy operations.

The log messages mention both the temporary log file created in the build directory and the path to which the file is copied for permanent storage. This commit makes a distinction between the two paths in the log messages.

- In testing, multiple failures can occur in quick succession, resulting in the same time stamp and therefore in the same base error log path. Extend the path stamp with an increasing number (a naive O(n^2) algorithm is used at the moment, which should be sufficient).
- In case the user provides an error log path that is the same as the build directory log path, add a check that skips copying the files, to prevent errors in the copying functions.

The toy test file is modified with a patch to fail during compilation. The tests verify that:
- the source directory is copied to the error log path,
- the log files are copied to the error log path, and
- a warning for the compilation failure is reported in stdout.

gkaf89 marked this pull request as draft August 5, 2024 13:10

gkaf89 force-pushed the feature/error-logging branch 2 times, most recently from b306ccd to 092fcd0 Compare August 5, 2024 13:26

gkaf89 force-pushed the feature/error-logging branch 7 times, most recently from 50d99c3 to 86fe081 Compare August 12, 2024 19:10

boegel added the enhancement label Aug 13, 2024

boegel added this to the 4.x milestone Aug 13, 2024

gkaf89 force-pushed the feature/error-logging branch 4 times, most recently from 1274a2b to b1a9da8 Compare August 23, 2024 08:53

gkaf89 force-pushed the feature/error-logging branch from b1a9da8 to 6bc53e6 Compare September 8, 2024 22:43

gkaf89 marked this pull request as ready for review September 8, 2024 23:38

boegel requested changes Sep 11, 2024

gkaf89 force-pushed the feature/error-logging branch from b34ff11 to 5cfdfd1 Compare September 11, 2024 10:24

gkaf89 requested a review from boegel September 11, 2024 12:51

gkaf89 marked this pull request as draft December 21, 2024 20:02

gkaf89 force-pushed the feature/error-logging branch 2 times, most recently from e7437a1 to 65f0294 Compare December 21, 2024 20:16

gkaf89 force-pushed the feature/error-logging branch 2 times, most recently from a299a49 to 0d61be8 Compare January 7, 2025 17:36

gkaf89 marked this pull request as ready for review January 7, 2025 17:38

gkaf89 force-pushed the feature/error-logging branch from 0d61be8 to 97fc225 Compare January 7, 2025 17:40

gkaf89 force-pushed the feature/error-logging branch 2 times, most recently from 8ea0754 to 70cefe2 Compare January 7, 2025 18:17

gkaf89 force-pushed the feature/error-logging branch 6 times, most recently from 0eb8e61 to 3e2421e Compare January 9, 2025 08:27

Flamefire suggested changes Jan 9, 2025

gkaf89 and others added 10 commits January 9, 2025 15:35

Test error logging features

1e866dc

Create tests for:
- the `errorlogpath` option.

[refactor] Extract function persisting logs and artifacts

e572d36

The function moves the logs and artifacts of failed builds to a special location for permanent storage.

[refactor] Extract function generating the relative base builddir path

23ae7a8

The base builddir path is used to construct the builddir by:
- prepending the absolute build path, and
- adding a numerical suffix to ensure uniqueness.

fix minor code style issue by using "not success" instead of "not(success)"

c9d77ce

[style] Use more expressive input variable name

34683b1

gkaf89 force-pushed the feature/error-logging branch 2 times, most recently from 0c5bf04 to fe16762 Compare January 9, 2025 15:15
