Code object compression via bundling #1374

bstefanuk · 2024-11-22T00:13:42Z

Summary:

This PR adds a compression layer to all final code objects, thereby generating smaller libraries at the expense of build time. Includes minor refactoring.

Outcomes:

A new clang-offload-bundler invocation is added after assembly object linking.
getAssemblyCodeObjectFiles has been renamed to buildAssemblyCodeObjectFiles to match the name of source kernel functions.

Build	Time	Build Size (build/library/ directory)
Feature, gfx90a	8m38.902s	269M
Develop, gfx90a	8m3.921s	812M
Feature, gfx90a		2.4G
Develop, gfx90a		11G

Testing and Environment:

Docker: Ubuntu 24.04, ROCm 6.4 RC stack, AMD clang version 18.0.0, AMD clang-offload-bundler version 18.0.0

Tested with hipBLASLt bench and test clients

tensilelite/Tensile/TensileCreateLibrary.py

TorreZuk

Basics look good, assume all rocblas tests passed.

TorreZuk · 2024-11-22T15:50:06Z

tensilelite/Tensile/TensileCreateLibrary.py

+    if os.name == "nt":
+      # On Windows, the objectFiles list command line (including spaces)
+      # exceeds the limit of 8191 characters, so using response file
+
+      responseFile = os.path.join('/tmp', 'clangArgs.txt')
+      with open(responseFile, 'wt') as file:
+        file.write(" ".join(objFiles))
+        file.flush()
+
+      args = [globalParameters['AssemblerPath'], '-target', 'amdgcn-amd-amdhsa', '-o', coFileRaw, '@clangArgs.txt']
+      subprocess.check_call(args, cwd=asmDir)
+    else:
+      numObjFiles = len(objFiles)
+      maxObjFiles = 10000
+
+      if numObjFiles > maxObjFiles:
+        batchedObjFiles = [objFiles[i:i+maxObjFiles] for i in range(0, numObjFiles, maxObjFiles)]
+        batchSize = int(math.ceil(numObjFiles / maxObjFiles))


Can you consider doing what I added for windows (response file) instead of processing in batches? Is there a limit of 10000 with the linker?

In TensileLite, this batching strategy is implemented, but the source PR doesn't describe the motivation beyond your interpretation: some error was faced in the past because too many arguments were issued to the compiler.

I spent some time trying to find documentation on this 10000-input limit but couldn't find anything meaningful, so I left the implementation intact. @KKyang I believe that you implemented this feature, could you provide more context?

That's correct.

LunNova · 2024-11-27T16:35:43Z

Might be worth zstd compressing the msgpack files too, they're pretty compressible. Here's an untested attempt at decompress support in case it's helpful:

diff --git a/Tensile/Source/lib/source/msgpack/MessagePack.cpp b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
index de97929c..dbc397e0 100644
--- a/Tensile/Source/lib/source/msgpack/MessagePack.cpp
+++ b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
@@ -28,6 +28,8 @@
 
 #include <Tensile/msgpack/Loading.hpp>
 
+#include <zstd.h>
+
 #include <fstream>
 
 namespace Tensile
@@ -86,6 +88,34 @@ namespace Tensile
                 return nullptr;
             }
 
+            // Check if the file is zstd compressed
+            char magic[4];
+            in.read(magic, 4);
+            bool isCompressed = (in.gcount() == 4 && magic[0] == '\x28' && magic[1] == '\xB5' && magic[2] == '\x2F' && magic[3] == '\xFD');
+            // Reset file pointer to the beginning
+            in.seekg(0, std::ios::beg);
+
+            if (isCompressed) {
+                // Decompress zstd file
+                std::vector<char> compressedData((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
+
+                size_t decompressedSize = ZSTD_getFrameContentSize(compressedData.data(), compressedData.size());
+                if (decompressedSize == ZSTD_CONTENTSIZE_ERROR || decompressedSize == ZSTD_CONTENTSIZE_UNKNOWN) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: Unable to determine decompressed size for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                std::vector<char> decompressedData(decompressedSize);
+                size_t dSize = ZSTD_decompress(decompressedData.data(), decompressedSize, compressedData.data(), compressedData.size());
+                if (ZSTD_isError(dSize)) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: ZSTD decompression failed for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                msgpack::unpack(result, decompressedData.data(), dSize);
+            } else {
             msgpack::unpacker unp;
             bool              finished_parsing;
             constexpr size_t  buffer_size = 1 << 19;
@@ -109,6 +139,7 @@ namespace Tensile
 
                 return nullptr;
             }
+            }
         }
         catch(std::runtime_error const& exc)
         {

bstefanuk · 2024-11-27T17:40:52Z

@LunNova Thanks for the code snippet. I've had this idea as well and have plans to implement it. However, for the scope of this PR we'll keep it to code object files and add the .dat file compression in another PR.

bstefanuk self-assigned this Nov 22, 2024

bstefanuk requested review from TorreZuk and ellosel November 22, 2024 00:15

bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 275cfda to 471de13 Compare November 22, 2024 00:27

bstefanuk commented Nov 22, 2024

View reviewed changes

tensilelite/Tensile/TensileCreateLibrary.py Outdated Show resolved Hide resolved

TorreZuk previously approved these changes Nov 22, 2024

View reviewed changes

bstefanuk dismissed TorreZuk’s stale review via ba921fa November 22, 2024 19:43

bstefanuk marked this pull request as ready for review November 26, 2024 18:26

bstefanuk requested review from jichangjichang, KKyang, aazz44ss, vin-huang, imcarsonliao, hcman2, Serge45, Jinp800125, TonyYHsieh and solaslin as code owners November 26, 2024 18:26

bstefanuk added 2 commits November 27, 2024 01:15

feat: compress code objects

e5ca737

feat: move system commands to module

d8b5337

bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 1458f44 to d8b5337 Compare November 27, 2024 01:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code object compression via bundling #1374

Code object compression via bundling #1374

bstefanuk commented Nov 22, 2024 •

edited

Loading

TorreZuk left a comment

TorreZuk Nov 22, 2024

bstefanuk Nov 22, 2024 •

edited

Loading

KKyang Nov 27, 2024

bstefanuk Nov 27, 2024

LunNova commented Nov 27, 2024

bstefanuk commented Nov 27, 2024

Code object compression via bundling #1374

Are you sure you want to change the base?

Code object compression via bundling #1374

Conversation

bstefanuk commented Nov 22, 2024 • edited Loading

TorreZuk left a comment

Choose a reason for hiding this comment

TorreZuk Nov 22, 2024

Choose a reason for hiding this comment

bstefanuk Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

KKyang Nov 27, 2024

Choose a reason for hiding this comment

bstefanuk Nov 27, 2024

Choose a reason for hiding this comment

LunNova commented Nov 27, 2024

bstefanuk commented Nov 27, 2024

bstefanuk commented Nov 22, 2024 •

edited

Loading

bstefanuk Nov 22, 2024 •

edited

Loading