Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code object compression via bundling #1374

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

bstefanuk
Copy link
Contributor

@bstefanuk bstefanuk commented Nov 22, 2024

Summary:

This PR adds a compression layer to all final code objects, thereby generating smaller libraries at the expense of build time. Includes minor refactoring.

Outcomes:

  • A new clang-offload-bundler invocation is added after assembly object linking.
  • getAssemblyCodeObjectFiles has been renamed to buildAssemblyCodeObjectFiles to match the name of source kernel functions.
Build Time Build Size (build/library/ directory)
Feature, gfx90a 8m38.902s 269M
Develop, gfx90a 8m3.921s 812M
Feature, gfx90a 2.4G
Develop, gfx90a 11G

Testing and Environment:

Docker: Ubuntu 24.04, ROCm 6.4 RC stack, AMD clang version 18.0.0, AMD clang-offload-bundler version 18.0.0

Tested with hipBLASLt bench and test clients

@bstefanuk bstefanuk self-assigned this Nov 22, 2024
@bstefanuk bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 275cfda to 471de13 Compare November 22, 2024 00:27
TorreZuk
TorreZuk previously approved these changes Nov 22, 2024
Copy link
Contributor

@TorreZuk TorreZuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basics look good, assume all rocblas tests passed.

Comment on lines 106 to 123
if os.name == "nt":
# On Windows, the objectFiles list command line (including spaces)
# exceeds the limit of 8191 characters, so using response file

responseFile = os.path.join('/tmp', 'clangArgs.txt')
with open(responseFile, 'wt') as file:
file.write(" ".join(objFiles))
file.flush()

args = [globalParameters['AssemblerPath'], '-target', 'amdgcn-amd-amdhsa', '-o', coFileRaw, '@clangArgs.txt']
subprocess.check_call(args, cwd=asmDir)
else:
numObjFiles = len(objFiles)
maxObjFiles = 10000

if numObjFiles > maxObjFiles:
batchedObjFiles = [objFiles[i:i+maxObjFiles] for i in range(0, numObjFiles, maxObjFiles)]
batchSize = int(math.ceil(numObjFiles / maxObjFiles))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you consider doing what I added for windows (response file) instead of processing in batches? Is there a limit of 10000 with the linker?

Copy link
Contributor Author

@bstefanuk bstefanuk Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In TensileLite, this batching strategy is implemented, but the source PR doesn't describe the motivation beyond your interpretation: some error was faced in the past because too many arguments were issued to the compiler.

I spent some time trying to find documentation on this 10000-input limit but couldn't find anything meaningful, so I left the implementation intact. @KKyang I believe that you implemented this feature, could you provide more context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#415 this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

@bstefanuk bstefanuk force-pushed the bundle-compress-co-files-tensilelite branch from 1458f44 to d8b5337 Compare November 27, 2024 01:16
@LunNova
Copy link

LunNova commented Nov 27, 2024

Might be worth zstd compressing the msgpack files too, they're pretty compressible. Here's an untested attempt at decompress support in case it's helpful:

diff --git a/Tensile/Source/lib/source/msgpack/MessagePack.cpp b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
index de97929c..dbc397e0 100644
--- a/Tensile/Source/lib/source/msgpack/MessagePack.cpp
+++ b/Tensile/Source/lib/source/msgpack/MessagePack.cpp
@@ -28,6 +28,8 @@
 
 #include <Tensile/msgpack/Loading.hpp>
 
+#include <zstd.h>
+
 #include <fstream>
 
 namespace Tensile
@@ -86,6 +88,34 @@ namespace Tensile
                 return nullptr;
             }
 
+            // Check if the file is zstd compressed
+            char magic[4];
+            in.read(magic, 4);
+            bool isCompressed = (in.gcount() == 4 && magic[0] == '\x28' && magic[1] == '\xB5' && magic[2] == '\x2F' && magic[3] == '\xFD');
+            // Reset file pointer to the beginning
+            in.seekg(0, std::ios::beg);
+
+            if (isCompressed) {
+                // Decompress zstd file
+                std::vector<char> compressedData((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
+
+                size_t decompressedSize = ZSTD_getFrameContentSize(compressedData.data(), compressedData.size());
+                if (decompressedSize == ZSTD_CONTENTSIZE_ERROR || decompressedSize == ZSTD_CONTENTSIZE_UNKNOWN) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: Unable to determine decompressed size for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                std::vector<char> decompressedData(decompressedSize);
+                size_t dSize = ZSTD_decompress(decompressedData.data(), decompressedSize, compressedData.data(), compressedData.size());
+                if (ZSTD_isError(dSize)) {
+                    if(Debug::Instance().printDataInit())
+                        std::cout << "Error: ZSTD decompression failed for " << filename << std::endl;
+                    return nullptr;
+                }
+
+                msgpack::unpack(result, decompressedData.data(), dSize);
+            } else {
             msgpack::unpacker unp;
             bool              finished_parsing;
             constexpr size_t  buffer_size = 1 << 19;
@@ -109,6 +139,7 @@ namespace Tensile
 
                 return nullptr;
             }
+            }
         }
         catch(std::runtime_error const& exc)
         {

@bstefanuk
Copy link
Contributor Author

@LunNova Thanks for the code snippet. I've had this idea as well and have plans to implement it. However, for the scope of this PR we'll keep it to code object files and add the .dat file compression in another PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants