Fast server startup, using file_handle to load model param files #840

Merged
merged 4 commits into nod-ai:main on Jan 17, 2025

Conversation

stbaione
Contributor

@stbaione stbaione commented Jan 17, 2025

Description

Currently, when we load model parameter files, we read the entire contents of each file into memory and then mmap that data to the devices. This causes very long server startup times. For example, starting the server took me 10 minutes for 70b (~130 GB) and over 5 hours for 405b (~750 GB).

As an alternative, this PR uses `iree_io_file_handle_open` to obtain a handle to each parameter file, then streams that data to the devices instead of mmapping it. After this change, we are able to start the server for both 70b and 405b within seconds.
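To illustrate the difference between the two loading strategies, here is a minimal, generic sketch in Python. It is not the IREE or shortfin API: `load_eager`, `stream_chunks`, `upload_streamed`, the `sink` callback, and `CHUNK_SIZE` are all hypothetical names chosen for illustration. The eager version holds the whole file in host memory at once, while the streamed version keeps only one chunk resident at a time.

```python
CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def load_eager(path):
    """Legacy-style load: read the entire file into host memory at once."""
    with open(path, "rb") as f:
        return f.read()


def stream_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield the file in fixed-size chunks through an open file handle.

    Peak host memory is roughly one chunk, independent of file size.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object signals end of file
                break
            yield chunk


def upload_streamed(path, sink, chunk_size=CHUNK_SIZE):
    """Pass each chunk to `sink` (a stand-in for a device transfer).

    Returns the total number of bytes handed to the sink.
    """
    total = 0
    for chunk in stream_chunks(path, chunk_size):
        sink(chunk)
        total += len(chunk)
    return total
```

In this sketch `sink` could be as simple as `received.append`; in a real runtime the equivalent step would be a transfer into device memory, which is outside what this example models.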

We default to the new method and add a private function `LoadMmap` for cases where `mmap == true`. This should improve startup time for both LLM and SDXL, especially when loading large files.
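The default-plus-fallback arrangement described above can be sketched as a simple dispatch on a flag. This is an assumption-laden illustration, not the shortfin implementation: `load_params`, `_load_mmap`, `_open_handle`, and the `use_mmap` parameter are hypothetical names, and Python's standard `mmap` module stands in for the legacy path.

```python
import mmap


def _load_mmap(path):
    """Fallback path (hypothetical): map the file read-only into memory."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after the
        # file object is closed. Mapping an empty file raises ValueError.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def _open_handle(path):
    """Default path (hypothetical): return an open handle, reading nothing yet."""
    return open(path, "rb")


def load_params(path, use_mmap=False):
    """Use the file-handle path by default; mmap only when requested."""
    if use_mmap:
        return _load_mmap(path)
    return _open_handle(path)
```

Keeping the older path behind a private helper means existing callers that pass the flag see the same behavior, while everyone else gets the faster default. The caller is responsible for closing whichever object is returned.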

stbaione and others added 3 commits January 17, 2025 15:15
… `irpa` data to devices.

This is much faster for server startup, especially with large parameter files, than loading the entire contents of the file into memory and mmapping it to devices.
Needs a little cleanup...
Add private `LoadMmap` function that uses the legacy loading method, so mmapping param files is still possible
@stbaione stbaione self-assigned this Jan 17, 2025
Contributor

@monorimet monorimet left a comment


LGTM.

@stbaione stbaione merged commit 0c6d061 into nod-ai:main Jan 17, 2025
37 checks passed
2 participants