[wip] inference service #12
Conversation
In core_thread.rs:
- Added constant.
- Introduced and structs.
- Modified struct to utilize the new dispatcher.
- Added methods to for sending commands and running inference.
- Removed unnecessary imports and improved error handling.

In service.rs:
- Removed the entire file, as it is no longer needed after the refactoring.

These changes refactor the threading logic in core_thread.rs and eliminate the now-obsolete service.rs file, improving code organization and maintainability.
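The commit message above does not render the concrete type names, so the following is only a minimal sketch of the command/dispatcher pattern it describes: a dedicated core thread owns the model state and processes commands received over a channel, while callers send commands and wait for responses. The names `CoreThreadCommand`, `CoreThreadDispatcher`, and `run_inference` are placeholders, not the real identifiers.

```rust
use std::sync::mpsc;
use std::thread;

/// Commands that callers can send to the core thread (placeholder name).
enum CoreThreadCommand {
    RunInference {
        prompt: String,
        response_tx: mpsc::Sender<String>,
    },
    Shutdown,
}

/// Handle used to send commands to the core thread (placeholder name).
struct CoreThreadDispatcher {
    command_tx: mpsc::Sender<CoreThreadCommand>,
}

impl CoreThreadDispatcher {
    /// Spawn the core thread and return a dispatcher connected to it.
    fn start() -> Self {
        let (command_tx, command_rx) = mpsc::channel();
        thread::spawn(move || {
            // The core thread processes commands sequentially until shutdown.
            for command in command_rx {
                match command {
                    CoreThreadCommand::RunInference { prompt, response_tx } => {
                        // Stand-in for the real model call.
                        let _ = response_tx.send(format!("echo: {prompt}"));
                    }
                    CoreThreadCommand::Shutdown => break,
                }
            }
        });
        Self { command_tx }
    }

    /// Send a prompt to the core thread and block until the response arrives.
    fn run_inference(&self, prompt: String) -> Option<String> {
        let (response_tx, response_rx) = mpsc::channel();
        self.command_tx
            .send(CoreThreadCommand::RunInference { prompt, response_tx })
            .ok()?;
        response_rx.recv().ok()
    }
}

fn main() {
    let dispatcher = CoreThreadDispatcher::start();
    println!("{:?}", dispatcher.run_inference("hello".to_string()));
}
```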
This commit refactors the project structure by moving , , and related types into a separate module named . Grouping this related functionality together improves code organization and maintainability.

Changes made:
- Moved , , , and related types into the module.
- Updated import paths in the affected files to reflect the module restructuring.

Impact:
- Improved code organization and maintainability.
- Clear separation of concerns between the different components of the application.
This commit includes several changes:
- Updated the dependencies in the file, adding the , , and crates.
- Refactored the struct and its related methods to use the crate for loading configuration from a file.
- Updated to properly initialize and fetch models during service startup.
- Added deserialization support for the enum so it can be read from TOML configuration files.
- Updated error handling in various parts of the codebase.
- Added a test case to ensure proper initialization of the inference service with sample configuration data.

This refactor improves modularity, strengthens error handling, and makes the codebase easier to maintain.
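The crate names are not rendered in the commit message above, but since the configuration is read from TOML, the following is a minimal sketch of how such loading typically looks, assuming the serde and toml crates. The `InferenceConfig` fields, the `ModelType` variants, and the `inference.toml` path are illustrative, not the project's actual definitions.

```rust
use std::path::{Path, PathBuf};

use serde::Deserialize;

/// Illustrative model enum; the real variants live in models.rs.
#[derive(Debug, Deserialize)]
enum ModelType {
    Llama2,
    Mistral,
}

/// Illustrative configuration struct; field names are assumptions.
#[derive(Debug, Deserialize)]
struct InferenceConfig {
    storage_folder: PathBuf,
    models: Vec<ModelType>,
}

impl InferenceConfig {
    /// Read and deserialize the configuration from a TOML file.
    fn from_file(path: &Path) -> Result<Self, Box<dyn std::error::Error>> {
        let contents = std::fs::read_to_string(path)?;
        Ok(toml::from_str(&contents)?)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = InferenceConfig::from_file(Path::new("inference.toml"))?;
    println!("storing models under {}", config.storage_folder.display());
    println!("configured models: {:?}", config.models);
    Ok(())
}
```

A matching file would contain lines like `storage_folder = "./models"` and `models = ["Llama2", "Mistral"]`; with serde's default representation, unit enum variants deserialize from plain strings.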
- Added the hf-hub crate, version 0.3.2, to Cargo.toml for dependency management.
- Updated atoma-inference/Cargo.toml to include hf-hub as a workspace dependency with the tokio feature enabled.
- Added hf-hub to the list of dependencies in atoma-inference/src/apis/hugging_face.rs.
- Updated the ModelType enum in atoma-inference/src/models.rs to include the new model types supported via hf-hub.
- Modified the ApiTrait and Api implementations to incorporate hf-hub functionality for model fetching.
- Implemented asynchronous model fetching in the ApiTrait trait using async_trait.
- Added tests for InferenceService initialization and model fetching with hf-hub.
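As a rough sketch of what an async_trait-based fetch on top of hf-hub's tokio API might look like: the `ApiTrait` name is taken from the commit message, but the method signature, and the repository and file used in `main`, are assumptions rather than the project's actual code.

```rust
use std::path::PathBuf;

use async_trait::async_trait;
use hf_hub::api::tokio::{Api, ApiError};

/// Assumed shape of the fetching trait; the real signature may differ.
#[async_trait]
trait ApiTrait {
    async fn fetch(&self, repo_id: String, filename: String) -> Result<PathBuf, ApiError>;
}

#[async_trait]
impl ApiTrait for Api {
    async fn fetch(&self, repo_id: String, filename: String) -> Result<PathBuf, ApiError> {
        // Downloads the file into the local Hugging Face cache (or reuses a
        // previous download) and returns its path on disk.
        self.model(repo_id).get(&filename).await
    }
}

#[tokio::main]
async fn main() -> Result<(), ApiError> {
    let api = Api::new()?;
    let weights = api
        .fetch(
            "bert-base-uncased".to_string(),
            "model.safetensors".to_string(),
        )
        .await?;
    println!("fetched model weights to {}", weights.display());
    Ok(())
}
```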
- Renamed the `storage_base_path` field in the `InferenceConfig` struct to `storage_folder` for improved clarity and consistency with the field's actual purpose.
- Updated references to `storage_base_path` to use `storage_folder` throughout `config.rs` and `core.rs`.
- Adjusted the tests in `service.rs` to reflect the renamed configuration field.
…ing and monitoring

In this commit, changes have been made to core.rs and main.rs to integrate tracing for enhanced debugging and monitoring. Specifically, the following modifications were implemented:
- Imported `tracing::info` in core.rs to enable logging of informational messages.
- Added logging statements using the `info!` macro in core.rs to mark the start of inference and report the prompt and model being used.
- Updated main.rs to remove unused variables and commented-out code, leaving the code cleaner and more maintainable.
- Added logging statements in main.rs to mark the start of the Core Dispatcher and the fetching of models.

These changes improve visibility into the execution flow of the inference service and make debugging and monitoring easier during development and deployment.
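The logging described above would look roughly like this; the function and field names are illustrative rather than the actual code in core.rs.

```rust
use tracing::info;

fn run_inference(model_name: &str, prompt: &str) -> String {
    // Record which model is being used and what prompt was received.
    info!(model = model_name, prompt, "starting inference");
    // ... the real inference would run here ...
    String::new()
}
```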
I think the model fetching should be done on startup, or on the first request, but it should not block the rest of the code. Also, running one inference should not block running another inference. The user should be able to specify how many inferences can run at once, or the limit could be derived from CPU/GPU usage.
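A minimal sketch of this suggestion, assuming a tokio runtime: model fetching is spawned as a background task at startup, and a `Semaphore` bounds how many inferences run concurrently. The limit here is a hard-coded placeholder for a user-supplied setting; deriving it from CPU/GPU usage is not shown.

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // Kick off model fetching without blocking service startup.
    let fetch_handle = tokio::spawn(async {
        // ... download model weights here ...
    });

    // Placeholder for a user-configurable bound on concurrent inferences.
    let max_concurrent_inferences = 4;
    let permits = Arc::new(Semaphore::new(max_concurrent_inferences));

    let mut tasks = Vec::new();
    for request_id in 0..16 {
        let permits = Arc::clone(&permits);
        tasks.push(tokio::spawn(async move {
            // Each inference waits for a permit, so at most four run at once.
            let _permit = permits.acquire().await.expect("semaphore closed");
            // ... run inference for `request_id` here ...
            let _ = request_id;
        }));
    }

    fetch_handle.await.expect("fetch task panicked");
    for task in tasks {
        task.await.expect("inference task panicked");
    }
}
```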
Add the tracing-subscriber crate to the dependencies in Cargo.toml to enable structured logging in the project.

Changes:
- Added tracing-subscriber = "0.3.18" to the dependencies in Cargo.toml.
- Added tracing-subscriber.workspace = true to the workspace member's Cargo.toml.
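With the subscriber dependency in place, initialization in `main.rs` would typically look something like the sketch below; the max-level filter is an assumption, not taken from the commit.

```rust
fn main() {
    // Install a formatting subscriber so `info!`/`warn!` events are printed.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    tracing::info!("starting the inference service");
}
```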
experiments around dependencies
No description provided.