[wip] inference service #12
Conversation
In core_thread.rs:
- Added constant.
- Introduced and structs.
- Modified struct to utilize the new dispatcher.
- Added methods to for sending commands and running inference.
- Removed unnecessary imports and improved error handling.

In service.rs:
- Removed the entire file, as it is no longer needed after the refactoring.

These changes refactor the threading logic in core_thread.rs and eliminate the now-obsolete service.rs file, improving code organization and maintainability.
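The commit message above does not render the concrete type names, so the following is only a minimal sketch of the command/dispatcher pattern it describes: a dedicated core thread owns the model state and processes commands received over a channel, while callers send commands and wait for responses. The names `CoreThreadCommand`, `CoreThreadDispatcher`, and `run_inference` are placeholders, not the real identifiers.

```rust
use std::sync::mpsc;
use std::thread;

/// Commands that callers can send to the core thread (placeholder name).
enum CoreThreadCommand {
    RunInference {
        prompt: String,
        response_tx: mpsc::Sender<String>,
    },
    Shutdown,
}

/// Handle used to send commands to the core thread (placeholder name).
struct CoreThreadDispatcher {
    command_tx: mpsc::Sender<CoreThreadCommand>,
}

impl CoreThreadDispatcher {
    /// Spawn the core thread and return a dispatcher connected to it.
    fn start() -> Self {
        let (command_tx, command_rx) = mpsc::channel();
        thread::spawn(move || {
            // The core thread processes commands sequentially until shutdown.
            for command in command_rx {
                match command {
                    CoreThreadCommand::RunInference { prompt, response_tx } => {
                        // Stand-in for the real model call.
                        let _ = response_tx.send(format!("echo: {prompt}"));
                    }
                    CoreThreadCommand::Shutdown => break,
                }
            }
        });
        Self { command_tx }
    }

    /// Send a prompt to the core thread and block until the response arrives.
    fn run_inference(&self, prompt: String) -> Option<String> {
        let (response_tx, response_rx) = mpsc::channel();
        self.command_tx
            .send(CoreThreadCommand::RunInference { prompt, response_tx })
            .ok()?;
        response_rx.recv().ok()
    }
}

fn main() {
    let dispatcher = CoreThreadDispatcher::start();
    println!("{:?}", dispatcher.run_inference("hello".to_string()));
}
```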
This commit refactors the project structure by moving , , and related types into a separate module named . Grouping this related functionality together improves code organization and maintainability.

Changes made:
- Moved , , , and related types into the module.
- Updated import paths in the affected files to reflect the module restructuring.

Impact:
- Improved code organization and maintainability.
- Clear separation of concerns between the different components of the application.
This commit includes several changes:
- Updated the dependencies in the file, adding the , , and crates.
- Refactored the struct and its related methods to use the crate for loading configuration from a file.
- Updated to properly initialize and fetch models during service startup.
- Added deserialization support for the enum so it can be read from TOML configuration files.
- Updated error handling in various parts of the codebase.
- Added a test case to ensure proper initialization of the inference service with sample configuration data.

This refactor improves modularity, strengthens error handling, and makes the codebase easier to maintain.
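The crate names are not rendered in the commit message above, but since the configuration is read from TOML, the following is a minimal sketch of how such loading typically looks, assuming the serde and toml crates. The `InferenceConfig` fields, the `ModelType` variants, and the `inference.toml` path are illustrative, not the project's actual definitions.

```rust
use std::path::{Path, PathBuf};

use serde::Deserialize;

/// Illustrative model enum; the real variants live in models.rs.
#[derive(Debug, Deserialize)]
enum ModelType {
    Llama2,
    Mistral,
}

/// Illustrative configuration struct; field names are assumptions.
#[derive(Debug, Deserialize)]
struct InferenceConfig {
    storage_folder: PathBuf,
    models: Vec<ModelType>,
}

impl InferenceConfig {
    /// Read and deserialize the configuration from a TOML file.
    fn from_file(path: &Path) -> Result<Self, Box<dyn std::error::Error>> {
        let contents = std::fs::read_to_string(path)?;
        Ok(toml::from_str(&contents)?)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = InferenceConfig::from_file(Path::new("inference.toml"))?;
    println!("storing models under {}", config.storage_folder.display());
    println!("configured models: {:?}", config.models);
    Ok(())
}
```

A matching file would contain lines like `storage_folder = "./models"` and `models = ["Llama2", "Mistral"]`; with serde's default representation, unit enum variants deserialize from plain strings.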
- Added the hf-hub crate, version 0.3.2, to Cargo.toml for dependency management.
- Updated atoma-inference/Cargo.toml to include hf-hub as a workspace dependency with the tokio feature enabled.
- Added hf-hub to the list of dependencies in atoma-inference/src/apis/hugging_face.rs.
- Updated the ModelType enum in atoma-inference/src/models.rs to include the new model types supported via hf-hub.
- Modified the ApiTrait and Api implementations to incorporate hf-hub functionality for model fetching.
- Implemented asynchronous model fetching in the ApiTrait trait using async_trait.
- Added tests for InferenceService initialization and model fetching with hf-hub.
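As a rough sketch of what an async_trait-based fetch on top of hf-hub's tokio API might look like: the `ApiTrait` name is taken from the commit message, but the method signature, and the repository and file used in `main`, are assumptions rather than the project's actual code.

```rust
use std::path::PathBuf;

use async_trait::async_trait;
use hf_hub::api::tokio::{Api, ApiError};

/// Assumed shape of the fetching trait; the real signature may differ.
#[async_trait]
trait ApiTrait {
    async fn fetch(&self, repo_id: String, filename: String) -> Result<PathBuf, ApiError>;
}

#[async_trait]
impl ApiTrait for Api {
    async fn fetch(&self, repo_id: String, filename: String) -> Result<PathBuf, ApiError> {
        // Downloads the file into the local Hugging Face cache (or reuses a
        // previous download) and returns its path on disk.
        self.model(repo_id).get(&filename).await
    }
}

#[tokio::main]
async fn main() -> Result<(), ApiError> {
    let api = Api::new()?;
    let weights = api
        .fetch(
            "bert-base-uncased".to_string(),
            "model.safetensors".to_string(),
        )
        .await?;
    println!("fetched model weights to {}", weights.display());
    Ok(())
}
```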
- Renamed the `storage_base_path` field in the `InferenceConfig` struct to `storage_folder` for improved clarity and consistency with the field's actual purpose.
- Updated references to `storage_base_path` to use `storage_folder` throughout `config.rs` and `core.rs`.
- Adjusted the tests in `service.rs` to reflect the renamed configuration field.
…ing and monitoring

In this commit, changes have been made to core.rs and main.rs to integrate tracing for enhanced debugging and monitoring. Specifically, the following modifications were implemented:
- Imported `tracing::info` in core.rs to enable logging of informational messages.
- Added logging statements using the `info!` macro in core.rs to mark the start of inference and report the prompt and model being used.
- Updated main.rs to remove unused variables and commented-out code, leaving the code cleaner and more maintainable.
- Added logging statements in main.rs to mark the start of the Core Dispatcher and the fetching of models.

These changes improve visibility into the execution flow of the inference service and make debugging and monitoring easier during development and deployment.
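The logging described above would look roughly like this; the function and field names are illustrative rather than the actual code in core.rs.

```rust
use tracing::info;

fn run_inference(model_name: &str, prompt: &str) -> String {
    // Record which model is being used and what prompt was received.
    info!(model = model_name, prompt, "starting inference");
    // ... the real inference would run here ...
    String::new()
}
```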
I think the model fetching should be done on startup, or on the first request, but it should not block the rest of the code. Also, running one inference should not block running another inference. The user should be able to specify how many inferences can run at once, or the limit could be derived from CPU/GPU usage.
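A minimal sketch of this suggestion, assuming a tokio runtime: model fetching is spawned as a background task at startup, and a `Semaphore` bounds how many inferences run concurrently. The limit here is a hard-coded placeholder for a user-supplied setting; deriving it from CPU/GPU usage is not shown.

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // Kick off model fetching without blocking service startup.
    let fetch_handle = tokio::spawn(async {
        // ... download model weights here ...
    });

    // Placeholder for a user-configurable bound on concurrent inferences.
    let max_concurrent_inferences = 4;
    let permits = Arc::new(Semaphore::new(max_concurrent_inferences));

    let mut tasks = Vec::new();
    for request_id in 0..16 {
        let permits = Arc::clone(&permits);
        tasks.push(tokio::spawn(async move {
            // Each inference waits for a permit, so at most four run at once.
            let _permit = permits.acquire().await.expect("semaphore closed");
            // ... run inference for `request_id` here ...
            let _ = request_id;
        }));
    }

    fetch_handle.await.expect("fetch task panicked");
    for task in tasks {
        task.await.expect("inference task panicked");
    }
}
```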
Add the tracing-subscriber crate to the dependencies in Cargo.toml to enable structured logging in the project.

Changes:
- Added tracing-subscriber = "0.3.18" to the dependencies in Cargo.toml.
- Added tracing-subscriber.workspace = true to the workspace member's Cargo.toml.
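With the subscriber dependency in place, initialization in `main.rs` would typically look something like the sketch below; the max-level filter is an assumption, not taken from the commit.

```rust
fn main() {
    // Install a formatting subscriber so `info!`/`warn!` events are printed.
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        .init();

    tracing::info!("starting the inference service");
}
```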
experiments around dependencies
No description provided.