[wip] inference service #12

Merged 30 commits on Apr 2, 2024
Conversation

jorgeantonio21 (Contributor)

No description provided.

In core_thread.rs:
- Added  constant.
- Introduced  and  structs.
- Modified  struct to utilize the new dispatcher.
- Added methods to  for sending commands and running inference.
- Removed unnecessary imports and improved error handling.

In service.rs:
- Removed the entire file as it's no longer needed after refactoring.

These changes refactor the threading logic in core_thread.rs and eliminate the now-obsolete service.rs file, improving code organization and maintainability.
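The exact identifiers in the commit message above were lost in extraction. A minimal sketch of the pattern it describes — a dedicated core thread receiving commands over a channel — with hypothetical names (`CoreThreadCommand`, `CoreThreadDispatcher`) and a placeholder in place of the real model call:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Commands accepted by the core thread (illustrative names).
enum CoreThreadCommand {
    RunInference { prompt: String, respond_to: Sender<String> },
    Shutdown,
}

/// Owns the channel endpoint used to talk to the core thread.
struct CoreThreadDispatcher {
    sender: Sender<CoreThreadCommand>,
    handle: Option<JoinHandle<()>>,
}

impl CoreThreadDispatcher {
    /// Spawn the core thread and return a dispatcher for sending commands to it.
    fn start() -> Self {
        let (sender, receiver) = mpsc::channel();
        let handle = thread::spawn(move || {
            // Process commands until the channel closes or Shutdown arrives.
            while let Ok(cmd) = receiver.recv() {
                match cmd {
                    CoreThreadCommand::RunInference { prompt, respond_to } => {
                        // Placeholder for the actual model call.
                        let output = format!("inference output for: {prompt}");
                        let _ = respond_to.send(output);
                    }
                    CoreThreadCommand::Shutdown => break,
                }
            }
        });
        Self { sender, handle: Some(handle) }
    }

    /// Send a prompt to the core thread and block until its response arrives.
    fn run_inference(&self, prompt: String) -> Result<String, mpsc::RecvError> {
        let (tx, rx) = mpsc::channel();
        self.sender
            .send(CoreThreadCommand::RunInference { prompt, respond_to: tx })
            .expect("core thread is alive");
        rx.recv()
    }

    /// Ask the core thread to stop and wait for it to finish.
    fn shutdown(mut self) {
        let _ = self.sender.send(CoreThreadCommand::Shutdown);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```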
This commit refactors the project structure by moving , , and related types to a separate module named . This change improves code organization and maintainability by grouping related functionality together.

Changes Made:
- Moved , , , and related types to the  module.
- Updated import paths in affected files to reflect the module restructuring.

Impact:
- Improved code organization and maintainability.
- Clear separation of concerns between different components of the application.
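A small illustration of the restructuring this commit describes; the module and type names below are placeholders, since the actual names did not survive extraction:

```rust
// src/lib.rs — declare the new module and re-export the moved types,
// so the import paths in affected files need updating in only one place.
pub mod core_thread;

pub use core_thread::{CoreThreadCommand, CoreThreadDispatcher};
```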
This commit includes several changes:

- Updated dependencies in the Cargo.toml file, including adding the , , and  crates.
- Refactored the  struct and its related methods to use the  crate for loading configuration from a file.
- Updated  to properly initialize and fetch models during service startup.
- Added deserialization support for  enum to read from TOML configuration files.
- Updated error handling in various parts of the codebase.
- Added a test case to ensure proper initialization of the inference service with sample configuration data.

This refactor ensures better modularity, improved error handling, and easier maintenance of the codebase.
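A sketch of the configuration loading this commit describes, assuming serde for deserialization plus the toml and anyhow crates (the actual crate name was lost in extraction). The `ModelType` variants are illustrative; `ModelType` and `storage_folder` are both named elsewhere in this PR:

```rust
use std::path::PathBuf;

use serde::Deserialize;

/// Model identifiers readable from TOML (illustrative variants).
#[derive(Debug, Deserialize)]
enum ModelType {
    Llama2,
    Mistral,
}

/// Illustrative configuration shape; `storage_folder` matches the field
/// named in a later commit of this PR.
#[derive(Debug, Deserialize)]
struct InferenceConfig {
    storage_folder: PathBuf,
    models: Vec<ModelType>,
}

impl InferenceConfig {
    /// Load the configuration from a TOML file on disk.
    fn from_file(path: &str) -> anyhow::Result<Self> {
        let contents = std::fs::read_to_string(path)?;
        Ok(toml::from_str(&contents)?)
    }
}
```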
- Added hf-hub crate version 0.3.2 to Cargo.toml for dependency management.
- Updated atoma-inference/Cargo.toml to include hf-hub as a workspace with the tokio feature enabled.
- Added hf-hub to the list of dependencies in atoma-inference/src/apis/hugging_face.rs.
- Updated ModelType enum in atoma-inference/src/models.rs to include new model types supported by hf-hub.
- Modified ApiTrait and Api implementations to incorporate hf-hub functionality for model fetching.
- Implemented asynchronous model fetching in the ApiTrait trait using async_trait.
- Added tests for InferenceService initialization and model fetching with hf-hub.
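A hedged sketch of asynchronous model fetching with hf-hub 0.3.2's tokio API; `ApiTrait` is named in the commit, but the method signature and error handling here are assumptions:

```rust
use std::path::PathBuf;

use async_trait::async_trait;
use hf_hub::api::tokio::Api;

/// `ApiTrait` comes from the PR; this method shape is a guess.
#[async_trait]
trait ApiTrait {
    async fn fetch(&self, repo_id: String, filename: String) -> anyhow::Result<PathBuf>;
}

#[async_trait]
impl ApiTrait for Api {
    async fn fetch(&self, repo_id: String, filename: String) -> anyhow::Result<PathBuf> {
        // Downloads into the local Hugging Face cache and returns the file path.
        Ok(self.model(repo_id).get(&filename).await?)
    }
}
```

With something like this in place, the service could call `Api::new()?.fetch(...)` during startup to warm the local cache before serving requests.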
- Renamed the `storage_base_path` field in the `InferenceConfig` struct to `storage_folder` for improved clarity and consistency with the actual purpose of the field.
- Updated references to `storage_base_path` to use `storage_folder` throughout the codebase in `config.rs` and `core.rs`.
- Adjusted tests in `service.rs` to reflect the renaming of the field in the configuration.
Integrate tracing for enhanced debugging and monitoring

In this commit, changes have been made to core.rs and main.rs files to integrate tracing for enhanced debugging and monitoring capabilities. Specifically, the following modifications were implemented:

- Imported the `tracing::info` module in core.rs to enable logging of informational messages.
- Added logging statements using `info!` macro in core.rs to indicate the beginning of inference and provide information about the prompt and model being used.
- Updated main.rs to remove unused variables and commented-out code, ensuring cleaner and more maintainable code.
- Added logging statements in main.rs to indicate the start of the Core Dispatcher and fetching of models.

These changes aim to improve the visibility into the execution flow of the inference service and facilitate easier debugging and monitoring during development and deployment.
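A minimal sketch of the tracing integration described above; the function names and log fields are illustrative:

```rust
use tracing::info;

/// Illustrative inference entry point showing the logging added in core.rs.
fn run_inference(prompt: &str, model: &str) -> String {
    // Log the beginning of inference with the prompt and model in use.
    info!(%prompt, %model, "starting inference");
    format!("<output for {prompt} on {model}>")
}

fn main() {
    // Install a subscriber so `info!` events are emitted; the
    // tracing-subscriber dependency is added later in this PR.
    tracing_subscriber::fmt::init();
    info!("starting Core Dispatcher and fetching models");
    let output = run_inference("hello", "llama-2");
    info!(%output, "inference finished");
}
```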
@Cifko (Collaborator) left a comment:
I think model fetching should be done on start, or on the first request, but it should not block the whole code. Also, running one inference should not block running another. The user should be able to specify how many inferences they can run at once, or it could be determined from the CPU/GPU usage.
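One way to realize the reviewer's suggestion is to gate inference behind a semaphore sized from user configuration; a sketch assuming tokio, with illustrative names:

```rust
use std::sync::Arc;

use tokio::sync::Semaphore;

/// Caps how many inferences run concurrently without blocking callers.
struct InferenceLimiter {
    permits: Arc<Semaphore>,
}

impl InferenceLimiter {
    /// `max_concurrent` would come from user configuration.
    fn new(max_concurrent: usize) -> Self {
        Self { permits: Arc::new(Semaphore::new(max_concurrent)) }
    }

    async fn run(&self, prompt: String) -> String {
        // Waits for a free slot instead of blocking the executor thread.
        let _permit = self.permits.clone().acquire_owned().await.expect("semaphore open");
        tokio::task::spawn_blocking(move || {
            // CPU/GPU-heavy model call goes here.
            format!("output for: {prompt}")
        })
        .await
        .expect("inference task panicked")
    }
}
```

Offloading the heavy call with `spawn_blocking` keeps the async executor responsive, and the semaphore bounds concurrency; tying the bound to live CPU/GPU usage, as the reviewer also suggests, would need an additional monitoring task.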

Review comments (since marked outdated and resolved) were left on:
- atoma-inference/src/core.rs (2)
- atoma-inference/src/service.rs
- atoma-inference/src/core_thread.rs (2)
Add tracing-subscriber crate to the dependencies in Cargo.toml to enable structured logging in the project.

Changes:
- Added tracing-subscriber = "0.3.18" to dependencies in Cargo.toml
- Added tracing-subscriber.workspace = true to the member crate's Cargo.toml so it inherits the workspace dependency
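The workspace-dependency pattern these two changes describe (also used earlier for hf-hub) looks roughly like this; the member-crate path is assumed to be atoma-inference/Cargo.toml, matching the hf-hub commit:

```toml
# Root Cargo.toml: pin the version once for the whole workspace.
[workspace.dependencies]
tracing-subscriber = "0.3.18"

# atoma-inference/Cargo.toml: inherit the workspace version.
[dependencies]
tracing-subscriber.workspace = true
```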
jorgeantonio21 merged commit 1c42945 into main on Apr 2, 2024
1 check failed