feat: add candle llama model #19
2e0cbaf to 15285b9
LGTM, left a few (minor) comments
    cache: Cache,
}

pub struct Input {
For now, this needs to conform to Request::ModelInput. Since that might change in the future, we can merge it as is.
    model_id: Option<String>,
    revision: Option<String>,
    which: Which,
    use_flash_attn: bool,
The use_flash_attn flag is not relevant for fetching. It is a parameter used when loading and running the model: it speeds up inference by reducing memory traffic between the GPU's memory levels (high-bandwidth memory and on-chip SRAM) during attention.
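One way to act on this would be to split the fields from the diff into two structs, one for fetching and one for loading. The sketch below is only an illustration: `FetchConfig`, `LoadConfig`, and `resolve_model_id` are hypothetical names that do not appear in the PR, and the `which` field is left out to keep the example self-contained.

```rust
/// Hypothetical: everything needed to locate and download the weights.
#[derive(Debug, Clone, Default)]
pub struct FetchConfig {
    pub model_id: Option<String>,
    pub revision: Option<String>,
}

/// Hypothetical: options that only matter once the model is loaded and run.
#[derive(Debug, Clone, Default)]
pub struct LoadConfig {
    pub use_flash_attn: bool,
}

impl FetchConfig {
    /// Returns the explicitly configured model id, or `default_id` if none was set.
    pub fn resolve_model_id(&self, default_id: &str) -> String {
        self.model_id
            .clone()
            .unwrap_or_else(|| default_id.to_string())
    }
}
```

With this split, the fetch path never sees `use_flash_attn`, so changing the attention implementation cannot affect which files are downloaded.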
};
let api = Api::new()?;
let model_id = fetch.model_id.clone().unwrap_or_else(|| match fetch.which {
    Which::V1 => "Narsil/amall-7b".to_string(),
I think we should add an enum ModelId that maps each version to the right string id.
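A minimal sketch of what such an enum could look like, assuming one variant per supported model. The variant names and the `as_str` method are illustrative assumptions; only the two repository strings are taken from the diff in this PR.

```rust
use std::fmt;

/// Hypothetical enum: variant names are illustrative, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelId {
    V1,
    TinyLlamaChat,
}

impl ModelId {
    /// Maps each variant to the repository id string used in the diff.
    pub fn as_str(self) -> &'static str {
        match self {
            ModelId::V1 => "Narsil/amall-7b",
            ModelId::TinyLlamaChat => "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        }
    }
}

impl fmt::Display for ModelId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegates to `as_str` so the mapping lives in one place.
        f.write_str(self.as_str())
    }
}
```

Because the `match` is exhaustive, adding a new variant without a corresponding id string fails to compile, which keeps the mapping from drifting out of sync.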
}

fn model_id(&self) -> crate::models::ModelId {
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0".to_string()
This depends on the model size and should not be hardcoded.
PR #10 needs to go in first.