feat: add candle llama model #19

Merged: 2 commits into main (Apr 4, 2024)

Conversation

@Cifko Cifko (Collaborator) commented Apr 3, 2024

PR #10 needs to go in first.

@Cifko Cifko force-pushed the llama-model branch 4 times, most recently from 2e0cbaf to 15285b9 on April 4, 2024 at 08:22
@jorgeantonio21 jorgeantonio21 (Contributor) left a comment

LGTM, left a few (minor) comments

```rust
    cache: Cache,
}

pub struct Input {
```

This needs to conform with Request::ModelInput, as of now. But since this might change in the future, we can merge it for now.
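
For illustration, a minimal sketch of what aligning the two types could look like. `Request::ModelInput` is not shown in this PR, so every field below is a hypothetical placeholder:

```rust
// Hypothetical sketch only: Request::ModelInput is not shown in this PR,
// so these fields are placeholders. The point is that Input should mirror
// whatever fields Request::ModelInput actually carries.
pub struct Input {
    pub prompt: String,           // assumed field
    pub temperature: Option<f64>, // assumed field
    pub max_tokens: usize,        // assumed field
}
```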

```rust
    model_id: Option<String>,
    revision: Option<String>,
    which: Which,
    use_flash_attn: bool,
```

The `use_flash_attn` flag is not relevant for fetching. It is a parameter used when loading/running the model, to speed up inference by reducing the memory-bandwidth load across the GPU's memory components.
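
A minimal sketch of that separation (the type names here are assumptions, not the PR's actual types): the fetch config only identifies which weights to download, while the flash-attention choice lives with the run config:

```rust
// Sketch of the suggested split; type names are assumptions, not the PR's.
pub enum Which {
    V1,
    TinyLlama1_1BChat,
}

// Fetching only needs to locate the weights to download.
pub struct FetchConfig {
    pub model_id: Option<String>,
    pub revision: Option<String>,
    pub which: Which,
}

// The flash-attention choice only matters when loading/running the model.
pub struct RunConfig {
    pub use_flash_attn: bool,
}
```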

```rust
};
let api = Api::new()?;
let model_id = fetch.model_id.clone().unwrap_or_else(|| match fetch.which {
    Which::V1 => "Narsil/amall-7b".to_string(),
```

I think we should add an enum ModelId that maps each version to the right string id.
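
Something along these lines, as a sketch (only the two ids visible in this diff are shown; the full variant list is an assumption):

```rust
// Sketch of the suggested mapping; variants beyond those visible in this
// diff are assumptions.
pub enum Which {
    V1,
    TinyLlama1_1BChat,
}

impl Which {
    /// Map each model version to its Hugging Face repo id.
    pub fn model_id(&self) -> &'static str {
        match self {
            Which::V1 => "Narsil/amall-7b",
            Which::TinyLlama1_1BChat => "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        }
    }
}
```

The call site above would then shrink to `fetch.model_id.clone().unwrap_or_else(|| fetch.which.model_id().to_string())`.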

```rust
}

fn model_id(&self) -> crate::models::ModelId {
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0".to_string()
```

This depends on the model size and should not be hardcoded.
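
One way to remove the hardcoded id, as a sketch (assumes the model struct keeps the configured `Which` variant in a `which` field, which this diff does not show):

```rust
// Sketch: derive the id from the configured variant instead of hardcoding it.
// Assumes a `which: Which` field on the struct, not shown in this diff.
fn model_id(&self) -> crate::models::ModelId {
    match self.which {
        Which::V1 => "Narsil/amall-7b".to_string(),
        Which::TinyLlama1_1BChat => "TinyLlama/TinyLlama-1.1B-Chat-v1.0".to_string(),
    }
}
```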

@jorgeantonio21 jorgeantonio21 merged commit c2f064c into main Apr 4, 2024
1 check failed
@Cifko Cifko deleted the llama-model branch April 10, 2024 08:26