I'm trying to run inference with contexts longer than 2k tokens, but I'm having trouble making it work for 65B models. The code below works on 7B-scale models, but fails with `token sampling failed` (due to NaN logits) when using 65B models.

I first hit this with internal models, but these 7B and 65B models reproduce the issue as well.
```rust
use std::path::PathBuf;

use llm::Model;

fn main() {
    let llama = llm::load::<llm::models::Llama>(
        std::path::Path::new("/data/tmp/llama-65b.ggmlv3.q4_0.bin"),
        // std::path::Path::new("/data/tmp/llama-7b.ggmlv3.q4_0.bin"),
        llm::TokenizerSource::HuggingFaceTokenizerFile(PathBuf::from("/data/tmp/tokenizer.json")),
        llm::ModelParameters {
            use_gpu: true,
            gpu_layers: Some(99),
            context_size: 8192,
            rope_overrides: Some(llm::RoPEOverrides {
                frequency_scale: 0.25,
                ..Default::default()
            }),
            ..Default::default()
        },
        llm::load_progress_callback_stdout,
    )
    .unwrap_or_else(|err| panic!("Failed to load model: {err}"));

    println!("\n\ncontext_size {}", llama.context_size());

    let prompt = "hello ".repeat(2800); // works until 2k tokens

    let mut session = llama.start_session(llm::InferenceSessionConfig {
        n_batch: 256,
        ..Default::default()
    });

    let res = session.infer::<std::convert::Infallible>(
        &llama,
        &mut rand::thread_rng(),
        &llm::InferenceRequest {
            prompt: (&prompt).into(),
            parameters: &llm::InferenceParameters::default(),
            play_back_previous_tokens: false,
            maximum_token_count: Some(1),
        },
        &mut Default::default(),
        |_| Ok(llm::InferenceFeedback::Continue),
    );

    match res {
        Ok(result) => println!("\n\nInference stats:\n{result}"),
        Err(err) => println!("\n{err}"),
    }
}
```