[BUG]: KernelMemory.AskAsync() does not work - exception: object reference not set to an instance of an object #891
Comments
I encountered the same issue when running the sample code "Kernel Memory: Document Q&A" or "Kernel Memory: Save and Load" from the LLama.Examples project.
@tusharmevl I have not found a fix for the Kernel Memory issue. The integration with Semantic Kernel Memory does seem to work, so you may try that as an alternative if your system only needs to support text.
@jwangga OK, thanks! Yes, I only need to support text for now, so I will try that.
Thanks @jwangga! I'm seeing that you can use Semantic Kernel Memory (SKM) as well. Unfortunately, it doesn't appear that you can "chat" with SKM to discuss results. Have you been able to figure out a way to "ask" questions of SKM?
I'm also having the same issue with the Kernel Memory: Document Q&A example.
Please, I really need to fix this error. So far I can only use certain older versions; any newer version does not work. I think the mistake is here: ISamplingPipelineExtensions.Sample()
I found the place where the error occurs: llama_get_logits_ith suddenly returns null.
Stack: StatelessExecutor.InferAsync(). But I don't understand what to do next or how to fix the error. Apparently, null shouldn't be there. Can anyone help with this error? Thanks.
That's probably indicative of two bugs in LLamaSharp.
Wrapper error: the docs for llama_get_logits_ith indicate that it can return null, so a null return is valid, but the wrapper does not check for it before using the result. This is why you get a hard crash instead of an exception.
Higher level error: even with the wrapper fixed to throw, there is still the question of why the logits are missing here at all; that is the underlying problem.
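As an illustration of the wrapper-level fix being described (turning the hard crash into a managed exception), here is a hypothetical sketch; the helper method and its parameters are made up, and only the NativeApi.llama_get_logits_ith binding is taken from LLamaSharp:

```csharp
using System;
using LLama.Native;

internal static class LogitsHelper
{
    // Hypothetical defensive wrapper: the native call can legitimately return
    // null (e.g. for an index whose logits were never computed), so check it
    // before building a span instead of letting the process crash later.
    public static unsafe Span<float> GetLogitsOrThrow(SafeLLamaContextHandle ctx, int index, int vocabSize)
    {
        float* logits = NativeApi.llama_get_logits_ith(ctx, index);

        if (logits == null)
            throw new NullReferenceException($"llama_get_logits_ith({index}) returned null");

        return new Span<float>(logits, vocabSize);
    }
}
```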
@martindevans I have found a solution: force Embeddings = false on the parameters used for text generation, as shown below.
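A minimal sketch of that workaround, assuming the LLamaSharp ModelParams / StatelessExecutor API; the model path and parameter values are placeholders rather than the original poster's exact configuration:

```csharp
// Build the text-generation side with Embeddings = false so the context
// produces logits for sampling instead of embeddings.
using LLama;
using LLama.Common;

var parameters = new ModelParams("path/to/model.gguf")
{
    ContextSize = 4096,
    GpuLayerCount = 20,
    Embeddings = false   // logits are only available when embeddings mode is off
};

using var weights = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(weights, parameters);
```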
While testing, I noticed that it became about 2x slower after 0.13.0. I wonder why that is?
Aha, I think you've cracked it! A while ago the behaviour of the embeddings flag was changed, so logits can no longer be extracted if embeddings mode is enabled.
And in LLamaSharpTextEmbeddingGenerator you must specify values for UBatchSize and BatchSize!
I'm not sure about that - there should be sensible defaults for those values. In LLamaSharp they're set to default values here. It's possible KernelMemory is overriding those defaults with something incorrect though (I don't really know the KM stuff, so I can't be certain).
Without these values there will be an error: "Input contains more tokens than configured batch size". That is, the value must be greater than 512, and right now the only way to set it is to rewrite the LLamaSharpTextEmbeddingGenerator class.
Apparently UBatchSize and BatchSize need to be added to LLamaSharpConfig. It also seems that Embeddings = false should always be set.
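For illustration, a sketch of the embedding-side parameters being discussed, assuming ModelParams exposes BatchSize and UBatchSize in the version in question; the values are examples only:

```csharp
// Embedding-side parameters: the batch must be large enough to hold the
// largest text chunk submitted for embedding in a single call.
using LLama.Common;

var embeddingParams = new ModelParams("path/to/model.gguf")
{
    Embeddings = true,   // this context is only used to produce embeddings
    BatchSize = 1024,    // the default of 512 is too small for longer chunks
    UBatchSize = 1024    // keep equal to BatchSize, as noted later in the thread
};
```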
I'm super busy this month, but I will try to make time to fix the issues you found, which I summarised here, when I get a chance (soon, hopefully; definitely before the next release).
#920 fixes the lowest level wrapper error, so at least it throws an exception. Hopefully that might help debug the higher level issue.
The problem has been found. You need to force Embeddings = false.
I wasn't sure if there was more going on, since you also mentioned needing to change the batch size. Is that just because of the size of your request (you need a larger batch to fit it all in), or is there more going on there?
Yes, the block size is larger than BatchSize, but right now that value cannot be changed without rewriting the LLamaSharpTextEmbeddingGenerator class.
Any update on this?
There is a solution above: Embeddings = false!
@aropb Where should "Embeddings = false" be added? There does not seem to be a WithLLamaSharp method in the LLamaSharp.KernelMemory project. Thanks.
It is indicated above where it goes.
It seems that for models that support both ChatCompletion and Embeddings, the new version must configure Embeddings = false in order to use ChatCompletion properly.
Not working in my case.
You always need to set Embeddings = false as the default. The error occurs when calling AskAsync. The embedding generator does not need to be changed (if nbatch == ubatch).
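One possible reading of "where it goes", sketched under assumptions: only the text-generation side is built with Embeddings = false, while the embedding generator keeps its defaults. The LlamaSharpTextGenerator / LLamaSharpTextEmbeddingGenerator constructor shapes and the custom-generator builder calls below are assumptions about the LLamaSharp.KernelMemory and Microsoft.KernelMemory APIs, not code taken from this thread:

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;

var modelPath = "path/to/model.gguf";

// Generation side: embeddings off so that logits are available for sampling.
var genParams = new ModelParams(modelPath) { ContextSize = 4096, Embeddings = false };
using var genWeights = LLamaWeights.LoadFromFile(genParams);
using var genContext = genWeights.CreateContext(genParams);
var executor = new StatelessExecutor(genWeights, genParams);

// Embedding side: left to the connector's own defaults (embeddings enabled).
var memory = new KernelMemoryBuilder()
    .WithCustomTextGenerator(new LlamaSharpTextGenerator(genWeights, genContext, executor))
    .WithCustomEmbeddingGenerator(new LLamaSharpTextEmbeddingGenerator(new LLamaSharpConfig(modelPath)))
    .Build<MemoryServerless>();
```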
I did what is mentioned here (see above), but I had to lower the ContextSize too. Currently I set it to 4000; I had 131 000 before and was getting an Access Violation with the llama-3.1-8b-4k model even with this modification. I am using the same model as a chat assistant with context size 131 000 and it works there.
Description
I use KernelMemory. The logits are empty.
The error occurs at the time of the call:
memory.AskAsync()
Debugged with cloned copies of the BaseSamplingPipeline and DefaultSamplingPipeline classes.
Reproduction Steps
The error occurs at the time of the call:
MemoryAnswer answer = await memory.AskAsync(question: question, filters: filters);
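For context, a minimal sketch of the failing call path, assuming the LLamaSharp.KernelMemory connector's WithLLamaSharpDefaults builder extension and placeholder model and document paths:

```csharp
using System;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;

// Build a serverless Kernel Memory instance backed by a local GGUF model.
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(new LLamaSharpConfig("path/to/model.gguf"))
    .Build<MemoryServerless>();

await memory.ImportDocumentAsync("sample-document.pdf", documentId: "doc-1");

// The NullReferenceException surfaces inside this call, during sampling.
MemoryAnswer answer = await memory.AskAsync("What is this document about?");
Console.WriteLine(answer.Result);
```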
Environment & Configuration
Known Workarounds