1 token generation in story mode #49

Open
Hotohori opened this issue Jun 10, 2023 · 2 comments

Comments

@Hotohori

Hotohori commented Jun 10, 2023

I use WizardLM-7B-uncensored-GPTQ, pygmalion-7b-4bit-128g-cuda, pygmalion-13b-4bit-128g and PygmalionCoT-7b. All are based on LLaMA, and I have the same problem with all of them:

Every 3 to 5 generations, only 1 token is generated.

When that happens, pressing "Submit" several times sometimes produces output, but not every time, and then I have to change something in the chat history before it generates more than 1 token. Sometimes a single space at the end helps; often a new line helps. But sometimes the AI generates complete crap: it takes something directly from the beginning of the context and generates its own genre tag or author's note tag instead of continuing the context.
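The manual workaround described above (resubmit, then append a space or a new line to the context) can be sketched as a small retry loop. This is only an illustration: `generate` is a hypothetical callable that takes the context string and returns a list of generated tokens, and `fake_generate` is a stand-in used for the demo. Neither is a KAI function.

```python
def generate_with_retries(generate, context, suffixes=("", " ", "\n"), min_tokens=2):
    """Try the context with each suffix until enough tokens come back.

    generate: hypothetical callable, context string -> list of tokens.
    Returns (suffix_used, tokens), or (None, []) if every attempt was too short.
    """
    for suffix in suffixes:
        tokens = generate(context + suffix)
        if len(tokens) >= min_tokens:
            return suffix, tokens
    return None, []


def fake_generate(context):
    # Demo stand-in: produces a single token unless the context ends in a new line.
    if context.endswith("\n"):
        return ["She", " opened", " the", " door"]
    return ["."]


suffix, tokens = generate_with_retries(fake_generate, "Once upon a time")
print(repr(suffix), tokens)  # '\n' ['She', ' opened', ' the', ' door']
```

A loop like this only hides the symptom. It does not explain why the model stops after one token.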

When I switch to another LLaMA-based model at a position in the story where the first one generated only 1 token, the other model also generates only 1 token. If I use Pygmalion-6b-4bit-128g, a model that is not based on LLaMA, it generates normally. So it looks like the problem affects only LLaMA-based models.

I have had this problem for a long time now and have already done a completely fresh KAI installation. Nothing has helped so far. I run KAI locally under Win10.

@0cc4m
Owner

0cc4m commented Jun 11, 2023

It most likely hits a stop token: the model thinks it's done. I don't know too much about it. Unless you think it's a 4-bit issue, this isn't the right place to ask. You can try upstream in united or on Discord.
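To illustrate what hitting a stop token means, here is a minimal sketch of a greedy decoding loop, assuming a toy stand-in for the model. It is not KAI's code. `next_logits`, `toy_model` and `min_new_tokens` are hypothetical names for this example, and the end-of-sequence id of 2 is an assumption that matches the usual LLaMA tokenizer; check your own tokenizer. If the stop token scores highest on the very first step, generation ends after 1 token. Masking it out for the first few steps forces the model to continue.

```python
import math

EOS_ID = 2  # assumption: usual end-of-sequence id for LLaMA tokenizers


def greedy_generate(next_logits, prompt_ids, max_new_tokens, min_new_tokens=0, eos_id=EOS_ID):
    """Greedy decoding that stops as soon as eos_id is picked.

    next_logits: hypothetical callable, list of token ids -> list of logits.
    min_new_tokens: the stop token is masked until this many tokens exist.
    """
    ids = list(prompt_ids)
    new_tokens = []
    for _ in range(max_new_tokens):
        logits = list(next_logits(ids))
        if len(new_tokens) < min_new_tokens:
            logits[eos_id] = -math.inf  # ban the stop token for now
        token = max(range(len(logits)), key=logits.__getitem__)
        new_tokens.append(token)
        ids.append(token)
        if token == eos_id:
            break
    return new_tokens


def toy_model(ids):
    # Toy stand-in: prefers the stop token right after the 3-token prompt
    # and again once the sequence is 6 tokens long; otherwise prefers token 5.
    logits = [0.0] * 8
    if len(ids) == 3 or len(ids) >= 6:
        logits[EOS_ID] = 5.0
        logits[5] = 4.0
    else:
        logits[5] = 5.0
    return logits


print(greedy_generate(toy_model, [1, 3, 4], 10))                    # [2]
print(greedy_generate(toy_model, [1, 3, 4], 10, min_new_tokens=2))  # [5, 5, 5, 2]
```

The first call reproduces the symptom from the report: the only generated token is the stop token. Whether a given frontend exposes a comparable minimum-length setting depends on that frontend.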

@anyezhixie

In my personal experience, what is happening is that the model believes there is nothing more to say in the current scenario. That usually comes down to one of two things:

  1. You ask it to write something it is not trained to write.
  2. It thinks that the current scene is over.

The former often requires you to change the instructions or retry several times; even though it is untrained, after multiple retries it may write something monotonous just to get by.
The latter requires checking the memory layer to see whether a sentence or action there makes it think the scene is over.
Hope this helps.


3 participants