Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Successful Run But No Stories Generated! #84

Open
jvel07 opened this issue Dec 3, 2024 · 1 comment
Open

Successful Run But No Stories Generated! #84

jvel07 opened this issue Dec 3, 2024 · 1 comment

Comments

@jvel07
Copy link

jvel07 commented Dec 3, 2024

Hi, @e-p-armstrong, thanks for the repo!

I am trying to use the rptoolkit using the same samples in raw_txt_input but I am getting:

REALLY BAD EXCEPTION ENCOUNTERED: 'NoneType' object is not subscriptable 
TypeError: 'NoneType' object is not subscriptable

And no stories are generated in the output_final files.
Here is my config.yaml (tested these models with openAI ollama calls like here, everything works fine on this front):

API:
  API_KEY_A: key
  API_KEY_B: key2
  BASE_URL_A: http://localhost:11434/v1/
  BASE_URL_B: http://localhost:11434/v1/
  LOGICAL_MODEL_A: llama3.1:8b
  LOGICAL_MODEL_B: llama3.2
PATH:
  DEFAULT_PROMPTS: ./prompts
  INPUT: ./raw_txt_input
  OUTPUT: ./output
  PROMPTS: ./prompts
PHASES:
  PHASE_INDEX: 2
  WORK_IN_PHASES: True
SYSTEM:
  COMPLETION_MODE: False
  CONCURRENCY_LIMIT: 20
  EMOTIONS: ['DOMINANCE', 'FEARLESSNESS', 'EMBARASSMENT', 'NIHILISM',
    'DETERMINATION', 'DESPERATION', 'LOSS', 'NOSTALGIA', 'ANTICIPATION',
    'TRUST', 'FEAR', 'DISORIENTATION', 'DEGRADATION']
  INCLUDE_CHUNK_IN_PROMPT: True
  MODE_A: api
  MODE_B: api
  PICK_EMOTION: True
  RP_PROMPT_END: ''
  RP_PROMPT_START: ''
  STOP: True
  SUBSET_SIZE: 3
  USE_MIN_P: True
  USE_SUBSET: False
  CHUNK_SIZE: 2000
SCRAPING:
  USE_LIGHTNOVELCO: False
  LNCO_BASE_URL: https://www.lightnovelworld.co
  LNCO_RANKING_URL: https://www.lightnovelworld.co/ranking
  LNCO_CHAPTER_COUNT: 5
  LNCO_NOVEL_COUNT: 5
  LNCO_WAIT_TIME: 10
  LNCO_MAX_WORKERS: 5

Final output:

================== ALL DATA WRITTEN!! HERE ARE YOUR STATS: ==================

Total stories generated: 0
Stories that are at least OK across the board, but might slightly flawed ('good' and above, according to the AI rater): 0
Stories that are highly rated by the AI across the board ('incredible' and above, according to the AI rater.): 0
Total tokens of all stories (roughly equivalent to the number of training tokens):  0
Time taken: 121.87675976753235 seconds
ShareGPT-format .json export is created, and the full dataset is also available in the final_outputs folder.
Hmm... No stories were generated. Check the logs for more information, and consider creating an issue if this is unexpected. If you do make an issue, please include your input data and the logs!

=============================================================================

FULL OUTPUT (pasted on pastejustit for better visual clarity):
https://pastejustit.com/j36pmf1imm

Please, let me know whether I am missing something that may be causing this issue.

@e-p-armstrong
Copy link
Owner

Hey thanks for using Augmentoolkit!

This looks like an issue with the model that you're running RPToolkit with -- it's getting the output format of the early steps wrong, so the regex parsing is throwing an error. I really need to improve the error logging to make this a bit more clear. Either way, to fix this, consider running the pipeline with a larger or smarter model? For instance, in the past I have used llama 3 70b for the early steps of this pipeline. I don't think that's the minimum requirement, but starting there and going down can't hurt. Also different finetunes differ in intelligence, finding one that follows formats well is key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants