@vahuja4 It's because that particular dataset doesn't have a train split (which axolotl expects). You could download this file locally (https://huggingface.co/datasets/knowrohit07/know_sql/blob/main/know_sql_val3%7Bign%7D.json), and then point to it with the yml. You might have to rename the file so it doesn't have |
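For reference, a minimal sketch of what the dataset section of the yml might look like once the file is downloaded. The local file name `know_sql_val3.json` is hypothetical (whatever you rename the download to), and `type: alpaca` is only an assumption carried over from the llama-2 example config; the prompt type has to match the fields that are actually in the JSON file.

```yaml
datasets:
  # Hypothetical local path: the JSON file downloaded from the Hub and renamed
  - path: ./know_sql_val3.json
    # Tell axolotl to load the path as a local JSON file rather than a Hub repo
    ds_type: json
    # Assumption: replace with the prompt format that matches the file's fields
    type: alpaca
```

Loading a single local JSON file this way should give a dataset with the `train` split that axolotl looks for, which is what the Hub version of this dataset is missing.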
-
I just began dabbling with LLMs and came across axolotl. Thought of using it to fine-tune a llama2 model for this dataset https://huggingface.co/datasets/knowrohit07/know_sql
I just used one of the examples in the llama2 examples directory and changed the path of the dataset. The changed configuration file is attached. Note: I had to change the file type to txt to be able to upload it here.
sql.txt
The error trace looks like:
workspace/axolotl/examples/llama-2# accelerate launch -m axolotl.cli.train sql.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[axolotl ASCII-art banner]
[2023-09-26 09:25:22,624] [INFO] [axolotl.normalize_config:89] [PID:3099] [RANK:0] GPU memory usage baseline: 0.000GB (+0.319GB misc)
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:75] [PID:3099] [RANK:0] EOS: 2 / </s>
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:76] [PID:3099] [RANK:0] BOS: 1 / <s>
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:77] [PID:3099] [RANK:0] PAD: 2 / </s>
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:78] [PID:3099] [RANK:0] UNK: 0 / <unk>
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:132] [PID:3099] [RANK:0] Unable to find prepared dataset in last_run_prepared/fb6a3378485bc3affe8ffc9041630c7b
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:133] [PID:3099] [RANK:0] Loading raw datasets...
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:138] [PID:3099] [RANK:0] No seed provided, using default seed of 42
/usr/local/lib/python3.10/dist-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
warnings.warn(
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 36, in <module>
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 29, in do_cli
dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 222, in load_datasets
train_dataset, eval_dataset, total_num_steps = prepare_dataset(cfg, tokenizer)
File "/workspace/axolotl/src/axolotl/utils/data.py", line 62, in prepare_dataset
train_dataset, eval_dataset = load_prepare_datasets(
File "/workspace/axolotl/src/axolotl/utils/data.py", line 470, in load_prepare_datasets
dataset = load_tokenized_prepared_datasets(
File "/workspace/axolotl/src/axolotl/utils/data.py", line 238, in load_tokenized_prepared_datasets
"input_ids" in ds.features
AttributeError: 'DatasetDict' object has no attribute 'features'
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'axolotl.cli.train', 'sql.yml']' returned non-zero exit status 1.
What could I be doing wrong?