AttributeError: 'DatasetDict' object has no attribute 'features' #634

vahuja4 · 2023-09-26T09:32:59Z


I just began dabbling with LLMs and came across axolotl. I thought of using it to fine-tune a llama2 model on this dataset: https://huggingface.co/datasets/knowrohit07/know_sql

I used one of the examples in the llama2 examples directory and changed only the dataset path. The changed configuration file is attached. Note: I had to change the file extension to .txt to be able to upload it here.
sql.txt

The error trace looks like:
workspace/axolotl/examples/llama-2# accelerate launch -m axolotl.cli.train sql.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
(axolotl ASCII-art banner)

[2023-09-26 09:25:22,624] [INFO] [axolotl.normalize_config:89] [PID:3099] [RANK:0] GPU memory usage baseline: 0.000GB (+0.319GB misc)
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:75] [PID:3099] [RANK:0] EOS: 2 /
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:76] [PID:3099] [RANK:0] BOS: 1 /
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:77] [PID:3099] [RANK:0] PAD: 2 /
[2023-09-26 09:25:22,870] [DEBUG] [axolotl.load_tokenizer:78] [PID:3099] [RANK:0] UNK: 0 /
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:132] [PID:3099] [RANK:0] Unable to find prepared dataset in last_run_prepared/fb6a3378485bc3affe8ffc9041630c7b
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:133] [PID:3099] [RANK:0] Loading raw datasets...
[2023-09-26 09:25:23,077] [INFO] [axolotl.load_tokenized_prepared_datasets:138] [PID:3099] [RANK:0] No seed provided, using default seed of 42
/usr/local/lib/python3.10/dist-packages/datasets/load.py:2089: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=None' instead.
warnings.warn(
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 36, in <module>
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 29, in do_cli
dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 222, in load_datasets
train_dataset, eval_dataset, total_num_steps = prepare_dataset(cfg, tokenizer)
File "/workspace/axolotl/src/axolotl/utils/data.py", line 62, in prepare_dataset
train_dataset, eval_dataset = load_prepare_datasets(
File "/workspace/axolotl/src/axolotl/utils/data.py", line 470, in load_prepare_datasets
dataset = load_tokenized_prepared_datasets(
File "/workspace/axolotl/src/axolotl/utils/data.py", line 238, in load_tokenized_prepared_datasets
"input_ids" in ds.features
AttributeError: 'DatasetDict' object has no attribute 'features'
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-m', 'axolotl.cli.train', 'sql.yml']' returned non-zero exit status 1.

What could I be doing wrong?

winglian · 2023-09-26T13:56:57Z

Maintainer

@vahuja4 It's because that particular dataset doesn't have a train split (which axolotl expects). You could download this file locally (https://huggingface.co/datasets/knowrohit07/know_sql/blob/main/know_sql_val3%7Bign%7D.json), and then point to it with the yml. You might have to rename the file so it doesn't have val in the filename.
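For illustration, here is a rough sketch of why the traceback ends in `ds.features`: a `DatasetDict` is a dict of splits, and `features` lives on each split rather than on the container. The helper name `select_split` is hypothetical and is not part of axolotl or `datasets`. The sketch uses plain-Python stand-ins (a `dict` and `SimpleNamespace`) so it runs without downloading anything; with the real library you would pass the object returned by `datasets.load_dataset(...)` instead.

```python
from types import SimpleNamespace


def select_split(ds, preferred="train"):
    """Return a single split that exposes `.features` (hypothetical helper).

    `ds` may be a single dataset (it has `.features`) or a dict-like
    container of splits, which is how DatasetDict behaves.
    """
    if hasattr(ds, "features"):
        return ds  # already a single split
    if preferred in ds:
        return ds[preferred]
    raise ValueError(
        f"no {preferred!r} split; available splits: {sorted(ds)}"
    )


# Stand-ins so the sketch runs offline: a plain dict plays the role of
# DatasetDict, and SimpleNamespace plays the role of one Dataset split.
val_only = {
    "validation": SimpleNamespace(
        features={"context": str, "answer": str, "question": str}
    )
}

try:
    select_split(val_only)  # asks for "train", which is missing here
except ValueError as err:
    message = str(err)
    print(message)

# Picking the split explicitly makes `.features` available again.
split = select_split(val_only, preferred="validation")
print("input_ids" in split.features)
```

Listing the available split names in the error message is a deliberate choice: it shows at a glance whether the loader resolved the files to `train`, `validation`, or something else.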

2 replies

vahuja4 Sep 26, 2023
Author

@winglian - thank you for your reply. Please see below:
>>> ds = datasets.load_dataset('knowrohit07/know_sql', revision='f33425d13f9e8aab1b46fa945326e9356d6d5726')
>>> ds
DatasetDict({
    train: Dataset({
        features: ['context', 'answer', 'question'],
        num_rows: 78562
    })
})

So, the train split does seem to be there.

winglian Sep 26, 2023
Maintainer

Looking at the dataset viewer on HF, it only shows a validation split.
