datasets.utils.info_utils.ExpectedMoreSplits: {'validation'} #286

Open
SDcodehub opened this issue Jan 10, 2024 · 1 comment

Comments

@SDcodehub

╰─$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.41it/s]
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Downloading and preparing dataset None/en to file:///home/FRACTAL/sagar.desai/.cache/huggingface/datasets/allenai___json/en-ec45c889631c3c39/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6413.31it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1855.89it/s]
Traceback (most recent call last):
  File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/llama.py", line 488, in <module>
    dataloader, testloader = get_loaders(args.dataset, nsamples=args.nsamples, seed=args.seed, model=args.model, seqlen=model.seqlen)
  File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 189, in get_loaders
    return get_c4(nsamples, seed, seqlen, model)
  File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 64, in get_c4
    traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train', use_auth_token=False)
  File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/load.py", line 1797, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 890, in download_and_prepare
    self._download_and_prepare(
  File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 1003, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
    raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}

Running on an A100.
I tried several datasets versions from 2.10.* to 2.12.* and get the same error each time.

@iibw

iibw commented Jan 16, 2024

This error seems to happen because the c4 dataset repository was updated to use datasets configuration options that older versions of datasets don't support.
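For context on the message itself: the traceback shows verify_splits raising on set(expected_splits) - set(recorded_splits). Here is a minimal sketch of that check. The two split sets are assumed values for illustration (metadata that expects train and validation, and a data_files argument that only supplies train), not values read from the library.

```python
# Assumed for illustration: splits the dataset metadata expects.
expected_splits = {"train", "validation"}
# Assumed for illustration: splits actually built from data_files={'train': ...}.
recorded_splits = {"train"}

# Same set difference that appears in the traceback's verify_splits frame.
missing = set(expected_splits) - set(recorded_splits)
print(missing)  # {'validation'}
```

Under these assumptions, the one split left over is the one named in the exception.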

To fix it, upgrade datasets with pip install -U datasets and remove the , 'allenai--c4' argument from all four c4 load_dataset lines in GPTQ-for-LLaMa/utils/datautils.py.
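The edit described above can be sketched as a small text rewrite. This is only an illustration: drop_c4_config_name is a hypothetical helper, not part of GPTQ-for-LLaMa or datasets, and the sample line is the load_dataset call copied from the traceback. Editing the four lines by hand works just as well.

```python
import re

# Matches the config-name argument right after the dataset name, e.g.
#   load_dataset('allenai/c4', 'allenai--c4', ...)
_CONFIG_ARG = re.compile(
    r"""(load_dataset\(\s*['"]allenai/c4['"])\s*,\s*['"]allenai--c4['"]"""
)


def drop_c4_config_name(source: str) -> str:
    """Hypothetical helper: remove the 'allenai--c4' argument from c4 load_dataset calls."""
    return _CONFIG_ARG.sub(r"\1", source)


# The call from the traceback, before the edit.
before = (
    "traindata = load_dataset('allenai/c4', 'allenai--c4', "
    "data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, "
    "split='train', use_auth_token=False)"
)
after = drop_c4_config_name(before)
print(after)
```

The printed line is the same call with only the second positional argument removed; every other argument is left untouched. To patch the real file, the same function could be applied to the text of utils/datautils.py in your own checkout.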

Some additional info here: https://huggingface.co/datasets/allenai/c4/discussions/7
