Refactor dataset handling to support channel-specific data #2456

DeepWaved · 2024-11-15T03:54:32Z

This commit modifies `swift/llm/utils/dataset.py` to support channel-specific data in the dataset: the `standard_keys` dictionary now includes the key `'channel'`. In addition, `swift/llm/sft.py` has been updated to handle channel-specific data during training: `val_dataset` is now split into separate datasets by channel, stored in the `channel_dataset_dict` dictionary.
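To show why adding `'channel'` to `standard_keys` matters, here is a minimal sketch. It assumes that preprocessing drops every column whose name is not a standard key. `STANDARD_KEYS` and `keep_standard_columns` are hypothetical names, and the key set is an illustrative subset, not the real contents of `standard_keys`.

```python
from typing import Any, Dict, List

# Hypothetical, illustrative subset of standard column names; the real
# `standard_keys` in swift/llm/utils/dataset.py contains more entries.
STANDARD_KEYS = {'query', 'response', 'system', 'history', 'channel'}


def keep_standard_columns(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: drop every column that is not a standard key."""
    return {k: v for k, v in row.items() if k in STANDARD_KEYS}


rows: List[Dict[str, Any]] = [
    {'query': 'Hi', 'response': 'Hello', 'channel': 'chat', 'source_url': 'x'},
]
cleaned = [keep_standard_columns(r) for r in rows]
print(cleaned)
# [{'query': 'Hi', 'response': 'Hello', 'channel': 'chat'}]
```

Without `'channel'` in the key set, the filter above would discard the channel label before any per-channel split could use it.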

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Support channel-specific data in the validation dataset (`val_dataset`).
Split `val_dataset` into per-channel datasets stored in a dictionary keyed by channel.
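The per-channel split described above can be sketched in plain Python as follows. This is a minimal sketch, assuming the validation set is a list of row dicts; the actual change presumably operates on a Hugging Face `datasets.Dataset`. The helper `split_by_channel` is hypothetical, and grouping rows that lack a `'channel'` key under `None` is an assumption, since the PR does not say how such rows are handled.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def split_by_channel(val_dataset: List[Row]) -> Dict[Optional[str], List[Row]]:
    """Hypothetical helper: group validation rows by their 'channel' value.

    Rows without a 'channel' key are grouped under None (an assumption).
    """
    channel_dataset_dict: Dict[Optional[str], List[Row]] = defaultdict(list)
    for row in val_dataset:
        channel_dataset_dict[row.get('channel')].append(row)
    # Return a plain dict so missing keys raise KeyError instead of
    # silently creating empty channels.
    return dict(channel_dataset_dict)


val_dataset = [
    {'query': 'q1', 'response': 'r1', 'channel': 'math'},
    {'query': 'q2', 'response': 'r2', 'channel': 'code'},
    {'query': 'q3', 'response': 'r3', 'channel': 'math'},
    {'query': 'q4', 'response': 'r4'},
]
channel_dataset_dict = split_by_channel(val_dataset)
print({k: len(v) for k, v in channel_dataset_dict.items()})
# {'math': 2, 'code': 1, None: 1}
```

With a `datasets.Dataset`, a similar result could be obtained with `Dataset.unique('channel')` followed by one `Dataset.filter` call per channel. If the dictionary is passed to the Hugging Face `Trainer` as `eval_dataset`, each key is used as a metric-name prefix, so string keys are preferable to `None`.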

Experiment results

