Refactor dataset handling to support channel-specific data #2456

Open: wants to merge 1 commit into base branch main

DeepWaved commented Nov 15, 2024

This commit adds support for channel-specific data in the dataset. In `swift/llm/utils/dataset.py`, `standard_keys` now includes the key `'channel'`. In `swift/llm/sft.py`, training now handles channel-specific data: `val_dataset` is split by channel into separate datasets, which are stored in the `channel_dataset_dict` dictionary.

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Adds support for channel-specific data in the validation dataset.
Splits `val_dataset` by channel into a dictionary of per-channel datasets.
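
The channel-based split described in this PR can be illustrated with a minimal sketch. This is not the diff itself: the helper `split_by_channel`, the row fields `query` and `response`, and the use of plain lists of dicts instead of the project's dataset objects are all assumptions made for illustration. Only the names `val_dataset`, `channel_dataset_dict`, and the `'channel'` key come from the description.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def split_by_channel(val_dataset: List[Row]) -> Dict[Optional[str], List[Row]]:
    """Group validation rows by their 'channel' value (hypothetical helper).

    Rows without a 'channel' key are grouped under None.
    """
    channel_dataset_dict: Dict[Optional[str], List[Row]] = defaultdict(list)
    for row in val_dataset:
        channel_dataset_dict[row.get('channel')].append(row)
    return dict(channel_dataset_dict)


# Toy validation data; the field names other than 'channel' are assumptions.
val_dataset = [
    {'query': 'q1', 'response': 'r1', 'channel': 'chat'},
    {'query': 'q2', 'response': 'r2', 'channel': 'code'},
    {'query': 'q3', 'response': 'r3', 'channel': 'chat'},
    {'query': 'q4', 'response': 'r4'},  # no channel set
]

channel_dataset_dict = split_by_channel(val_dataset)
for channel, rows in channel_dataset_dict.items():
    print(channel, len(rows))
```

Keeping one dataset per channel would allow evaluation metrics to be reported per channel rather than only over the whole validation set, which appears to be the motivation for the change.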

Experiment results

image
