Re-think the placement of user data (inputs, outputs, configs) #120

Open
kasnerz opened this issue Oct 14, 2024 · 1 comment
Labels: enhancement (New feature or request), in progress (Working on it!)

kasnerz commented Oct 14, 2024

Current ideas:

  • moving datasets.yml from /config to /data so that it's closely tied to the local datasets
  • moving outputs from /outputs to /data/outputs
  • moving inputs from /data to /data/inputs
  • simplifying the directory structure for the outputs (currently located in a subdirectory /<dataset_id>/<split>/<setup_id>/files)
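
The relocation ideas above can be read as a small set of prefix rewrites. The following is only a sketch of that reading, not project code: the `RELOCATION_RULES` table and the `relocate` helper are hypothetical names, paths are taken relative to the repository root, and the example file names are placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical old-prefix -> new-prefix rules derived from the ideas listed above.
# Order matters: the specific rule for datasets.yml is checked before the
# directory-level rules.
RELOCATION_RULES = [
    (PurePosixPath("config/datasets.yml"), PurePosixPath("data/datasets.yml")),
    (PurePosixPath("outputs"), PurePosixPath("data/outputs")),
    (PurePosixPath("data"), PurePosixPath("data/inputs")),
]


def relocate(old_path: str) -> PurePosixPath:
    """Return the proposed new location for a path in the OLD layout."""
    path = PurePosixPath(old_path)
    for old_prefix, new_prefix in RELOCATION_RULES:
        if path == old_prefix or old_prefix in path.parents:
            return new_prefix / path.relative_to(old_prefix)
    # Paths not covered by the proposal stay where they are.
    return path


print(relocate("config/datasets.yml"))
print(relocate("outputs/my_dataset/test/my_setup/files/out.jsonl"))
print(relocate("data/my_dataset/test.json"))
```

The helper expects paths from the old layout only. Running it on an already migrated path such as `data/outputs/...` would move it again, so a real migration would need to apply the rules exactly once.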
@kasnerz kasnerz added the enhancement New feature or request label Oct 14, 2024

kasnerz commented Oct 16, 2024

Plan for the new structure:

  • /campaigns: subdirectories for all the campaigns (unifying annotations and generations)
  • /data/inputs: datasets, i.e. what is currently under /data
  • /data/outputs: model outputs, i.e. what is currently under /outputs
  • /datasets: what is now called loaders (if we don't run into clashes with the Hugging Face datasets package)

Moreover, we will simplify the directory structure for model outputs. By default, the directory will most likely be divided into <dataset>/<split> subdirectories, but what counts is the (data, split, setup_id, example_idx) tuple stored in each JSONL record.
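
To illustrate why the tuple matters more than the directory layout, here is a minimal sketch of a loader that indexes every record by that tuple wherever the file sits. It is not the project's loader: `index_outputs` and `KEY_FIELDS` are hypothetical names, and the JSONL field names (with `dataset` standing in for the first element of the tuple) are assumptions.

```python
import json
from pathlib import Path

# Assumed field names for the identifying tuple; the real record schema may differ.
KEY_FIELDS = ("dataset", "split", "setup_id", "example_idx")


def index_outputs(root):
    """Map (dataset, split, setup_id, example_idx) to its record.

    Every *.jsonl file under `root` is read, so the subdirectory layout
    has no effect on how a record is identified.
    """
    index = {}
    for path in sorted(Path(root).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                key = tuple(record[field] for field in KEY_FIELDS)
                # If the same key appears twice, the later file wins.
                index[key] = record
    return index
```

With a loader of this kind, the default <dataset>/<split> subdirectories are only a convenience for browsing: moving or merging files would not change which output belongs to which example.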

@kasnerz kasnerz added the in progress Working on it! label Oct 18, 2024