feat: DataProcessor v1 #381
base: main
Commits on Nov 29, 2024
-
Move test datasets to tests/artifacts/testdata instead of tests/data
Signed-off-by: Dushyant Behl <[email protected]>
SHA 63e1472
-
Add initial implementation of dataloader v1
Signed-off-by: Dushyant Behl <[email protected]>
SHA 5245166
-
tests: reformat mock.patch to inside unit tests
Signed-off-by: Will Johnson <[email protected]>
fmt
Signed-off-by: Will Johnson <[email protected]>
SHA 50dd7fe
-
Add data config argument to data preprocessor
Signed-off-by: Dushyant Behl <[email protected]>
SHA ac17ebb
-
fix: Changes to support current implementation
Signed-off-by: Abhishek <[email protected]>
SHA fe25b48
-
Ensure data handling is done within process dataargs
Removes unused dead code after adding the new framework and refactors some test cases and files.
Signed-off-by: Dushyant Behl <[email protected]>
SHA dcd3f97
-
Remove accelerator in favor of torch distributed check for multi node data preprocessing
Signed-off-by: Dushyant Behl <[email protected]>
SHA 7adfeb0
-
Refactor data util tests as data handler tests.
Signed-off-by: Dushyant Behl <[email protected]>
SHA 2546733
-
fix: add __init__.py to add tuning.data to python package
Signed-off-by: Will Johnson <[email protected]>
SHA 2a0f3f0
-
fix: multi GPU prepare training dataset
Signed-off-by: Will Johnson <[email protected]>
SHA 0338634
-
Signed-off-by: Will Johnson <[email protected]>
SHA 5e994ba
-
Signed-off-by: Will Johnson <[email protected]>
SHA 507f08e
-
test: add test for process_dataset_configs in HFBasedDataPreProcessor
Signed-off-by: Will Johnson <[email protected]>
SHA 0aa253b
-
Signed-off-by: Abhishek <[email protected]>
SHA 9456b73
-
fix: update function name get_dataprocessor->get_datapreprocessor
Signed-off-by: Will Johnson <[email protected]>
SHA 668653e
-
Signed-off-by: Dushyant Behl <[email protected]>
SHA 4f882f3
-
data folders should be together
Signed-off-by: Dushyant Behl <[email protected]>
SHA 7621173
Commits on Dec 2, 2024
-
Add code comments and make code path clearer.
Remove packing check as packing support for pretokenised data is merged to trl. See huggingface/trl#2011
Signed-off-by: Dushyant Behl <[email protected]>
SHA e045ca7