test: Addition of test cases for the Data Preprocessor code #4

Open: wants to merge 30 commits into base branch dataloader-v2-impl

Abhishek-TAMU

Description of the change

Adds unit test cases for the functions `load_dataset` and `_process_dataconfig_file`.

Related issue number

https://github.ibm.com/ai-foundation/watson-fm-stack-tracker/issues/1428

How to verify the PR

Run the unit test cases.

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

dushyantbehl and others added 26 commits November 8, 2024 23:14
Signed-off-by: Dushyant Behl <[email protected]>
Co-authored-by: Will Johnson <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
Signed-off-by: Will Johnson <[email protected]>

fmt

Signed-off-by: Will Johnson <[email protected]>
…-impl-unit-fix

tests: reformat `mock.patch` to inside unit tests
Signed-off-by: Dushyant Behl <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
Signed-off-by: Will Johnson <[email protected]>

fmt

Signed-off-by: Will Johnson <[email protected]>
Removes unused dead code after adding the new framework and refactors
some test cases and files.

Signed-off-by: Dushyant Behl <[email protected]>
…pl' into data_preprocessor_dushyant

Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Will Johnson <[email protected]>
Signed-off-by: Will Johnson <[email protected]>

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

github-actions bot added the test label on Nov 25, 2024

willmj left a comment

Test cases look good; one thing before approval:

Comment on lines 39 to 49
BASE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))
PREDEFINED_DATA_CONFIGS = os.path.join(BASE_DIR, "examples", "predefined_data_configs")
APPLY_CUSTOM_TEMPLATE_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "apply_custom_template.yaml"
)
PRETOKENIZE_JSON_DATA_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "pretokenized_json_data.yaml"
)
TOKENIZE_AND_INSTRUCTION_MASKING_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "tokenize_and_instruction_masking.yaml"
)

Should this go in an `__init__.py` file like the one for `tests/testdata`?

Author

Yeah, that makes sense. Thanks, Will. I've pushed the refactored code.
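
For illustration, the refactor discussed above might move these constants into a package `__init__.py`, roughly as below. This is a sketch, not the code that was actually pushed: the module's location and the `ALL_PREDEFINED_DATA_CONFIGS` tuple are hypothetical, and the `"../.."` offset assumes the file sits two directories below the repository root, as the original test module did.

```python
# Hypothetical shared test-data module, e.g. an __init__.py inside the tests tree
# (exact location is an assumption). Test modules import these paths instead of
# redefining them locally.
import os

# Assumes this file is two directories below the repository root,
# matching the "../.." offset used in the original test module.
BASE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))
PREDEFINED_DATA_CONFIGS = os.path.join(BASE_DIR, "examples", "predefined_data_configs")

APPLY_CUSTOM_TEMPLATE_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "apply_custom_template.yaml"
)
PRETOKENIZE_JSON_DATA_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "pretokenized_json_data.yaml"
)
TOKENIZE_AND_INSTRUCTION_MASKING_YAML = os.path.join(
    PREDEFINED_DATA_CONFIGS, "tokenize_and_instruction_masking.yaml"
)

# Hypothetical convenience tuple: lets a test iterate over every predefined config.
ALL_PREDEFINED_DATA_CONFIGS = (
    APPLY_CUSTOM_TEMPLATE_YAML,
    PRETOKENIZE_JSON_DATA_YAML,
    TOKENIZE_AND_INSTRUCTION_MASKING_YAML,
)
```

Test modules could then import these names from that package rather than recomputing the paths, and a tuple like `ALL_PREDEFINED_DATA_CONFIGS` could be passed to `pytest.mark.parametrize` to run one test against each config.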

willmj left a comment

LGTM! Thanks Abhishek

dushyantbehl force-pushed the dataloader-v2-impl branch 3 times, most recently from 826463f to e045ca7 on December 2, 2024 08:42

3 participants