extractor: Add list based lookups #619

aditya-nambiar · 2024-12-25T01:27:43Z

Important

Add LIST_LOOKUP type to extractors for list-based lookups, update query engine, and add tests.

Behavior:
- Add LIST_LOOKUP type to ExtractorType in featureset_pb2.py and featureset_pb2.pyi.
- Update Extractor class in featureset.py to support LIST_LOOKUP.
- Modify _get_generated_extractors and _validate in featureset.py to handle list-based features.
Query Engine:
- Implement _compute_list_lookup_extractor in query_engine.py to handle list-based lookups.
- Update run_extractors in query_engine.py to process LIST_LOOKUP extractors.
Testing:
- Add tests for list-based lookups in test_featureset.py and test_invalid_featureset.py.
- Update existing tests in test_expr.py and test_complex_autogen_extractor.py to reflect changes.
Miscellaneous:
- Update pyproject.toml to include attrs dependency.

^{This description was created by}^{for bb924f2. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to 22e839f in 1 minute and 30 seconds

More details

Looked at 779 lines of code in 12 files
Skipped 1 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. fennel/testing/query_engine.py:470

Draft comment:
Remove print statements used for debugging purposes. Consider using a logging framework if needed for debugging.
Reason this comment was not posted:
Marked as duplicate.

2. fennel/testing/query_engine.py:471

Draft comment:
Remove print statements used for debugging purposes. Consider using a logging framework if needed for debugging.
Reason this comment was not posted:
Marked as duplicate.

3. fennel/featuresets/featureset.py:704

Draft comment:
Remove print statements used for debugging purposes. Consider using a logging framework if needed for debugging.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_rEmp52xpSvgtHjqx

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2024-12-25T01:29:21Z

fennel/testing/query_engine.py

+                input_feature.name
+            ].explode()
+
+        print(repeated_timestamps)


Remove print statements used for debugging purposes. Consider using a logging framework if needed for debugging.

ellipsis-dev

👍 Looks good to me! Incremental review on 0055a20 in 41 seconds

More details

Looked at 830 lines of code in 13 files
Skipped 1 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. fennel/internal_lib/to_proto/to_proto.py:727

Draft comment:
Simplify the extractor type check by using a set for comparison.

    if extractor.extractor_type in {ExtractorType.LOOKUP, ExtractorType.LIST_LOOKUP}:

Reason this comment was not posted:
Confidence changes required: 50%
The code in fennel/internal_lib/to_proto/to_proto.py has a repeated logic for checking the extractor type. This can be simplified by using a set for comparison.

Workflow ID: wflow_DXCh8OltvKXMNLKD

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 9f87486 in 44 seconds

More details

Looked at 833 lines of code in 13 files
Skipped 1 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. fennel/client_tests/test_complex_autogen_extractor.py:349

Draft comment:
Consider using relativedelta from the dateutil library to calculate age_years more clearly and accurately.
Reason this comment was not posted:
Confidence changes required: 50%
The code in fennel/client_tests/test_complex_autogen_extractor.py uses a manual calculation for age_years that could be simplified using a library function. This would improve readability and maintainability.

2. fennel/featuresets/featureset.py:702

Draft comment:
Good use of get_origin to check for list types. This ensures proper handling of generic types.
Reason this comment was not posted:
Confidence changes required: 0%
The code in fennel/featuresets/featureset.py uses get_origin to check for list types. This is a good practice to ensure type checking for generic types.

3. fennel/testing/query_engine.py:493

Draft comment:
Consider refactoring the repeated pattern of setting dataset_lookup to improve code clarity and reduce redundancy.
Reason this comment was not posted:
Confidence changes required: 30%
The code in fennel/testing/query_engine.py has a repeated pattern for handling dataset_lookup which could be refactored for clarity.

4. fennel/testing/query_engine.py:487

Draft comment:
Good use of fillna for handling default values in dataframes. This is a standard practice for managing missing data.
Reason this comment was not posted:
Confidence changes required: 0%
The code in fennel/testing/query_engine.py uses fillna for handling default values, which is a good practice for handling missing data.

Workflow ID: wflow_ZYreki6olbI58Tb8

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on ff27874 in 43 seconds

More details

Looked at 77 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. fennel/testing/mock_client.py:530

Draft comment:
Remove the print statement in the exception handling block. Use logging if necessary.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
This is a testing/mock client file used for testing purposes. The print statement is added in an error case to help with debugging by showing the problematic dataframe before raising the exception. While using logging would be more standard, in a testing context having print statements for debugging is quite reasonable. The print provides immediately visible debug info that could be helpful during test failures.
The comment makes a valid point that logging is generally preferred over print statements in production code. Print statements can be problematic if output needs to be captured or redirected.
However, this is test code where print statements are commonly used for debugging visibility. The print statement serves a clear debugging purpose by showing the problematic dataframe before the exception.
The comment should be deleted. While it makes a technically valid point, using print for debugging in test code is reasonable and the print statement serves a clear purpose here.

Workflow ID: wflow_gMizF6igq6OgoGvD

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on bb924f2 in 23 seconds

More details

Looked at 75 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. fennel/testing/mock_client.py:530

Draft comment:
Remove the print statement inside the exception handling block to clean up the code.
Reason this comment was not posted:
Confidence changes required: 50%
The print statement in the exception handling block is not necessary for production code. It can be removed to clean up the code.

Workflow ID: wflow_BGqUV3FajGLQSBFW

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.