-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add product conversion tests #53
base: add-complex-tests
Are you sure you want to change the base?
Conversation
@ivankivanov could you please have a look at the missing fields and see if these can be mapped back to our schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @renancunha , you're off to a great start here! Thanks for putting this out. It really helps visualize and see where we're currently at the moment in terms of conversions. 🎉
In addition to the three (3) main points you've raised, I have the following food for thought (that might also address 2 and 3):
So right off the bat we could actually see that:
- all data conversions starts with
jmespath
- almost all data conversion ends with
first
- there was a need to transform a
number
into astring
I was thinking that we could avoid the superfluous and unnecessary operations above and have those type-conversions completely implicit. One way to achieve this is by defining the schema of the item transformation we want.
Such schema would contain:
- the data type
- the jmespath of the data source which complies with the Unified Schema
As a quick example, suppose we only need to retrieve a small subset of fields from the unified_schema_product.json
you've used:
{
"price_now": {
"type": "str",
"src": "offers[0].price"
},
# No need to do a `map(lambda x: str(x))`
"rating_current": {
"type": "str"
"src": "aggregateRating.ratingValue",
},
"reviews_quantity_1_star": {
"type": "str",
# This actually would return: [4]
# but we don't need to
# - pipe jmespath with "... | [0]",
# - do a `map(lambda x: str(x))`,
# - and end it with a `first`
# because the conversion module we'll be create
# would handle it based on the type
"src": "ratingHistogram[?ratingValue=='1'].ratingPercentage"
},
"image_urls": {
"type": "List[str]",
# Similar to PEP-585, we can specify the expected elements
# of any iterables. The conversion module would be able to
# adapt to it automatically.
"src": "images"
}
}
Notice that we didn't need to write jmespath
and first
in any of them. :D
The doesn't actually solve the 1st concern you've raised which tackles #26 . I'll need to brew that one on my brain for a few days. 🤔
In any case, the schema we'll be defining would be used as input, alongside the data source, as you can see below:
Lemme know what you think. Cheers!
Thanks for adding such great analyzes on our discussion here @BurnzZ! I agree with the points that you have mentioned. Looking at the "specifications" created to do this transformation above we see this repetitive pattern of getting the source field with Some considerations:
By doing this will have some kind of separation of concerns: the I think having this three info (type, src, and rules) for each field, our module will be capable of handling every single case that came to my mind now. |
The third part would be a great addition @renancunha ! You're absolutely right about the separation of concerns.
What do you think about calling it So far, we have established a clear split among:
I'm trying to think of anything else we might add, but I think these are the fundamental ones. Anything else might just be special or edge cases for some specific use cases. Do you think the next step would be for you to try out implementing a basic PoC for this "specifications" we've been discussing? 😄 |
Naming it as
Yeah, I think that I can start to work on a PoC for creating this module responsible to parse a schema file (as a tree of fields), field by field, and executing the steps that we've discussed here (get the My question is where should I start to implement this PoC? It would be great to implement it in a repository where we can follow the progress of it (like we do here). |
Similar to #52, this PR should not be merged into master. It was created to try to come closer to a real-world scenario, where we use the shublang to do data conversion between schemas.
Initially, a specification (a set of rules) was created to transform the data contained at the file
tests/resources/unified_schema_product.json
into a custom product schema that' is used at BV. We tried to cover all the fields present at the custom schema, according to the availability of data at the source schema.Below are some considerations:
We should start to think about a functionality to solve Convert Free Form Text to Enum #26. It's very common to have to map a free form text to a set of predefined values (in Enum). In this specific example, we have to map the
currency
symbols (like£
) to thecurrency
abbreviations (likeGBP
), also had to map the stock situation (InStock
,OutOfStock
, etc). Do this kind of thing manually, usingmap
, could get really weird, not-readable, and hard to maintain.We need a pipe to convert data to string. There are some fields that are integers at the unified-schema product but should be strings at the customer schema, like all the fields related to reviews.
When we do some operation of filtering or something that results in a list with jmespath, we need to use the
first
twice to get exactly the data that we want. As an example, when we try to get a determined property from theadditionalProperties
list, we can do that in two ways:jmespath("additionalProperty[?name=='colour'].value | [0]") | first
orjmespath("additionalProperty[?name=='colour'].value") | first | first
In these cases, we will need some kind of convention on which approach we should adopt.
It would be great to have discussions about the expressions created, possible optimizations, different approaches to get the same results, expressions complexity, etc.