I have a question about dealing with missing values. See the example below:
import json
from urllib import request
from glom import glom, Coalesce

url = 'https://www.ebi.ac.uk/ols/api/ontologies/efo/terms?size=200'
with request.urlopen(url) as r:
    data = json.loads(r.read())

# 1: no missing values, result is a dict of lists, each 200 long
spec = {
    'label': ('_embedded.terms', ['label']),
    'obo_id': ('_embedded.terms', ['obo_id']),
}
result = glom(data, spec)

# 2: a few values are missing in "children", and the whole result is a single None
spec = {
    'label': ('_embedded.terms', ['label']),
    'obo_id': ('_embedded.terms', ['obo_id']),
    'parents': ('_embedded.terms', ['_links.parents.href']),
    'children': ('_embedded.terms', ['_links.children.href']),
}
result = glom(data, spec, default=None)

# 3: the desired result: the missing values in "children" are replaced by None
spec = {
    'label': ('_embedded.terms', ['label']),
    'obo_id': ('_embedded.terms', ['obo_id']),
    'parents': ('_embedded.terms', ['_links.parents.href']),
    'children': (
        '_embedded.terms',
        [Coalesce('_links.children.href', default=None)]
    ),
}
result = glom(data, spec)
The third version above works for me: all lists in the result are the same length, no records are dropped, and None is used in place of the missing values. However, this interface is quite inconvenient, as I would need to wrap every path in Coalesce(..., default=None). Is there a better solution, where a single parameter sets the missing-value handling globally?
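To make the desired behaviour concrete without a network call, here is a plain-Python sketch (no glom involved) of a per-record fallback. The `lookup` helper, the two sample records, and the example.org URL are all made up for illustration; they only mimic the shape of the `_embedded.terms` entries above.

```python
def lookup(record, dotted_path, default=None):
    """Follow a dotted path through nested dicts; return `default` if any key is absent."""
    current = record
    for key in dotted_path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical sample: the second record has no "children" link.
terms = [
    {'label': 'a', '_links': {'children': {'href': 'https://example.org/a/children'}}},
    {'label': 'b', '_links': {}},
]

# The fallback is applied per record, so every column keeps one entry per record.
columns = {
    'label': [lookup(t, 'label') for t in terms],
    'children': [lookup(t, '_links.children.href') for t in terms],
}
print(columns)
```

Because the default is substituted record by record rather than for the result as a whole, the lists stay aligned, which is the property version 3 has and version 2 lacks.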
Another approach you could take is to embrace the fact that specs are basic Python data structures, and write a helper function to do the "boring stuff".
from glom import Or

def get_paths_in_list(path_dict, default=None):
    '''Given a dict of {key: path}, return a spec that fetches each path, with a default, from each child.'''
    return {key: [Or(val, default=default)] for key, val in path_dict.items()}

spec = (
    '_embedded.terms',
    get_paths_in_list({
        'label': 'label',
        'obo_id': 'obo_id',
        'parents': '_links.parents.href',
        'children': '_links.children.href',
    })
)