-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support foreign keys property #84
Comments
Hi @ChasNelson1990, thanks for your idea and it makes sense to have this feature. We are currently working in a new update for the ckanext-validation that is going to be released still this week, then I would suggest you to first wait for the PR to be merged before starting working on it. Regarding how to implement it, I still need to think a bit on the best approach but I'm prone to the idea no. 1. Let's see what @amercader has to say about it. |
@ChasNelson1990 this certainly sounds interesting and I'm also leaning towards option 1, but I'm a bit unclear on how we would bundle both tables. Off the top of my head maybe something like this could work:
Things to consider:
@ChasNelson1990 let me know if this makes sense or I missed something |
Thanks for the replies guys. I think everything makes sense and it feels like we are on the same page. For some greater context @amercader here is what Fjelltopp do in our forks. We have something like the following in our schema: "foreignKeys": [
{
"fields": "area_id",
"reference": {
"resource": "3_geojson_inputs",
"fields": "area_id"
}
}
], Where If I am remembering correctly this may depend on some changes in our fork of ckanext-scheming allowing us to define a package schema with required resources and point to unique schemas for each resource, e.g. https://github.com/fjelltopp/unaids_data_specifications/blob/eb97a7d956936e3bfa6f3fe36d869249a8c4c525/package_schemas/country_estimates/3_country_estimates_23.json#L125. Those fork changes should not be necessary to enable foreign key validation here though. We then check if the schema has As you say, we could just provide a repeatable identifier that allows us to infer the resource and pass a 'virtual' data package to frictionless.
When we do a resource validation with frictionless it does a foreign key check and throws a few errors we could catch: https://github.com/frictionlessdata/framework/blob/ec2d8336dfa32f2396b38e4f33e3a7e9e4ff415a/frictionless/resource/resource.py#L718. This would cover a resource not being in the package, e.g. because it has been deleted? We would then want to return that as a "Warning" I would guess as that seems to be where we keep framework issues as opposed to validation issues? And, while I am not sure I can find the line, I would presume it validates the schema as part of validating a resource? Because schema validation throws a known error if wrong fields are defined: https://github.com/frictionlessdata/framework/blob/ec2d8336dfa32f2396b38e4f33e3a7e9e4ff415a/frictionless/schema/schema.py#L350. Again this could be caught and returned as a warning.
This I have no idea about - is there an easy way to test / explore this? |
Hello @amercader and @aivuk . I'm working togheter with @ChasNelson1990 on the issue. I'd like to bump the issue a little bit and make sure we're on the same page about:
|
@tomeksabala Yes, option 1 is the more sensible and @ChasNelson1990 comments make sense, although the part that is less clear to me is the "get the appropriate data and pass it to goodtables". IIUC the proposed approach, we would somehow create a virtual datapackage that links to all tables involved as resources and let frictionless deal with the data loading. But as you say all this would be much easier to discuss with actual code so a draft PR would be great. With regards to #83 we are actively working on it so it's hard to give a timeline but the goal is to finish the uploader work in the next couple of weeks, merge to master and then carry out further work (schema editor ec) as new feature branches. In any case most of the work there affects the UI and custom endpoints, we are not touching the jobs module which is where I expect most of your work to be done so you can work of current master if you want. |
Hi @amercader and @aivuk I'm one of the Fjelltopp team that is working on this feature. We've spent sometime thinking this through and we'd like to get your opinion before we write any code. The plan is as follows:
We'd be really grateful to know if it makes sense for you :). |
Hi @amercader @aivuk - have you guys seen the PR I created for this - #86? Would be great to get some feedback and thoughts. |
Sorry @ChasNelson1990 , we have very limited bandwidth right now but I'll try to look at this probably next week |
No worries @amercader - we appreciate bandwidth issues! Next week would be great :-) |
Context:
foreignKeys
prop (https://specs.frictionlessdata.io/table-schema/#foreign-keys) in frictionless to ensure that the choices in said column are present, and only present, in the geographical regions resource.Good news:
Less good news:
Idea No. 1:
foreignKeys
and, if found, bundle both tables and both Table Schemas and send it all to frictionless'svalidate
withtype=package
, e.g. we could have a_validate_data_package
to match https://github.com/frictionlessdata/ckanext-validation/blob/9c23581d34289536139cc81f7b7ddb756dab64f6/ckanext/validation/jobs.py#L139Idea No. 2:
package_validation_run
and in these actions we treat the whole CKAN dataset (plus schemas) as a frictionless Data PackageWe have a small amount of time to dedicate to this and would like to make a change that can be merged here (rather than maintaining our own fork) and so we'd really appreciate thoughts and ideas from existing maintainers and contributors.
All thoughts welcome 😄
The text was updated successfully, but these errors were encountered: