
richer responses in the reconciliation API for more NER-like queries #61

Open
epaulson opened this issue Dec 14, 2020 · 2 comments

@epaulson

We've been exploring using the Reconciliation API in a smart building/smart grid/IoT setting: we have sensor names (and only sensor names) from legacy industrial equipment and we want to predict what class they match in our ontology. For example, a sensor/controller with a name like
room101-tstat01-htsp

has type https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint

in our ontology.

We've had good luck using OpenRefine + the reconciliation API to process data like this: we can send this name off to the API and get back a list of possible matches. That example is probably the heating temp setpoint, but maybe it's just a https://brickschema.org/schema/1.1/Brick#Heat_Sensor, and OpenRefine gives us a nice UI for presenting those options to the user and letting them choose.

Predicting the class is the most important thing we do, and the API works fine for that: we return a response like {'id': 'room101-tstat01-htsp', 'name': 'https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint'}

We can do this with string matching with reasonably good results, and we've also been exploring language models trained on labeled examples. (Note that we just send back the query as the ID, since we don't have a database that we're looking things up in, and using 'name' seems to work best in OpenRefine for getting this classification into a column.)
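A minimal sketch of such a service, assuming the reconciliation API's batch `queries` format (a JSON object mapping keys like `q0` to `{"query": ...}`, answered with a `result` list per key). The `predict_classes` function and its token table are hypothetical stand-ins for the string matcher or language model; as described above, every candidate echoes the query string as its `id` and carries the predicted class in `name`.

```python
import json

BRICK = "https://brickschema.org/schema/1.1/Brick#"

# Hypothetical token-to-class table, for illustration only; a real service
# would use its own string matcher or a trained language model instead.
TOKEN_CLASSES = {
    "htsp": [("Heating_Temperature_Setpoint", 0.9), ("Heat_Sensor", 0.4)],
}


def predict_classes(name: str) -> list[tuple[str, float]]:
    """Hypothetical classifier: look up the last '-'-separated token."""
    token = name.split("-")[-1].lower()
    return TOKEN_CLASSES.get(token, [])


def reconcile(queries_json: str) -> dict:
    """Answer a batch of reconciliation queries.

    There is no backing database, so the query string itself is echoed
    back as each candidate's id and the predicted class goes in 'name'.
    """
    queries = json.loads(queries_json)
    response = {}
    for key, q in queries.items():
        text = q["query"]
        candidates = [
            {"id": text, "name": BRICK + cls, "score": score, "match": False}
            for cls, score in predict_classes(text)
        ]
        response[key] = {"result": candidates}
    return response


batch = json.dumps({"q0": {"query": "room101-tstat01-htsp"}})
print(reconcile(batch)["q0"]["result"][0]["name"])
```

Returning several ranked candidates (rather than one) is what lets OpenRefine show the user the alternatives to choose from.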

But another thing we're interested in is extracting the other metadata in that sensor name. If everything is sensibly named, OpenRefine is awesome for this: in our example it's easy to split columns and organize that sensor name into the room it's located in and the thermostat it's attached to.

But our NLP parser can already do this, and it'd be really awesome if our reconciliation API response could include a JSON struct with the other data our service pulled out. The OpenRefine UI should ignore it, but I could pull it out with a GREL expression. The response might look something like:

{
  "id": "room101-tstat01-htsp",
  "name": "https://brickschema.org/schema/1.1/Brick#Heating_Temperature_Setpoint",
  "extra": {
    "enclosingSpace": {
      "id": "room101",
      "type": "https://brickschema.org/schema/1.1/Brick#Room"
    },
    "enclosingEquip": {
      "id": "tstat01",
      "type": "https://brickschema.org/schema/1.1/Brick#Thermostat"
    }
  }
}

Obviously everything in 'extra' is service-dependent; what our service puts in it will be very different from what, say, OpenCorporates would return.
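To make the idea concrete, here is a hypothetical sketch of the parsing step that could populate `extra`. The prefix tables and `extract_extra` are illustrative assumptions, not the BrickSchema service's actual parser, and they only handle names shaped like `room101-tstat01-htsp`.

```python
import re

BRICK = "https://brickschema.org/schema/1.1/Brick#"

# Hypothetical prefix rules, for illustration only: a real parser (string
# matching or an NLP model) would learn or configure these.
SPACE_PREFIXES = {"room": "Room"}
EQUIP_PREFIXES = {"tstat": "Thermostat"}


def extract_extra(name: str) -> dict:
    """Build the proposed 'extra' object from a '-'-separated sensor name."""
    extra = {}
    for token in name.split("-"):
        # Tokens like 'room101' split into an alphabetic prefix and a number.
        m = re.fullmatch(r"([a-z]+)(\d+)", token.lower())
        if not m:
            continue  # e.g. the point abbreviation 'htsp'
        prefix = m.group(1)
        if prefix in SPACE_PREFIXES and "enclosingSpace" not in extra:
            extra["enclosingSpace"] = {"id": token, "type": BRICK + SPACE_PREFIXES[prefix]}
        elif prefix in EQUIP_PREFIXES and "enclosingEquip" not in extra:
            extra["enclosingEquip"] = {"id": token, "type": BRICK + EQUIP_PREFIXES[prefix]}
    return extra


# Attach the parsed metadata to a reconciliation candidate.
candidate = {"id": "room101-tstat01-htsp", "name": BRICK + "Heating_Temperature_Setpoint"}
candidate["extra"] = extract_extra(candidate["id"])
```

Since `extra` is not part of the spec, whether OpenRefine keeps or silently drops an unknown field like this is something a prototype would have to check.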

We can't go fetch properties because we're not matching against a different database - I don't have a database that lists 'room101-tstat01-htsp', I'm just parsing it and predicting from it.

I don't think we can use the data-extension service, because we'd need to know which of the original query responses the user reconciled each "id" to; we don't have a database of IDs, and we'd prefer not to have to create one just to store predictions.*

Maybe this takes it too far afield from what the Reconciliation API spec is intended for, and it's fine if this isn't something you want to complicate the spec with, but we'd love to talk more about how we could enable it and whether it's useful beyond what we're doing.

Our very basic API implementation is here: https://github.com/BrickSchema/reconciliation-api

Thanks!

*As an aside, it would be cool if OpenRefine could report back which reconciliation candidate the user chose for each query: that would be great training feedback if you're doing some sort of ML-based prediction, and it would change the tradeoffs of creating and storing IDs for queries.

@wetneb
Member

wetneb commented Dec 15, 2020

Thanks for the detailed use case, it is very interesting!

There has been interest before in letting services return property values in reconciliation candidates: #48 (comment).

For your last point, there is #30.

@epaulson
Author

Thanks. I think the best thing for us to do is put together a quick prototype and see what folks think. I'm assuming OpenRefine won't freak out if there's an unexpected new object in the response, but that will be easy to test. We'd love to hear from other folks building reconciliation API implementations about how they might use this.

Having looked at the data extension service, we may still explore it: the UI/workflow in OpenRefine is really nice for that service (see the gifcast here: https://github.com/OpenRefine/OpenRefine/wiki/Data-Extension-API), so even though supporting it would add a lot more state on the server, I think it might be worth considering.
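For comparison, the data-extension route would mean the server remembering each prediction under its query-string id so it can answer later property requests. A rough sketch of that state, assuming the Data Extension API's request shape (`ids` plus a list of `properties`) and response shape (`meta` plus `rows` keyed by id); the property ids `enclosingSpace`/`enclosingEquip`, the `remember`/`extend` helpers, and the in-memory `PREDICTIONS` dict are all hypothetical.

```python
# Hypothetical server-side state: predictions cached by the query string,
# which doubles as the candidate id returned during reconciliation.
PREDICTIONS: dict[str, dict] = {}


def remember(query_id: str, extra: dict) -> None:
    """Store the metadata parsed for a query so extension can find it later."""
    PREDICTIONS[query_id] = extra


def extend(request: dict) -> dict:
    """Answer a data-extension request {'ids': [...], 'properties': [{'id': ...}]}."""
    prop_ids = [p["id"] for p in request["properties"]]
    rows = {}
    for entity_id in request["ids"]:
        stored = PREDICTIONS.get(entity_id, {})
        row = {}
        for pid in prop_ids:
            value = stored.get(pid)
            # Entity-valued properties come back as {id, name}; unknown ones as [].
            row[pid] = [{"id": value["id"], "name": value["id"]}] if value else []
        rows[entity_id] = row
    meta = [{"id": pid, "name": pid} for pid in prop_ids]
    return {"meta": meta, "rows": rows}


remember(
    "room101-tstat01-htsp",
    {"enclosingSpace": {"id": "room101", "type": "https://brickschema.org/schema/1.1/Brick#Room"}},
)
print(extend({"ids": ["room101-tstat01-htsp"], "properties": [{"id": "enclosingSpace"}]}))
```

An in-memory dict is the simplest option but is lost on restart; a real deployment would need persistent storage, which is exactly the extra server-side state the tradeoff is about.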
