Function decide checks only the first element of a list #152

Open
mikechernev opened this issue Mar 8, 2022 · 3 comments
Labels
bug Something isn't working

@mikechernev
mikechernev commented Mar 8, 2022

Disclaimer

Hey, I am not sure if this is really a bug or me not understanding how the function works. I am also relatively new to RML, so please be gentle :)

Problem

When using idlab-fn:decide on list elements it only checks the first element and completely ignores the rest.

Steps to reproduce

  1. Iterate over a list of objects that have a property which is also a list
    Example:
{
   "people": [
      {
         "id": "UniqueID",
         "name": "Mike",
         "contacts": [
            {
               "type": "phone",
               "value": "123456790"
            },
            {
               "type": "email",
               "value": "[email protected]"
            }
         ]
      }
   ]
}
  2. When iterating over the people, try to get the phone and the email based on the value of contacts[*].type
    Example:
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix fnml:   <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:    <https://w3id.org/function/ontology#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .
@prefix mike: <http://example.com/ontology/mike/> .
@base <http://example.com/resource/> .

<Person>
  a rr:TriplesMap;
  rml:logicalSource [
    rml:source "./mike.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.people[*]"
  ];

  rr:subjectMap [
    rr:template "http://example.com/resource/entity/{.id}";
    rr:graphMap [ rr:constant "http://example/resource/person"]
  ];

  rr:predicateObjectMap[
    rr:predicate mike:name;
    rr:objectMap [ rml:reference ".name" ]
  ];

  rr:predicateObjectMap [
    rr:predicate mike:telephone;
    rr:objectMap [
      fnml:functionValue [
        rr:predicateObjectMap [
          rr:predicate fno:executes ;
          rr:objectMap [ rr:constant idlab-fn:decide ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:str ;
          rr:objectMap [ rml:reference ".contacts[*].type" ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:expectedStr ;
          rr:objectMap [ rr:constant "phone" ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:result ;
          rr:objectMap [ rml:reference ".contacts[*].value"  ]
        ];
      ] ;
    ]
  ];

  rr:predicateObjectMap [
    rr:predicate mike:email;
    rr:objectMap [
      fnml:functionValue [
        rr:predicateObjectMap [
          rr:predicate fno:executes ;
          rr:objectMap [ rr:constant idlab-fn:decide ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:str ;
          rr:objectMap [ rml:reference ".contacts[*].type" ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:expectedStr ;
          rr:objectMap [ rr:constant "email" ]
        ];
        rr:predicateObjectMap [
          rr:predicate idlab-fn:result ;
          rr:objectMap [ rml:reference ".contacts[*].value"  ]
        ];
      ] ;
    ]
  ];
.
  3. Execute the RML file using the Java mapper

Expected result

<http://example.com/resource/entity/UniqueID> <http://example.com/ontology/mike/name> "Mike" <http://example/resource/person>.
<http://example.com/resource/entity/UniqueID> <http://example.com/ontology/mike/telephone> "123456790" <http://example/resource/person>.
<http://example.com/resource/entity/UniqueID> <http://example.com/ontology/mike/email> "[email protected]" <http://example/resource/person>.

Actual result

<http://example.com/resource/entity/UniqueID> <http://example.com/ontology/mike/name> "Mike" <http://example/resource/person>.
<http://example.com/resource/entity/UniqueID> <http://example.com/ontology/mike/telephone> "123456790" <http://example/resource/person>.

Conclusion

While debugging this, I changed the order of the elements in the people[*].contacts list and realised that if the email is the first element, it is matched, but the phone is missed. To validate this further, I added an element with a completely different type as the first element of the list, and then neither the phone nor the email was matched. This leads me to believe that even though I am passing a list, decide only checks the first element and then stops.

End disclaimer

Maybe I am using decide wrong, or maybe I am not using the right function for the task I am trying to achieve. Any help will be greatly appreciated.

@DylanVanAssche added the bug label (Something isn't working) on Mar 8, 2022
@mikechernev
Author

I did some digging in the code, and it seems this is the implemented behaviour (see https://github.com/RMLio/rmlmapper-java/blob/master/src%2Fmain%2Fjava%2Fbe%2Fugent%2Frml%2Ffunctions%2FFunctionModel.java#L93-L101 ): any function that receives a list of elements will only use the first element of that list.

Would it make sense to change this so the function is executed against every element instead?
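To make the observation above concrete, here is a minimal Python model of it. This is not the rmlmapper-java code: `decide`, `first_of` and `call_first_only` are hypothetical names, the `decide` semantics (return the result when the strings are equal, otherwise nothing) are assumed from the behaviour described in this issue, and the email address is a placeholder.

```python
from typing import Any, Callable, Optional


def decide(value: Optional[str], expected: str, result: Optional[str]) -> Optional[str]:
    """Assumed semantics: return `result` when `value` equals `expected`, else None."""
    return result if value == expected else None


def first_of(arg: Any) -> Any:
    """Collapse a list argument to its first element (None if the list is empty)."""
    if isinstance(arg, list):
        return arg[0] if arg else None
    return arg


def call_first_only(fn: Callable[..., Any], *args: Any) -> Any:
    """Model of the reported behaviour: every list argument is cut down to index 0."""
    return fn(*(first_of(a) for a in args))


types = ["phone", "email"]
values = ["123456790", "mike@example.org"]  # placeholder address

phone = call_first_only(decide, types, "phone", values)
email = call_first_only(decide, types, "email", values)
print(phone, email)
```

Under these assumptions only the phone is returned, which matches the "Actual result" reported in the first comment: the "email" entry at index 1 is never compared.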

@bjdmeest
Collaborator

bjdmeest commented Mar 9, 2022

There are actually two things:

First, the decide function expects an rdf:string, not an array, so it would be better to create a new function that combines the decide function and listContainsElement, or to nest listContainsElement in an if function.

However, I'm not sure this would solve the actual issue: it's currently underspecified what to do with .contacts[*].type vs .contacts[*].value. I think it will not pairwise process type and value (which I assume is the expected behavior), but instead will process the type list and value list. This is, in fact, an open mapping challenge that can be handled by accessing the 'uniqueID' field in the $.people[*].contacts[*] iteration: kg-construct/mapping-challenges#20
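The pairing problem can be illustrated with a small Python sketch. It assumes that a `.contacts[*].<key>` reference yields an independent list that simply skips entries lacking the key; the `project` helper, the extra "fax" contact and the email address are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List

person: Dict[str, Any] = {
    "id": "UniqueID",
    "contacts": [
        {"type": "phone", "value": "123456790"},
        {"type": "fax"},  # hypothetical entry with no "value"
        {"type": "email", "value": "mike@example.org"},  # placeholder address
    ],
}


def project(contacts: List[Dict[str, str]], key: str) -> List[str]:
    """Mimic a `.contacts[*].<key>` reference: keep only entries that have the key."""
    return [c[key] for c in contacts if key in c]


types = project(person["contacts"], "type")    # three entries
values = project(person["contacts"], "value")  # two entries

# Pairing the two independent lists by position silently misaligns them.
by_position = dict(zip(types, values))

# Iterating over the contact objects keeps each type with its own value.
by_object = {c["type"]: c["value"] for c in person["contacts"] if "value" in c}

print(by_position)
print(by_object)
```

Once the two references are evaluated separately, the link between a type and its value exists only by position, so any missing field shifts the pairing. Iterating over the contact objects avoids this, but then the parent's id is out of reach, which is the open challenge referenced above.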

AFAICT, there is currently no way to solve your specific use case without preprocessing the input data. However, there's a slim chance that my first suggestion works out of the box as expected.
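One possible shape for such a preprocessing step, sketched in Python: flatten the input so that every contact becomes its own record and carries the id of the person it belongs to. The `flatten_contacts` helper and the `personId` field name are assumptions made for this sketch, and the email address is a placeholder.

```python
import json
from typing import Any, Dict, List


def flatten_contacts(data: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """Emit one record per contact, each carrying the id of its parent person."""
    rows = []
    for person in data.get("people", []):
        for contact in person.get("contacts", []):
            rows.append({
                "personId": person["id"],
                "type": contact["type"],
                "value": contact["value"],
            })
    return {"contacts": rows}


source = json.loads("""
{"people": [{"id": "UniqueID", "name": "Mike", "contacts": [
    {"type": "phone", "value": "123456790"},
    {"type": "email", "value": "mike@example.org"}
]}]}
""")

flat = flatten_contacts(source)
print(json.dumps(flat, indent=2))
```

With this shape, a mapping could iterate over `$.contacts[*]`, where `.type` and `.value` are single strings per record and `.personId` is available for the subject template. Whether that fits the rest of the mapping is a separate question.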

@mikechernev
Author

Thanks for the detailed explanation @bjdmeest. You are correct in your assumption about the mapping between the type and the value.

My assumption was that if a function accepts a string and I pass a list of strings, it will iterate over the list and execute the function for each element, similar to the way the mapping works if a reference is passed. Looking at the code, that would require changing the way functions are executed, and I'm not sure if it's even possible.

That's why I already did what you initially suggested and created a new function that takes two lists (one to validate against and one to use for the result) and a string to match. That works perfectly for the use case I have, so I'll probably stick with it. (Please let me know if it makes sense to contribute this as a PR, since it's a very niche scenario that might not be valuable for anyone else.)
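For readers with the same use case, the logic of such a function might look roughly like the Python sketch below. It is not the actual implementation (which is not shown in this thread and would be written in Java for rmlmapper-java); the name `decide_in_lists` and the choice to reject lists of unequal length are assumptions, and the email address is a placeholder.

```python
from typing import List, Sequence


def decide_in_lists(keys: Sequence[str], expected: str, results: Sequence[str]) -> List[str]:
    """Return every results[i] whose keys[i] equals `expected`.

    Hypothetical helper: the two lists are paired by position, so they
    must have the same length for the pairing to be meaningful.
    """
    if len(keys) != len(results):
        raise ValueError("keys and results must have the same length")
    return [r for k, r in zip(keys, results) if k == expected]


types = ["phone", "email"]
values = ["123456790", "mike@example.org"]  # placeholder address

print(decide_in_lists(types, "phone", values))
print(decide_in_lists(types, "email", values))
```

Returning a list rather than a single value lets a person have several contacts of the same type. The length check is a deliberate guard against the positional misalignment discussed earlier in the thread.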
Thanks again for all the help and the explanations :)

Cheers,
Mike
