Improved Attributes, CI/CD, and more

@JosDenysGitHub JosDenysGitHub released this 28 Jun 08:57
· 503 commits to master since this release

This release rolls up a large number of changes applied since the first full v1.0 release:

  • Extended support for semantic attributes
  • Many improvements to the language models, especially English, Japanese and Czech
  • Enhancements to the CI/CD procedures' speed and reliability
  • Enhancements to user and developer documentation
  • Various bugfixes to previously reported issues

⚠️ The output format for sentence attributes with property values has changed slightly. See Compatibility Notes below for details.

Semantic Attributes

The v1.1 release significantly expands iKnow's ability to identify semantic attributes in natural language text, and in particular enhances support for measurements, time and certainty. iKnow now recognizes more markers in the various supported languages and has more accurate expansion rules to identify the affected span within each sentence. Check the wiki for more details on which attributes are supported in which language.

New since v1.0 is the Certainty attribute, which carries an attribute property expressing the level of certainty. A level of 9 expresses absolute certainty and a level of 1 expresses very low certainty. While you can specify (or override) an initial level of certainty in the attribute marker definition (e.g. in the User Dictionary), rules processing may modify the value, e.g. in the context of a Negation attribute.
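
As a rough illustration of how a consumer might use the certainty level, the sketch below filters attributes by a minimum level. It assumes each attribute is a dictionary of the shape described under Compatibility Notes, that the attribute type is reported as the string "certainty" (compared case-insensitively), and that the level is the first element of the first parameter pair. These layout details, the helper names and the sample data are assumptions made for illustration; check the wiki for the actual representation.

```python
def certainty_level(sent_attribute):
    """Return the certainty level (1-9) as an int, or None if unavailable.

    Assumption: the level is stored as parameters[0][0]; this layout is
    not confirmed by the release notes.
    """
    if str(sent_attribute.get("type", "")).lower() != "certainty":
        return None
    parameters = sent_attribute.get("parameters") or []
    if not parameters:
        return None
    try:
        level = int(parameters[0][0])
    except (TypeError, ValueError, IndexError):
        return None
    # Levels outside the documented 1..9 range are treated as unusable.
    return level if 1 <= level <= 9 else None


def with_minimum_certainty(sent_attributes, minimum):
    """Keep only certainty attributes whose level is at least `minimum`."""
    selected = []
    for attribute in sent_attributes:
        level = certainty_level(attribute)
        if level is not None and level >= minimum:
            selected.append(attribute)
    return selected


# Hypothetical sample data, not actual engine output.
sample_attributes = [
    {"type": "certainty", "marker": "definitely", "parameters": [("9", "")]},
    {"type": "certainty", "marker": "might", "parameters": [("3", "")]},
    {"type": "negation", "marker": "not", "parameters": []},
]

confident = with_minimum_certainty(sample_attributes, 7)
print([attribute["marker"] for attribute in confident])
```

Because rules processing may adjust the level, a filter like this is probably best applied to the final attribute values rather than to the initial levels set in a dictionary.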

This release also introduces three new Generic attributes, which developers can use to tag use-case-specific attributes not covered by the built-in attribute types. Developers can add their own marker terms for these and leverage attribute expansion to flag the syntactically "affected" portions of a sentence. A basic set of expansion rules is included for these generic attributes.
For example, we've helped customers in the healthcare industry add marker terms such as "mother", "brother", etc. so that mentions of "family history" can be identified in the text: "Patient mentioned mother suffered a stroke 10y ago, but denied experiencing chest pain himself".

CI/CD Pipeline

The Continuous Integration / Continuous Deployment pipeline for this repository is implemented through GitHub Actions. It now includes standard unit tests as well as reference tests against a gold standard, so that changes in output quality are caught before release.

Compatibility Notes

We changed the Sentence attribute structure emitted by the iknowpy module. The fixed set of properties used in v1.0 (value, unit, value2, unit2) has been replaced by a list of pairs, which allows a more flexible way of passing sentence attribute properties. The v1.0 structure:

    struct Sent_Attribute:
           Attribute type "type_"
           size_t offset_start "offset_start_", offset_stop "offset_stop_"
           string marker "marker_"
           string value "value_", unit "unit_", value2 "value2_", unit2 "unit2_"
           Entity_Ref entity_ref
           Path entity_vector

was changed to:

    ctypedef vector[pair[string, string]] Sent_Attribute_Parameters
    struct Sent_Attribute:
           Attribute type "type_"
           size_t offset_start "offset_start_", offset_stop "offset_stop_"
           string marker "marker_"
           Sent_Attribute_Parameters parameters "parameters_"
           Entity_Ref entity_ref
           Path entity_vector

Existing code that reads the v1.0 properties should now take them from the parameters list. Each pair holds a value and its unit, so the old properties map as follows:

    sent_attribute['value'] = sent_attribute['parameters'][0][0]
    sent_attribute['unit'] = sent_attribute['parameters'][0][1]
    sent_attribute['value2'] = sent_attribute['parameters'][1][0]
    sent_attribute['unit2'] = sent_attribute['parameters'][1][1]
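
For code that cannot be migrated in one go, a small compatibility shim may help. The sketch below is one possible approach, not part of iknowpy: it assumes the attribute is a Python dictionary whose `parameters` entry is a list of (value, unit) pairs, and it assumes that properties without a corresponding pair should come back as empty strings. The helper name `legacy_properties` and the sample attribute are hypothetical.

```python
# v1.0 property names, in the order of the pairs they map to.
LEGACY_KEYS = (("value", "unit"), ("value2", "unit2"))


def legacy_properties(sent_attribute):
    """Map the v1.1 'parameters' list onto the four v1.0 property names.

    Assumption: a missing pair yields empty strings for its two properties.
    """
    parameters = sent_attribute.get("parameters") or []
    legacy = {}
    for index, (value_key, unit_key) in enumerate(LEGACY_KEYS):
        if index < len(parameters):
            value, unit = parameters[index]
        else:
            value, unit = "", ""
        legacy[value_key] = value
        legacy[unit_key] = unit
    return legacy


# Hypothetical attribute with a single (value, unit) pair.
example_attribute = {
    "type": "measurement",
    "marker": "10y",
    "parameters": [("10", "y")],
}

print(legacy_properties(example_attribute))
```

Looking up each pair by index only when it exists avoids the `IndexError` that direct indexing such as `parameters[1][0]` raises when an attribute carries fewer than two pairs.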