Skip to content

Latest commit

 

History

History
13 lines (11 loc) · 1013 Bytes

README.md

File metadata and controls

13 lines (11 loc) · 1013 Bytes

cicero-datatype-generator

The Project is a starting point for examining hpw to automatically generate Cicero Data Models and appropriate data types from Cicerto templates.

This can be achieve by a combination of approaches such as:

  • Hueristics for matching variable names against a set of string patterns for commonly used patterns that resemble common strings found in data/times/currency/durations.
  • Natural Language Processing (NLP) in particular, Language models capable of Named Entity Extraction (NER).
  • Using Weak Labelling as defined in https://www.snorkel.org/ using a combination of:
  • Weak Labeling Functions (LFs) capable of using a combination of:
    • pattern matching using string/regular expressions for common used patterns of variable names
    • lookup of data in a list of existing data models from sources such as the Accord Project library at https://templates.accordproject.org/

Usage

extractDataTypes.py --outputfile optionalPathToOuputConcertoFile.cto pathToInputCiceroFile.md