Source Code

Benjamin De Boe edited this page Feb 18, 2020 · 4 revisions

Modules

The C++ source code is organized into a set of "modules", each in its own folder:

  • aho
  • ali
  • base
  • core
  • engine
  • enginetest
  • shell

aho

Text lookup uses the Aho-Corasick algorithm, which is essentially a state machine for very fast key matching. The folder also contains 2 project files for each language module, since these are designed as sequential models, currently with "lexrep" and "regex" matching. Each language has its own folder here for language-specific data. Within it, 2 subfolders (one of them _regex) represent the models. Each of these again contains 2 subfolders, ali and lexrep, which hold the state machine data in the form of inline tables. This data is generated by our language compiler as part of the build process. Do not edit!
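To make the "state machine for key matching" idea concrete, here is a minimal, generic sketch of an Aho-Corasick matcher. It is not the iKnow implementation: the class and method names (AhoCorasick, Add, Build, Find) are hypothetical, and it builds its states at run time, whereas the data in this folder is pregenerated as inline tables by the language compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy Aho-Corasick matcher. All names are hypothetical, not iKnow's.
class AhoCorasick {
public:
    AhoCorasick() : nodes_(1) {}  // node 0 is the root state

    // Register a key; keys must be added before Build() is called.
    void Add(const std::string& key) {
        assert(!built_);
        int state = 0;
        for (char c : key) {
            auto it = nodes_[state].next.find(c);
            if (it != nodes_[state].next.end()) { state = it->second; continue; }
            int fresh = static_cast<int>(nodes_.size());
            nodes_.emplace_back();
            nodes_[state].next[c] = fresh;
            state = fresh;
        }
        nodes_[state].matches.push_back(key.size());
    }

    // Compute failure links breadth-first, so shallower states are done first.
    void Build() {
        std::queue<int> pending;
        for (const auto& edge : nodes_[0].next) pending.push(edge.second);
        while (!pending.empty()) {
            int state = pending.front();
            pending.pop();
            for (const auto& edge : nodes_[state].next) {
                char c = edge.first;
                int child = edge.second;
                // Follow the parent's failure chain until a state accepts c.
                int f = nodes_[state].fail;
                while (f != 0 && nodes_[f].next.count(c) == 0) f = nodes_[f].fail;
                auto it = nodes_[f].next.find(c);
                nodes_[child].fail = (it != nodes_[f].next.end()) ? it->second : 0;
                // Keys ending at the failure state also end at this state.
                const auto& inherited = nodes_[nodes_[child].fail].matches;
                nodes_[child].matches.insert(nodes_[child].matches.end(),
                                             inherited.begin(), inherited.end());
                pending.push(child);
            }
        }
        built_ = true;
    }

    // Return (start offset, length) for every key occurrence, in one pass.
    std::vector<std::pair<std::size_t, std::size_t>> Find(const std::string& text) const {
        assert(built_);
        std::vector<std::pair<std::size_t, std::size_t>> hits;
        int state = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            char c = text[i];
            while (state != 0 && nodes_[state].next.count(c) == 0) state = nodes_[state].fail;
            auto it = nodes_[state].next.find(c);
            if (it != nodes_[state].next.end()) state = it->second;
            for (std::size_t len : nodes_[state].matches) hits.emplace_back(i + 1 - len, len);
        }
        return hits;
    }

private:
    struct Node {
        std::map<char, int> next;          // goto transitions
        int fail = 0;                      // failure link (root by default)
        std::vector<std::size_t> matches;  // lengths of keys ending here
    };
    std::vector<Node> nodes_;
    bool built_ = false;
};
```

With the keys "he", "she", "his" and "hers", searching "ushers" reports "she" at offset 1, then "he" and "hers" at offset 2. The text is scanned once, regardless of the number of keys, which is what makes the approach fast.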

ali

Automatic Language Identification (ali) is part of our IRIS NLP functionality. It allows multilingual documents to be indexed, and supports language identification in a document collection. There is currently no API to use it in the standalone version of iKnow, but one could be added, since the source code is present.

base

This contains 3 generic modules: IkStringAlg.cpp (functionality for our "String" type), IkStringEncoding.cpp (the interface to ICU, essentially for utf8 std::string to UCS2 "String" conversions), and a pool allocator for performance.
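As background on why a pool allocator helps performance, the sketch below shows one common, generic form: a fixed-size block pool with a free list. It is an illustration only and makes no claim about how the allocator in this folder is written; BlockPool and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Generic fixed-size block pool. Hypothetical names, not the iKnow allocator.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t blocks_per_chunk)
        : block_size_(RoundUp(block_size)), blocks_per_chunk_(blocks_per_chunk) {
        assert(blocks_per_chunk_ > 0);
    }

    // Hand out one block; touch the heap only when the free list is empty.
    void* Allocate() {
        if (free_list_ == nullptr) Grow();
        FreeNode* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Put a block back on the free list so a later Allocate() can reuse it.
    void Release(void* block) {
        free_list_ = new (block) FreeNode{free_list_};
    }

    std::size_t ChunkCount() const { return chunks_.size(); }

private:
    struct FreeNode { FreeNode* next; };

    // Blocks must be able to hold a FreeNode and stay suitably aligned.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(FreeNode)) n = sizeof(FreeNode);
        return (n + align - 1) / align * align;
    }

    // Allocate one large chunk and thread all of its blocks onto the free list.
    void Grow() {
        chunks_.emplace_back(new unsigned char[block_size_ * blocks_per_chunk_]);
        unsigned char* base = chunks_.back().get();
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i) Release(base + i * block_size_);
    }

    std::size_t block_size_;
    std::size_t blocks_per_chunk_;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;  // owns all memory
    FreeNode* free_list_ = nullptr;
};
```

The gain comes from replacing many small heap allocations with a few large ones: allocating and releasing a block is a pointer swap, and all memory is freed together when the pool is destroyed.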

core

These are the core modules (although some are obsolete), representing the internal classes in use. The main workhorse is IkIndexProcess.cpp; all indexing starts with:

void IkIndexProcess::Start(IkIndexInput* pInput, 
                           IkIndexOutput* pOut, 
                           IkIndexDebug<TraceListType>* pDebug, 
                           bool bMergeRelations, 
                           bool bBinaryMode, 
                           bool delimitedSentences, 
                           size_t max_concept_cluster_length, 
                           IkKnowledgebase* pUdct)

engine

This is the main module for interfacing with clients. 2 subfolders exist: "src" and "language_data". The first has engine.h as the API specification; the second again contains language-specific data (everything except state machine data) that results from language model compilation. Do not edit!

enginetest

This has an example of how to interface with the engine. Use enginetest.cpp as a working template for writing your own programs.

shell

This represents the abstraction of a language model, from which every supported language is derived. There used to be 2 versions, "shared memory" and "compiled", but only the "compiled" model is used here. Since "compiled" is derived from "shared memory", both source modules exist.

Project files

For Visual Studio 2019 users, 2 files in the modules folder are most important: iKnowEngineTest.sln is the solution file that builds all modules (28 .dll files and 1 executable). Dependencies.props is used by all project files for the ICU reference; edit it if necessary to correctly reflect your environment.

If the latter is set correctly, the solution will build all modules and place all executable code in .\kit\x64\(Debug|Release)\bin.