Extension of symbol mechanism #34

Open
andreasbaumann opened this issue Apr 25, 2018 · 1 comment

@andreasbaumann
Collaborator

In the following example:

WORD ^1 : /\b\p{L}+\b/;

KnownName = sequence_imm( firstName = "Hans" WORD, lastName = WORD );

Assume we have a dictionary of known first names. How can we make sure that
exactly those names are matched as symbols in firstName?

The two options I can come up with don't look too nice:

WORD ^1 : /\b\p{L}+\b/;

KnownName = sequence_imm( firstName = "Albert" WORD, lastName = WORD );
KnownName = sequence_imm( firstName = "Hans" WORD, lastName = WORD );
...
KnownName = sequence_imm( firstName = "Werner" WORD, lastName = WORD );

or even worse:

WORD ^1 : /\b\p{L}+\b/;
FIRST : /\b((Albert)|(Hans)|(Werner))\b/;

KnownName = sequence_imm( firstName = FIRST, lastName = WORD );

Allowing an API there, for instance one that is given a dictionary, would make it
possible to filter for known items.

I could also imagine an interface that runs a query like "SELECT first FROM
known_customers" against a database. Of course, the question is how much should be
done in CLIs like strusPatternMatcher, and when the APIs should be used instead to do
the filtering as a subcomponent alongside database access and other processing.

@patrickfrey
Owner

A database could be used to post-filter the candidates selected by pattern matching.
Querying a database from within the pattern matcher is not a real option.
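
As a rough illustration of such a post-filtering step, here is a minimal Python sketch. It assumes the pattern matcher has already produced candidate (firstName, lastName) pairs; how those pairs are obtained from strus is not shown. The table `known_customers` and its column `first` are taken from the query in the question. The in-memory SQLite database, the sample rows, and the helper name `filter_known_names` are assumptions made for illustration only.

```python
import sqlite3


def filter_known_names(candidates, conn):
    """Keep only candidates whose first name occurs in known_customers.

    `candidates` is a hypothetical list of (first_name, last_name) tuples,
    standing in for the results delivered by the pattern matcher.
    """
    kept = []
    for first, last in candidates:
        row = conn.execute(
            "SELECT 1 FROM known_customers WHERE first = ? LIMIT 1",
            (first,),
        ).fetchone()
        if row is not None:
            kept.append((first, last))
    return kept


# Assumed sample data: an in-memory database with a few known first names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE known_customers (first TEXT)")
conn.executemany(
    "INSERT INTO known_customers VALUES (?)",
    [("Albert",), ("Hans",), ("Werner",)],
)

# Assumed candidates, as a WORD WORD pattern might have selected them.
candidates = [("Hans", "Muster"), ("Green", "Park"), ("Werner", "Meier")]
result = filter_known_names(candidates, conn)
print(result)
```

Keeping the lookup outside the matcher means the rules stay generic (WORD followed by WORD) and the database is only consulted for the comparatively few candidates that matched.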

The first problem could be solved with a set of rules for the first name:

KnownFirstName = any( WORD "Albert",
                      WORD "Hans",
                      ....
                      WORD "Werner",
                      WORD "Zora" );
KnownName = sequence_imm(
                firstName = KnownFirstName,
                lastName = WORD );
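
If the first names live in a dictionary file, a rule like the one above could be generated rather than written by hand. The following Python sketch only produces the rule text and mirrors the `any( WORD "..." )` form used in this comment; it has not been checked against the strus rule grammar. The function name `make_first_name_rule` and its parameters are hypothetical. Because the escaping rules for quoted values are not stated here, names containing a double quote are rejected instead of being escaped.

```python
def make_first_name_rule(names, rule_name="KnownFirstName", token="WORD"):
    """Build the text of an any(...) rule with one alternative per name."""
    if not names:
        raise ValueError("at least one name is required")
    for name in names:
        if '"' in name:
            # Escaping syntax is unknown here, so refuse rather than guess.
            raise ValueError(f"name contains a double quote: {name!r}")
    alternatives = ",\n    ".join(f'{token} "{name}"' for name in names)
    return f"{rule_name} = any(\n    {alternatives} );"


rule = make_first_name_rule(["Albert", "Hans", "Werner", "Zora"])
print(rule)
```

The generated text can then be written to the rule file next to the hand-written KnownName rule.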
