This repository has been archived by the owner on Nov 30, 2024. It is now read-only.
Commit 6116294 (parent f9d8ded). This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

33 changed files with 4,279 additions and 0 deletions.

# Idyl NLP Entity Query Language

The Entity Query Language (EQL) provides a SQL-like syntax for querying entities and filtering those that meet given conditions. This project includes a Pig UDF for using EQL in your Pig jobs.

## Syntax

The EQL query `select * from entities` returns all entities from the entity store.

Add a `where` clause to retrieve only the entities that meet a condition:

`select * from entities where text = 'George Washington'`

This query returns all entities whose text is "George Washington". Other queryable fields include `confidence`, `documentId`, and `context`; see Queryable Fields for the full list. Combine multiple conditions with the `and` keyword:

`select * from entities where text = 'George Washington' and confidence > 50`

EQL does not support OR conditions. To express an OR, run multiple EQL queries and combine their results. EQL queries can be executed through the Idyl E3 API when the entity store is enabled.

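To make the AND/OR rules above concrete, here is a minimal plain-Java sketch. It assumes nothing about the Idyl E3 API: `SimpleEntity`, `Query`, and `matchAny` are hypothetical names, and each parsed EQL condition is modeled as a `Predicate`. All conditions inside one query must hold (the `and` keyword), while an OR is emulated by running several queries and keeping every entity that matches at least one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class EqlSemanticsSketch {

    // Hypothetical, minimal entity carrying a few of the queryable fields.
    static final class SimpleEntity {
        final String text;
        final int confidence;
        final String context;

        SimpleEntity(String text, int confidence, String context) {
            this.text = text;
            this.confidence = confidence;
            this.context = context;
        }
    }

    // Hypothetical stand-in for one parsed EQL query.
    static final class Query {
        final List<Predicate<SimpleEntity>> conditions;

        Query(List<Predicate<SimpleEntity>> conditions) {
            this.conditions = conditions;
        }

        // Every condition in a single query must hold (AND).
        boolean matches(SimpleEntity e) {
            return conditions.stream().allMatch(c -> c.test(e));
        }
    }

    // Emulates OR: keeps each entity that matches at least one query.
    static List<SimpleEntity> matchAny(List<SimpleEntity> entities, List<Query> queries) {
        List<SimpleEntity> matched = new ArrayList<>();
        for (SimpleEntity e : entities) {
            // anyMatch stops at the first matching query, so each entity is added at most once.
            if (queries.stream().anyMatch(q -> q.matches(e))) {
                matched.add(e);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<SimpleEntity> entities = List.of(
            new SimpleEntity("George Washington", 80, "book1"),
            new SimpleEntity("George Washington", 40, "book2"),
            new SimpleEntity("John Adams", 90, "book1"));

        // select * from entities where text = 'George Washington' and confidence > 50
        Query confidentWashington = new Query(List.of(
            e -> e.text.equals("George Washington"),
            e -> e.confidence > 50));

        // select * from entities where context = 'book2'
        Query inBook2 = new Query(List.of(e -> e.context.equals("book2")));

        System.out.println(matchAny(entities, List.of(confidentWashington)).size());          // 1
        System.out.println(matchAny(entities, List.of(confidentWashington, inBook2)).size()); // 2
    }
}
```

Iterating over entities in the outer loop keeps the input order and adds each entity once, even when it satisfies several queries.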
### Example queries

To find or filter entities with a given text:

`select * from entities where text = "George Washington"`

To find or filter entities with a given text in a specific context:

`select * from entities where text = "George Washington" and context = "book1"`

#### Queryable Fields

Because EQL is modeled on SQL, queries mostly differ in the fields they filter on. The queryable fields are:

| Field | Description | Examples | Remarks |
| ----- | ----------- | -------- | ------- |
| `id` | The entity's ID. | | |
| `text` | The text of the entity. | "George Washington" | Supports the `*` wildcard anywhere except as the first character. |
| `type` | The type of the entity. | "person" | |
| `confidence` | The confidence of the entity, an integer from 0 to 100 inclusive. | 50 | |
| `language` | The language of the entity. | en | |
| `context` | The entity's context. | | |
| `documentId` | The entity's document ID. | | |
| `uri` | The entity's URI. | | |

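The wildcard rule for `text` can be illustrated by translating a pattern into a Java regular expression. This is a sketch under stated assumptions, not the EQL engine's matcher: it assumes `*` matches any run of characters (including none), that matching is case-sensitive and must cover the whole text, and the `WildcardSketch` class is hypothetical. Everything other than `*` is escaped with `Pattern.quote`, so characters such as `.` are matched literally.

```java
import java.util.regex.Pattern;

public class WildcardSketch {

    // Translates an EQL-style text pattern into a regex.
    // Assumptions: '*' matches any run of characters (including none)
    // and matching is case-sensitive.
    static Pattern compile(String textPattern) {
        if (textPattern.startsWith("*")) {
            throw new IllegalArgumentException("'*' cannot be the first character: " + textPattern);
        }
        // The -1 limit keeps trailing empty strings, so "George*" yields ["George", ""].
        String[] literals = textPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' sits between two literal parts
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Matcher.matches() requires the entire text to match the pattern.
    static boolean matches(String textPattern, String text) {
        return compile(textPattern).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("George*", "George Washington"));      // true
        System.out.println(matches("G*Washington", "George Washington")); // true
        System.out.println(matches("George*", "Martha Washington"));      // false
    }
}
```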
#### Paging

Page through results with the `limit` and `offset` keywords:

`select * from entities limit 10 offset 50`

This query skips the first 50 entities and returns the next 10 (entities 51 through 60).

The `limit` and `offset` keywords can also be used independently. By default, the limit is 25. Use caution when setting large limits.
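
The paging arithmetic can be sketched as skip-then-take over an already-filtered result list. `PagingSketch` and `page` are hypothetical names; only the default limit of 25 comes from this README, and treating a missing `offset` as 0 is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PagingSketch {

    // Default page size stated in the EQL README.
    static final int DEFAULT_LIMIT = 25;

    // Skips `offset` results, then returns at most `limit` of the rest as a view of `results`.
    // Assumption: a null limit means DEFAULT_LIMIT and a null offset means 0.
    static <T> List<T> page(List<T> results, Integer limit, Integer offset) {
        int take = (limit == null) ? DEFAULT_LIMIT : limit;
        int skip = (offset == null) ? 0 : offset;
        if (take < 0 || skip < 0) {
            throw new IllegalArgumentException("limit and offset must be non-negative");
        }
        int from = Math.min(skip, results.size());
        // Widen to long so a very large limit cannot overflow.
        int to = (int) Math.min((long) from + take, results.size());
        return results.subList(from, to);
    }

    public static void main(String[] args) {
        List<Integer> results = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(page(results, 10, 50));            // [51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
        System.out.println(page(results, null, 90).size());   // 10
        System.out.println(page(results, null, null).size()); // 25
    }
}
```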
<?xml version="1.0"?>
<!--
  Copyright 2019 Mountain Fog, Inc.
  Licensed under the Apache License, Version 2.0 (the "License"); you may not
  use this file except in compliance with the License. You may obtain a copy
  of the License at
  http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  License for the specific language governing permissions and limitations under
  the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>ai.idylnlp</groupId>
    <artifactId>eql</artifactId>
    <version>1.3.0-SNAPSHOT</version>
  </parent>
  <artifactId>eql-filters</artifactId>
  <name>eql-filters</name>
  <dependencies>
    <dependency>
      <groupId>ai.idylnlp</groupId>
      <artifactId>eql-language</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>ai.idylnlp</groupId>
      <artifactId>idylnlp-model</artifactId>
      <version>${project.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-collections4</artifactId>
      <version>${commons.collections.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
    </dependency>
  </dependencies>
</project>

entitydb-eql/eql-filters/src/main/java/ai/idylnlp/eql/filters/EqlFilters.java (285 additions, 0 deletions)

/*******************************************************************************
 * Copyright 2019 Mountain Fog, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy
 * of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations under
 * the License.
 ******************************************************************************/
/*
 * (C) Copyright 2017 Mountain Fog, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ai.idylnlp.eql.filters;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedList;
import java.util.List;

import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import ai.idylnlp.eql.Eql;
import ai.idylnlp.eql.exceptions.QueryGenerationException;
import ai.idylnlp.eql.filters.comparisons.DateComparison;
import ai.idylnlp.eql.model.EntityQuery;
import ai.idylnlp.model.entity.Entity;

/**
 * Static functions for applying EQL statements to entities.
 *
 * @author Mountain Fog, Inc.
 *
 */
public class EqlFilters {

  private static final Logger LOGGER = LogManager.getLogger(EqlFilters.class);

  /**
   * Determines if an entity satisfies (matches) an EQL statement. Note that this function
   * internally calls the <code>filterEntities</code> function for evaluation by wrapping
   * the entity and the EQL statement in collections.
   * @param entity The {@link Entity entity} being tested.
   * @param eql The EQL statement.
   * @return <code>true</code> if the entity satisfies the EQL statement. Otherwise, <code>false</code>.
   * @throws QueryGenerationException Thrown if the EQL statement is malformed.
   */
  public static boolean isMatch(Entity entity, String eql) throws QueryGenerationException {

    Collection<Entity> entities = new ArrayList<Entity>();
    entities.add(entity);

    List<String> eqlStatements = new ArrayList<String>();
    eqlStatements.add(eql);

    Collection<Entity> matchedEntities = filterEntities(entities, eqlStatements);

    // If the collection is NOT empty the entity was matched.
    return !(matchedEntities.isEmpty());

  }

  /**
   * Filters date entities by comparing them to a target date.
   * Non-date entities are not filtered out. Date entities without a
   * <code>time</code> metadata value are excluded.
   * @param entities The collection of entities.
   * @param date The target date.
   * @param dateComparison How to compare the dates.
   * @return A filtered collection containing the matching date entities and all non-date entities.
   */
  public static Collection<Entity> filterEntities(Collection<Entity> entities, Date date, DateComparison dateComparison) {

    Collection<Entity> matchedEntities = new LinkedList<>();

    for(Entity entity : entities) {

      if("date".equals(entity.getType())) {

        // The exact milliseconds of the date are stored in the "time" metadata value.
        String milliseconds = entity.getMetadata().get("time");

        if(milliseconds != null) {

          Date entityDate = new Date(Long.parseLong(milliseconds));

          if(dateComparison.equals(DateComparison.BEFORE) && entityDate.before(date)) {

            matchedEntities.add(entity);

          } else if(dateComparison.equals(DateComparison.AFTER) && entityDate.after(date)) {

            matchedEntities.add(entity);

          }

        } else {

          // For some reason this date is missing its milliseconds metadata value. Do not include it since we don't know.

        }

      } else {

        // Include it since it is not a date.
        matchedEntities.add(entity);

      }

    }

    return matchedEntities;

  }

  /**
   * Filters date entities in a date window centered on a target date.
   * Non-date entities are not filtered out.
   * @param entities The collection of entities.
   * @param targetDate The target date.
   * @param minutes How far the window extends, in minutes, on each side of the target date.
   * @return A filtered collection containing the date entities that fall strictly inside
   * the window, plus all non-date entities.
   */
  public static Collection<Entity> filterEntities(Collection<Entity> entities, Date targetDate, int minutes) {

    // Convert the window to milliseconds using long arithmetic to avoid int overflow.
    final long windowMillis = minutes * 60L * 1000L;

    Date startDate = new Date(targetDate.getTime() - windowMillis);
    Date endDate = new Date(targetDate.getTime() + windowMillis);

    Collection<Entity> matchedEntities = new LinkedList<>();

    for(Entity entity : entities) {

      if("date".equals(entity.getType())) {

        // The exact milliseconds of the date are stored in the "time" metadata value.
        String milliseconds = entity.getMetadata().get("time");

        if(milliseconds != null) {

          Date entityDate = new Date(Long.parseLong(milliseconds));

          if(entityDate.after(startDate) && entityDate.before(endDate)) {

            matchedEntities.add(entity);

          }

        } else {

          // For some reason this date is missing its milliseconds metadata value. Do not include it since we don't know.

        }

      } else {

        // Include it since it is not a date.
        matchedEntities.add(entity);

      }

    }

    return matchedEntities;

  }

  /**
   * Filter a collection of entities based on a single EQL statement.
   * @param entities The collection of {@link Entity entities}.
   * @param eqlStatement An EQL statement.
   * @return A filtered collection of {@link Entity entities} containing only those
   * entities that meet the criteria of the EQL statement.
   * @throws QueryGenerationException Thrown if the EQL statement is malformed.
   */
  public static Collection<Entity> filterEntities(Collection<Entity> entities, String eqlStatement) throws QueryGenerationException {

    return filterEntities(entities, Arrays.asList(eqlStatement));

  }

  /**
   * Filter a collection of entities based on given EQL statements.
   * @param entities The collection of {@link Entity entities}.
   * @param eqlStatements A list of EQL statements.
   * @return A filtered collection of {@link Entity entities} containing only those
   * entities that meet the criteria of at least one EQL statement. If the list of
   * statements is empty, or any statement is <code>select * from entities</code>,
   * all entities are returned.
   * @throws QueryGenerationException Thrown if an EQL statement is malformed.
   */
  public static Collection<Entity> filterEntities(Collection<Entity> entities, List<String> eqlStatements) throws QueryGenerationException {

    // A universalMatch is when all entities match the filter. When this happens
    // there is no need to check each individual entity.
    boolean universalMatch = false;

    Collection<Entity> matchedEntities = new LinkedList<>();

    if(CollectionUtils.isEmpty(eqlStatements)) {

      // There are no statements so this is a universal match.
      universalMatch = true;

    } else {

      for(String eql : eqlStatements) {

        if("select * from entities".equalsIgnoreCase(eql)) {

          universalMatch = true;

        } else {

          EntityQuery entityQuery = Eql.generate(eql);

          if(!universalMatch) {

            for(Entity entity : entities) {

              // Add each entity only once, even if it matches more than one statement.
              if(!matchedEntities.contains(entity) && entityQuery.isMatch(entity)
                  && passNotConditions(entity, entityQuery)) {
                matchedEntities.add(entity);
              }

            }

          }

        }

      }

    }

    // If it is a universalMatch we return all entities.
    // Otherwise we just return the entities that matched at least one EQL statement.

    if(universalMatch) {

      return entities;

    } else {

      return matchedEntities;

    }

  }

  private static boolean passNotConditions(Entity entity, EntityQuery entityQuery) {

    // Determine if the entity passes the NOT conditions of the query.

    if(StringUtils.isNotEmpty(entityQuery.getNotText()) && StringUtils.equals(entity.getText(), entityQuery.getNotText())) return false;
    if(StringUtils.isNotEmpty(entityQuery.getNotType()) && StringUtils.equals(entity.getType(), entityQuery.getNotType())) return false;
    if(StringUtils.isNotEmpty(entityQuery.getNotContext()) && StringUtils.equals(entity.getContext(), entityQuery.getNotContext())) return false;
    if(StringUtils.isNotEmpty(entityQuery.getNotDocumentId()) && StringUtils.equals(entity.getDocumentId(), entityQuery.getNotDocumentId())) return false;
    if(StringUtils.isNotEmpty(entityQuery.getNotLanguageCode()) && StringUtils.equals(entity.getLanguageCode(), entityQuery.getNotLanguageCode())) return false;
    if(StringUtils.isNotEmpty(entityQuery.getNotUri()) && StringUtils.equals(entity.getUri(), entityQuery.getNotUri())) return false;

    return true;

  }

}
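
The window check in the `filterEntities(Collection<Entity>, Date, int)` overload comes down to two strict comparisons against a start and an end bound. The self-contained sketch below isolates that check with `java.time`. It assumes the window extends `minutes` on each side of the target date, and the `DateWindowSketch` class and `inWindow` method are hypothetical. It also makes two behaviors visible: a timestamp exactly on a boundary is excluded, and a missing `time` value excludes the entity. Treating an unparseable value as missing is a sketch-only choice; in `EqlFilters` such a value would throw a `NumberFormatException`.

```java
import java.time.Duration;
import java.time.Instant;

public class DateWindowSketch {

    // Returns true when epochMillis (an entity's "time" metadata value) lies strictly
    // inside (target - minutes, target + minutes). Like the after()/before() calls in
    // EqlFilters, the comparisons are strict, so an instant exactly on a boundary is excluded.
    static boolean inWindow(String epochMillis, Instant target, int minutes) {
        if (epochMillis == null) {
            return false; // missing "time" metadata: the date cannot be placed, so exclude it
        }
        Instant value;
        try {
            value = Instant.ofEpochMilli(Long.parseLong(epochMillis));
        } catch (NumberFormatException ex) {
            return false; // sketch-only choice: treat an unparseable value like a missing one
        }
        Duration halfWidth = Duration.ofMinutes(minutes);
        return value.isAfter(target.minus(halfWidth)) && value.isBefore(target.plus(halfWidth));
    }

    public static void main(String[] args) {
        Instant target = Instant.parse("2019-01-01T12:00:00Z");
        long t = target.toEpochMilli();
        System.out.println(inWindow(String.valueOf(t + 4 * 60_000L), target, 5));   // true
        System.out.println(inWindow(String.valueOf(t + 5 * 60_000L), target, 5));   // false (boundary)
        System.out.println(inWindow(String.valueOf(t - 10 * 60_000L), target, 15)); // true
        System.out.println(inWindow(null, target, 5));                              // false
    }
}
```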