This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

Adding eql project.
jzonthemtn committed Mar 21, 2024
1 parent f9d8ded commit 6116294
Showing 33 changed files with 4,279 additions and 0 deletions.
52 changes: 52 additions & 0 deletions entitydb-eql/README.md
@@ -0,0 +1,52 @@
# Idyl NLP Entity Query Language

The Entity Query Language, or EQL, provides a SQL-like syntax for querying entities. EQL provides a means of filtering entities that meet given conditions. This project includes a Pig UDF for using EQL in your Pig jobs.

## Syntax

The EQL query `select * from entities` will return all entities from the entity store.

A where clause can be added to only retrieve entities meeting some condition:

`select * from entities where text = 'George Washington'`

This query returns all entities whose text is "George Washington". Other queryable fields include `confidence`, `documentId`, and `context`. Multiple conditions can be combined with the `and` keyword:

`select * from entities where text = 'George Washington' and confidence > 50`

EQL does not support `or` conditions. To get the effect of an OR, run multiple EQL queries and combine their results. EQL queries can be executed through the Idyl E3 API when the entity store is enabled.
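
The OR workaround can be sketched in plain Java as a union over several single-condition filters. This is only an illustration under stated assumptions: `SimpleEntity`, `OrViaMultipleQueries`, and `matchAny` are hypothetical names that are not part of Idyl NLP, and each `Predicate` stands in for one parsed EQL query.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class OrViaMultipleQueries {

    /** Hypothetical stand-in for an entity; this is NOT the Idyl NLP Entity class. */
    static final class SimpleEntity {
        public final String text;
        public final int confidence;

        public SimpleEntity(String text, int confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /**
     * Returns every entity that satisfies at least one query.
     * Each predicate plays the role of one EQL query (an assumption for this sketch).
     * An entity that satisfies several queries is returned only once.
     */
    public static List<SimpleEntity> matchAny(List<SimpleEntity> entities,
                                              List<Predicate<SimpleEntity>> queries) {
        // LinkedHashSet keeps the order of first match and drops repeated instances.
        Set<SimpleEntity> matched = new LinkedHashSet<>();
        for (Predicate<SimpleEntity> query : queries) {
            for (SimpleEntity entity : entities) {
                if (query.test(entity)) {
                    matched.add(entity);
                }
            }
        }
        return new ArrayList<>(matched);
    }
}
```

In this repository, `EqlFilters.filterEntities(Collection<Entity>, List<String>)` follows the same idea: it returns the entities that match at least one of the EQL statements it is given.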

### Example queries

To find or filter entities with a given text:

`select * from entities where text = "George Washington"`

To find or filter entities with a given text in a specific context:

`select * from entities where text = "George Washington" and context = "book1"`

#### Queryable Fields

Now that you see it's a lot like SQL, here are the queryable fields:

| Field | Description | Examples | Remarks |
| ----- | ----------- | -------- | ------- |
| `id` | The entity's ID. | | |
| `text` | The text of the entity. | "George Washington" | Supports the `*` wildcard anywhere in the text except as the first character. |
| `type` | The type of the entity. | "person" | |
| `confidence` | The confidence of the entity, an integer between 0 and 100, inclusive. | 50 | |
| `language` | The language of the entity. | en | |
| `context` | The entity's context. | | |
| `documentId` | The entity's document ID. | | |
| `uri` | The entity's URI. | | |
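
The wildcard rule for `text` can be illustrated with a small matcher. This is a sketch under assumptions, not EQL's actual implementation: it assumes `*` matches any run of characters (including none), that matching is case-sensitive, and that a leading `*` is rejected. `WildcardText`, `toPattern`, and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

final class WildcardText {

    /**
     * Converts a text value containing '*' wildcards into a regular expression.
     * Assumption: '*' matches any run of characters, including an empty one.
     */
    public static Pattern toPattern(String eqlText) {
        if (eqlText.startsWith("*")) {
            throw new IllegalArgumentException("A wildcard is not allowed as the first character.");
        }
        // Keep trailing empty parts so that a trailing '*' is not lost.
        String[] parts = eqlText.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote each literal part so characters such as '.' are matched literally.
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true if the whole entity text matches the wildcard expression. */
    public static boolean matches(String eqlText, String entityText) {
        return toPattern(eqlText).matcher(entityText).matches();
    }
}
```

Under these assumptions, `WildcardText.matches("George*", "George Washington")` is `true`, while `WildcardText.matches("George*", "Martha Washington")` is `false`.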

#### Paging

Paging can be achieved using the `limit` and `offset` keywords:

`select * from entities limit 10 offset 50`

This query skips the first 50 entities and returns the next 10.

The `limit` and `offset` keywords can also be used independently. Note that by default the limit is 25. Use caution when setting large limits.
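
The effect of `limit` and `offset` can be pictured with an in-memory list. This is a minimal sketch, not how the entity store executes a query: `Paging` and `page` are hypothetical names, and the default of 25 is taken from the note above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class Paging {

    /** Default page size, mirroring the default limit of 25 described above. */
    public static final int DEFAULT_LIMIT = 25;

    /** Skips the first {@code offset} items and returns at most {@code limit} of the rest. */
    public static <T> List<T> page(List<T> items, int offset, int limit) {
        return items.stream()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Applies only an offset and falls back to the default limit. */
    public static <T> List<T> page(List<T> items, int offset) {
        return page(items, offset, DEFAULT_LIMIT);
    }
}
```

With a list of 100 items, `Paging.page(items, 50, 10)` returns the items at positions 50 through 59, which corresponds to `limit 10 offset 50`.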
52 changes: 52 additions & 0 deletions entitydb-eql/eql-filters/pom.xml
@@ -0,0 +1,52 @@
<?xml version="1.0"?>
<!--
Copyright 2019 Mountain Fog, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy
of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>ai.idylnlp</groupId>
<artifactId>eql</artifactId>
<version>1.3.0-SNAPSHOT</version>
</parent>
<artifactId>eql-filters</artifactId>
<name>eql-filters</name>
<dependencies>
<dependency>
<groupId>ai.idylnlp</groupId>
<artifactId>eql-language</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>ai.idylnlp</groupId>
<artifactId>idylnlp-model</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>${commons.collections.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
</dependency>
</dependencies>
</project>
@@ -0,0 +1,285 @@
/*******************************************************************************
* Copyright 2019 Mountain Fog, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy
* of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
******************************************************************************/
/*
* (C) Copyright 2017 Mountain Fog, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ai.idylnlp.eql.filters;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedList;
import java.util.List;

import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import ai.idylnlp.eql.Eql;
import ai.idylnlp.eql.exceptions.QueryGenerationException;
import ai.idylnlp.eql.filters.comparisons.DateComparison;
import ai.idylnlp.eql.model.EntityQuery;
import ai.idylnlp.model.entity.Entity;

/**
* Static functions for applying EQL statements to entities.
*
* @author Mountain Fog, Inc.
*
*/
public class EqlFilters {

private static final Logger LOGGER = LogManager.getLogger(EqlFilters.class);

/**
* Determines if an entity satisfies (matches) an EQL statement. Note that this function
* internally calls the <code>filterEntities</code> function for evaluation by wrapping
* the entity and the EQL statements in collections.
* @param entity The {@link Entity entity} being tested.
* @param eql The EQL statement.
* @return <code>true</code> if the entity satisfies the EQL statement. Otherwise, <code>false</code>.
* @throws QueryGenerationException Thrown if the EQL statement is malformed.
*/
public static boolean isMatch(Entity entity, String eql) throws QueryGenerationException {

Collection<Entity> entities = new ArrayList<Entity>();
entities.add(entity);

List<String> eqlStatements = new ArrayList<String>();
eqlStatements.add(eql);

Collection<Entity> matchedEntities = filterEntities(entities, eqlStatements);

// If the collection is NOT empty the entity was matched.
return !(matchedEntities.isEmpty());

}

/**
* Filters date entities by comparing each one to a target date.
* Non-date entities are not filtered out.
* @param entities The collection of entities.
* @param date The target date.
* @param dateComparison How to compare the dates.
* @return A collection containing the date entities that satisfy the comparison, plus all non-date entities.
*/
public static Collection<Entity> filterEntities(Collection<Entity> entities, Date date, DateComparison dateComparison) {

Collection<Entity> matchedEntities = new LinkedList<>();

for(Entity entity : entities ) {

if(entity.getType().equals("date")) {

// The exact milliseconds of the date are stored in the metadata of the date.
String milliseconds = entity.getMetadata().get("time");

if(milliseconds != null) {

Date entityDate = new Date(Long.parseLong(milliseconds));

if(dateComparison.equals(DateComparison.BEFORE) && entityDate.before(date)) {

matchedEntities.add(entity);

} else if(dateComparison.equals(DateComparison.AFTER) && entityDate.after(date)) {

matchedEntities.add(entity);

}

} else {

// For some reason this date is missing its milliseconds metadata value. Do not include it since we don't know.

}

} else {

// Include it since it is not a date.
matchedEntities.add(entity);

}

}

return matchedEntities;

}

/**
* Filters date entities in a date window centered on a target date.
* Non-date entities are not filtered out.
* @param entities The collection of entities.
* @param targetDate The target date.
* @param minutes The number of minutes on each side of the target date that the window covers.
* @return A collection containing the date entities that fall inside the window, plus all non-date entities.
*/
public static Collection<Entity> filterEntities(Collection<Entity> entities, Date targetDate, int minutes) {

// Use the minutes parameter instead of a hard-coded five minutes.
// Assumption: the window extends the given number of minutes on each side of the target date.
final long windowMillis = minutes * 60L * 1000L;
Date startDate = new Date(targetDate.getTime() - windowMillis);
Date endDate = new Date(targetDate.getTime() + windowMillis);

Collection<Entity> matchedEntities = new LinkedList<>();

for(Entity entity : entities ) {

if(entity.getType().equals("date")) {

// The exact milliseconds of the date are stored in the metadata of the date.
String milliseconds = entity.getMetadata().get("time");

if(milliseconds != null) {

Date entityDate = new Date(Long.valueOf(milliseconds));

if(entityDate.after(startDate) && entityDate.before(endDate)) {

matchedEntities.add(entity);

}

} else {

// For some reason this date is missing its milliseconds metadata value. Do not include it since we don't know.
}

} else {

// Include it since it is not a date.
matchedEntities.add(entity);

}

}

return matchedEntities;

}

/**
* Filter a collection of entities based on a single EQL statement.
* @param entities The collection of {@link Entity entities}.
* @param eqlStatement An EQL statement.
* @return A filtered collection of {@link Entity entities} containing only those
* entities that meet the criteria of the EQL statement.
* @throws QueryGenerationException Thrown if the EQL statement is malformed.
*/
public static Collection<Entity> filterEntities(Collection<Entity> entities, String eqlStatement) throws QueryGenerationException {

return filterEntities(entities, Arrays.asList(eqlStatement));

}

/**
* Filter a collection of entities based on given EQL statements.
* @param entities The collection of {@link Entity entities}.
* @param eqlStatements A list of EQL statements.
* @return A filtered collection of {@link Entity entities} containing only those
* entities that meet the criteria of at least one EQL statement.
* @throws QueryGenerationException Thrown if an EQL statement is malformed.
*/
public static Collection<Entity> filterEntities(Collection<Entity> entities, List<String> eqlStatements) throws QueryGenerationException {

// A universalMatch is when all entities match the filter. When this happens
// there is no need to check each individual entity.
boolean universalMatch = false;

Collection<Entity> matchedEntities = new LinkedList<>();

if(CollectionUtils.isEmpty(eqlStatements)) {

// There are no statements so this is a universal match.
universalMatch = true;

} else {

for(String eql : eqlStatements) {

if("select * from entities".equalsIgnoreCase(eql)) {

universalMatch = true;

} else {

EntityQuery entityQuery = Eql.generate(eql);

if(!universalMatch) {

for(Entity entity : entities) {

if(entityQuery.isMatch(entity)) {

if(passNotConditions(entity, entityQuery)) {
matchedEntities.add(entity);
}

}
}

}

}

}

}

// If it is a universalMatch we return all entities.
// Otherwise we just return the entities that matched at least one EQL statement.

if(universalMatch) {

return entities;

} else {

return matchedEntities;

}

}

private static boolean passNotConditions(Entity entity, EntityQuery entityQuery) {

// Determine if the entity passes the NOT conditions of the query.

if(StringUtils.isNotEmpty(entityQuery.getNotText()) && StringUtils.equals(entity.getText(), entityQuery.getNotText())) return false;
if(StringUtils.isNotEmpty(entityQuery.getNotType()) && StringUtils.equals(entity.getType(), entityQuery.getNotType())) return false;
if(StringUtils.isNotEmpty(entityQuery.getNotContext()) && StringUtils.equals(entity.getContext(), entityQuery.getNotContext())) return false;
if(StringUtils.isNotEmpty(entityQuery.getNotDocumentId()) && StringUtils.equals(entity.getDocumentId(), entityQuery.getNotDocumentId())) return false;
if(StringUtils.isNotEmpty(entityQuery.getNotLanguageCode()) && StringUtils.equals(entity.getLanguageCode(), entityQuery.getNotLanguageCode())) return false;
if(StringUtils.isNotEmpty(entityQuery.getNotUri()) && StringUtils.equals(entity.getUri(), entityQuery.getNotUri())) return false;

return true;

}

}