This repository has been archived by the owner on Nov 30, 2024. It is now read-only.
Commit a400157 (1 parent: 7cafa3b)
Showing 12 changed files with 694 additions and 1 deletion.
@@ -0,0 +1,3 @@ | ||
EntityDB drivers are available on GitHub. Refer to each driver's documentation for usage examples. | ||
|
||
* [EntityDB Java Driver](https://github.com/mtnfog/entitydb-java-driver) |
@@ -0,0 +1,46 @@ | ||
The Entity Query Language, or EQL, is a SQL-like language for querying entities. An EQL query can be executed on the entity store regardless of the underlying database. Because each database supported as an entity store has its own query language (Oracle and MySQL use SQL, while MongoDB and DynamoDB each have their own syntax), EQL abstracts those differences into a single query language that works across any entity store. The EQL query `select * from entities` returns all entities from the entity store.
|
||
A where clause can be added to only retrieve entities meeting some condition: | ||
|
||
`select * from entities where text = 'George Washington'` | ||
|
||
This query returns all entities having the text "George Washington". Other queryable fields include `confidence`, `documentId`, and `context`. Multiple conditions can be combined with the `and` keyword:
|
||
`select * from entities where text = 'George Washington' and confidence > 50` | ||
|
||
EQL does not support `or` conditions. To emulate one, run a separate EQL query for each alternative and combine the results. EQL queries can be executed through the Idyl E3 API when the entity store is enabled.
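Emulating an OR condition with several queries can be sketched as follows. This is a minimal illustration, not part of EntityDB or its drivers: `run_query` is a hypothetical callable that executes one EQL string and returns the list of entities from the response, and de-duplicating on `entityId` assumes each entity in a response carries that field.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes an EQL string and returns the decoded
# list of entities from the query response.
RunQuery = Callable[[str], List[Dict]]


def query_any(run_query: RunQuery, queries: Iterable[str]) -> List[Dict]:
    """Run each EQL query and merge the results, dropping duplicates.

    Assumption: every returned entity has an "entityId" key that
    uniquely identifies it.
    """
    merged: Dict[str, Dict] = {}
    for eql in queries:
        for entity in run_query(eql):
            # setdefault keeps the first copy of an entity seen twice.
            merged.setdefault(entity["entityId"], entity)
    return list(merged.values())
```

Passing the two queries `select * from entities where text = 'George Washington'` and `select * from entities where confidence > 50` would return entities matching either condition, each listed once.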
|
||
### Example queries | ||
|
||
To find entities with a given text: | ||
|
||
`select * from entities where text = "George Washington"` | ||
|
||
To find entities with a given text in a specific context: | ||
|
||
`select * from entities where text = "George Washington" and context = "book1"` | ||
|
||
#### Queryable Fields | ||
|
||
Now that you see it's a lot like SQL, here are the queryable fields: | ||
|
||
| Field | Description | Examples | Remarks |
| --- | --- | --- | --- |
| `id` | The entity's ID. | | |
| `text` | The text of the entity. | "George Washington" | Supports wildcards `*` in the text but not as the first character. |
| `type` | The type of the entity. | "person" | |
| `confidence` | The confidence of the entity - integer values between 0 and 100, inclusive. | 50 | |
| `language` | The language of the entity. | en | |
| `context` | The entity's context. | | |
| `documentId` | The entity's document ID. | | |
| `uri` | The entity's URI. | | |
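A client might check a condition against the rules in this table before sending a query. The sketch below is only an illustration of those rules: `check_condition` is a hypothetical helper, not an EntityDB function, and EntityDB itself may validate queries differently.

```python
# Field names taken from the queryable fields table.
QUERYABLE_FIELDS = frozenset({
    "id", "text", "type", "confidence",
    "language", "context", "documentId", "uri",
})


def check_condition(field: str, value: object) -> None:
    """Raise ValueError if a condition breaks a rule from the table."""
    if field not in QUERYABLE_FIELDS:
        raise ValueError(f"{field!r} is not a queryable field")
    if field == "text" and str(value).startswith("*"):
        raise ValueError("a wildcard cannot be the first character of text")
    if field == "confidence":
        # bool is a subclass of int, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 100:
            raise ValueError("confidence must be an integer from 0 to 100")
```

For example, `check_condition("text", "George W*")` passes, while `check_condition("text", "*ington")` raises `ValueError`.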
|
||
#### Paging | ||
|
||
Paging can be achieved using the `limit` and `offset` keywords: | ||
|
||
`select * from entities limit 10 offset 50` | ||
|
||
This query skips the first 50 entities and returns the next 10.
|
||
The `limit` and `offset` keywords can also be used independently. Note that by default the limit is 25. Use caution when setting large limits. |
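The offset arithmetic for fetching a given page can be sketched as below. `page_query` is a hypothetical helper written for this illustration; its default page size of 25 simply mirrors the default limit mentioned above.

```python
def page_query(base_query: str, page: int, page_size: int = 25) -> str:
    """Append limit/offset to an EQL query for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Page 1 starts at offset 0, page 2 at offset page_size, and so on.
    offset = (page - 1) * page_size
    return f"{base_query} limit {page_size} offset {offset}"
```

With a page size of 10, page 6 produces the query shown earlier: `page_query("select * from entities", 6, 10)` yields `select * from entities limit 10 offset 50`.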
@@ -0,0 +1,42 @@ | ||
[[images/entitydb.png]] | ||
|
||
1. <a href="#what">What is EntityDB?</a> | ||
1. <a href="#problem">What problems does EntityDB solve?</a> | ||
1. <a href="#principles">What design principles underlie EntityDB?</a> | ||
1. <a href="#contribute">How can I contribute to EntityDB?</a> | ||
|
||
<a name="what" /> | ||
## What is EntityDB? | ||
|
||
EntityDB manages the storing of entities (persons, places, and things) for purposes of querying, analysis, and archival. | ||
|
||
<a name="problem" /> | ||
## What problems does EntityDB solve? | ||
|
||
A system for storing entities needs a data store, a way to ingest entities, an index for fast queries, a means of querying the entities, and security and audit controls. EntityDB provides these capabilities. Its REST API allows for ingesting and querying entities, and the underlying data store provides entity persistence. The [Entity Query Language (EQL)](https://github.com/mtnfog/entitydb/wiki/EQL) provides a unified language for querying stored entities no matter which database is used for the data store.
|
||
<a name="principles" /> | ||
## What design principles underlie EntityDB? | ||
|
||
We want EntityDB to meet its goals and give users choices for deployment flexibility. Each component of EntityDB can be swapped for a different implementation. The selectable components are: | ||
|
||
* Queue - Entities are queued during ingest before being persisted to prevent entities from being lost. You can choose to use an AWS SQS queue or an ActiveMQ queue. A memory-based internal queue is available for development and testing purposes. | ||
* Data Store - Entities are persisted into an underlying database. You can choose to use MySQL, DynamoDB, MongoDB, or Cassandra as the database. Each database provides different advantages depending on your use case. A memory-based internal data store is available for development and testing.
* Search Index - As entities are ingested they are indexed in a search engine. Currently the only choice for the search engine is Elasticsearch but additional implementations can be created by implementing the `SearchIndex` interface. | ||
|
||
The architecture of EntityDB showing these components is below. | ||
|
||
[[images/architecture.png]] | ||
|
||
<a name="contribute" /> | ||
## How can I contribute to EntityDB? | ||
|
||
Some ways you can contribute to EntityDB are by: | ||
|
||
* Making code changes and additions. | ||
* Editing this wiki and the documentation.
* Submitting issues you encounter when running EntityDB. | ||
* Testing EntityDB and providing your results. | ||
|
||
We welcome EntityDB-related discussions on our [group](https://groups.google.com/forum/#!forum/entitydb) or through [GitHub Issues](https://github.com/mtnfog/entitydb/issues). |
@@ -0,0 +1,97 @@ | ||
This guide shows how to get EntityDB up and running quickly. | ||
|
||
## Requirements | ||
|
||
You need JDK 1.8 and Maven 3. (See the [System Requirements](https://github.com/mtnfog/entitydb/wiki/System-Requirements) for what a production system needs.)
|
||
## Steps | ||
|
||
### Building EntityDB | ||
|
||
First, clone the repository: | ||
|
||
`git clone https://github.com/mtnfog/entitydb.git` | ||
|
||
Change to the cloned directory: | ||
|
||
`cd entitydb` | ||
|
||
Now build EntityDB: | ||
|
||
`mvn clean install` | ||
|
||
If you're in a hurry and want to skip tests: | ||
|
||
`mvn clean install -DskipTests=true` | ||
|
||
### Running EntityDB
|
||
When the build is complete, change to the application's `target` directory:
|
||
`cd entitydb-app/target` | ||
|
||
Run EntityDB. | ||
|
||
`java -jar entitydb.jar` | ||
|
||
EntityDB will start up. By default, EntityDB's REST API listens on port 8080. You can now connect to EntityDB through one of the open source drivers, `cURL`, or your own client implementation. To change components such as the database, queue, or search engine, edit `entitydb.properties`. By default, the internal implementation of each component is used.
|
||
### Ingesting an Entity | ||
|
||
You can ingest an entity: | ||
|
||
``` | ||
ACL="user:group:1" | ||
curl -vvvv -X POST -H "Content-Type: application/json" -H "Authorization: asdf1234" --data @body.json "http://localhost:8080/api/entity?context=c&documentId=$ | ||
``` | ||
|
||
Where `body.json` is: | ||
|
||
``` | ||
[{"text":"George Washington","confidence":0.7883777879039289,"span":"[3, 5)","type":"person","enrichments":{},"languageCode":"en"}] | ||
``` | ||
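If you would rather generate the request body than write it by hand, a short script can serialize it. This is a sketch only: `make_body` is a hypothetical helper, and the field names are copied from the `body.json` example above rather than from a published schema.

```python
import json
from typing import Dict, List


def make_body(text: str, entity_type: str, confidence: float,
              span: str, language_code: str = "en") -> str:
    """Serialize one entity as a JSON array shaped like body.json."""
    entities: List[Dict] = [{
        "text": text,
        "confidence": confidence,
        "span": span,
        "type": entity_type,
        "enrichments": {},
        "languageCode": language_code,
    }]
    return json.dumps(entities)
```

Writing the result of `make_body("George Washington", "person", 0.7883777879039289, "[3, 5)")` to `body.json` gives a file equivalent to the one shown above.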
|
||
When EntityDB receives an entity to ingest, it places the entity onto the ingest queue. When the queue is processed (within a couple of seconds), the entity is persisted to the entity store and then indexed in the search engine.
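The queue, persist, then index order can be pictured with a toy in-memory model. Everything here is hypothetical and written only for illustration; the class and method names are not EntityDB's, and the real components are the configurable queue, data store, and search index.

```python
from collections import deque
from typing import Deque, Dict, List


class InMemoryPipeline:
    """Toy stand-in for the ingest flow; not EntityDB's actual classes."""

    def __init__(self) -> None:
        self.queue: Deque[Dict] = deque()      # ingest queue
        self.store: List[Dict] = []            # entity store
        self.index: Dict[str, List[int]] = {}  # text -> positions in store

    def ingest(self, entity: Dict) -> None:
        """Accept an entity; at this point it is only queued."""
        self.queue.append(entity)

    def process_queue(self) -> int:
        """Drain the queue: persist each entity, then index it."""
        processed = 0
        while self.queue:
            entity = self.queue.popleft()
            self.store.append(entity)
            position = len(self.store) - 1
            self.index.setdefault(entity["text"], []).append(position)
            processed += 1
        return processed
```

The point of the model is that an ingested entity is not queryable until the queue has been processed, which is why a query issued immediately after ingest may not return it.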
|
||
### Querying the Stored Entities | ||
|
||
You can query the stored entities using [EQL](https://github.com/mtnfog/entitydb/wiki/EQL) (`select * from entities`): | ||
|
||
`curl -vvvv -H "Authorization: asdf1234" "http://localhost:8080/api/eql?query=select+%2A+from+entities"` | ||
|
||
This returns the response: | ||
|
||
``` | ||
{ | ||
"entities": [ | ||
{ | ||
"acl": { | ||
"groups": [ | ||
"group" | ||
], | ||
"users": [ | ||
"user" | ||
], | ||
"world": 1 | ||
}, | ||
"confidence": 0.7883777879039289, | ||
"enrichments": {}, | ||
"entityId": "211b03b71c6b42e2495d6065ba9e6b2484b8b9a931b2dc424d50f565d8d8f6ca", | ||
"extractionDate": 1474675422964, | ||
"languageCode": "en", | ||
"text": "George Washington", | ||
"transactionId": 0, | ||
"type": "person" | ||
} | ||
], | ||
"queryId": "57503fd0-dcec-4270-8d95-0964ca5a8b31" | ||
} | ||
``` | ||
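The same request URL can be built programmatically, and the response decoded, with a few lines of standard-library Python. This is a hedged sketch: `eql_url` and `summarize` are hypothetical helpers, the endpoint path is taken from the `cURL` command above, and treating `extractionDate` as milliseconds since the Unix epoch is an assumption based on the sample value.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple
from urllib.parse import urlencode


def eql_url(eql: str, base_url: str = "http://localhost:8080") -> str:
    """Build the /api/eql URL with the query percent-encoded."""
    query_string = urlencode({"query": eql})
    return f"{base_url}/api/eql?{query_string}"


def summarize(response_text: str) -> List[Tuple[str, str, str]]:
    """Return (text, type, extraction time) for each entity in a response."""
    payload = json.loads(response_text)
    rows = []
    for entity in payload["entities"]:
        # Assumption: extractionDate is milliseconds since the Unix epoch.
        extracted = datetime.fromtimestamp(
            entity["extractionDate"] / 1000, tz=timezone.utc)
        rows.append((entity["text"], entity["type"], extracted.isoformat()))
    return rows
```

`urlencode` turns spaces into `+` and `*` into `%2A`, which matches the encoded query string in the `cURL` example. The request itself still needs the `Authorization` header shown there.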
|
||
### Summary | ||
|
||
Congratulations! You have just built EntityDB, started it, ingested an entity, and executed a query! | ||
|
||
### More Sample Scripts | ||
|
||
Additional sample `cURL` scripts are located under the project's [`scripts/`](https://github.com/mtnfog/entitydb/tree/master/scripts) directory. |
@@ -0,0 +1,15 @@ | ||
### System Requirements | ||
|
||
For a testing environment: | ||
|
||
* 2 GB RAM | ||
* Java 8 (OpenJDK or Oracle JDK) | ||
|
||
For a production environment: | ||
|
||
* 8 GB RAM | ||
* Java 8 (Oracle JDK) | ||
* An underlying database (choose from Apache Cassandra, AWS DynamoDB, MongoDB, and MySQL) | ||
* A cache (choose from local cache or remote Memcached) | ||
* A search engine (currently only supports Elasticsearch) | ||
* A queue (choose from AWS Simple Queue Service (SQS) and Apache ActiveMQ) |