This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

Adding wiki pages.
jzonthemtn committed May 29, 2024
1 parent 7cafa3b commit a400157
Showing 12 changed files with 694 additions and 1 deletion.
3 changes: 3 additions & 0 deletions entitydb.wiki/Developers-Guide.md
@@ -0,0 +1,3 @@
EntityDB drivers are available on GitHub. Refer to each driver's documentation for usage examples.

* [EntityDB Java Driver](https://github.com/mtnfog/entitydb-java-driver)
46 changes: 46 additions & 0 deletions entitydb.wiki/EQL.md
@@ -0,0 +1,46 @@
The Entity Query Language, or EQL, is a SQL-like language for querying entities. An EQL query can be executed on the entity store regardless of the underlying database. Because each database supported as an entity store has its own query language (Oracle and MySQL have SQL while MongoDB and DynamoDB each have their own syntax), EQL abstracts the differences into a single query language that works across any entity store. The EQL query `select * from entities` returns all entities from the entity store.

A `where` clause can be added to retrieve only the entities meeting some condition:

`select * from entities where text = 'George Washington'`

This query returns all entities having the text "George Washington". Other queryable fields include `confidence`, `documentId`, and `context`. Multiple conditions can be combined with the `and` keyword:

`select * from entities where text = 'George Washington' and confidence > 50`

EQL does not support OR conditionals. Use multiple EQL queries to accomplish an OR condition. EQL queries can be executed through the Idyl E3 API when the entity store is enabled.
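
One way to emulate an OR condition on the client side is to run each EQL query separately and merge the results. The sketch below is only an illustration: it assumes each result is a list of entity dictionaries carrying an `entityId` field (as in the Quick Start response), and the function name and sample data are hypothetical.

```python
def union_by_entity_id(*result_sets):
    """Merge several lists of entity dicts, keeping the first copy of each entityId."""
    merged = {}
    for entities in result_sets:
        for entity in entities:
            # setdefault keeps the first entity seen for a given ID.
            merged.setdefault(entity["entityId"], entity)
    return list(merged.values())


# Hypothetical results of two separate EQL queries (not real EntityDB output).
by_text = [{"entityId": "a1", "text": "George Washington"}]
by_confidence = [
    {"entityId": "a1", "text": "George Washington"},
    {"entityId": "b2", "text": "Mount Vernon"},
]

combined = union_by_entity_id(by_text, by_confidence)
```

Merging on the entity ID avoids returning the same entity twice when it matches both conditions.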

### Example queries

To find entities with a given text:

`select * from entities where text = "George Washington"`

To find entities with a given text in a specific context:

`select * from entities where text = "George Washington" and context = "book1"`

#### Queryable Fields

Now that you see it's a lot like SQL, here are the queryable fields:

| Field | Description | Examples | Remarks |
| --- | --- | --- | --- |
| `id` | The entity's ID. | | |
| `text` | The text of the entity. | "George Washington" | Supports wildcards `*` in the text but not as the first character. |
| `type` | The type of the entity. | "person" | |
| `confidence` | The confidence of the entity - integer values between 0 and 100, inclusive. | 50 | |
| `language` | The language of the entity. | en | |
| `context` | The entity's context. | | |
| `documentId` | The entity's document ID. | | |
| `uri` | The entity's URI. | | |
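
A client may want to check values against the constraints in this table before sending a query. The following is a minimal sketch, assuming the only rules are the ones stated above (no leading `*` in `text`, and `confidence` an integer from 0 to 100); the helper names are hypothetical and are not part of EntityDB.

```python
def is_valid_text_pattern(pattern):
    """Return True if the text value is non-empty and does not start with a `*` wildcard."""
    return bool(pattern) and not pattern.startswith("*")


def is_valid_confidence(value):
    """Return True if the value is an integer between 0 and 100, inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 100


trailing_wildcard_ok = is_valid_text_pattern("George*")
leading_wildcard_ok = is_valid_text_pattern("*Washington")
```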

#### Paging

Paging can be achieved using the `limit` and `offset` keywords:

`select * from entities limit 10 offset 50`

This query skips the first 50 entities and returns the next 10.

The `limit` and `offset` keywords can also be used independently. Note that by default the limit is 25. Use caution when setting large limits.
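
To walk through a large result set page by page, a client can generate one query per page. This is a sketch under assumptions: the function name is hypothetical, and it assumes that `offset 0` is accepted for the first page, which this page does not state.

```python
def paged_queries(base_query, page_size, page_count):
    """Yield one EQL query string per page using the limit and offset keywords."""
    for page in range(page_count):
        yield f"{base_query} limit {page_size} offset {page * page_size}"


queries = list(paged_queries("select * from entities", 10, 3))
```

Keeping the page size small follows the caution above about large limits.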
42 changes: 42 additions & 0 deletions entitydb.wiki/Home.md
@@ -0,0 +1,42 @@
[[images/entitydb.png]]

1. <a href="#what">What is EntityDB?</a>
1. <a href="#problem">What problems does EntityDB solve?</a>
1. <a href="#principles">What design principles underlie EntityDB?</a>
1. <a href="#contribute">How can I contribute to EntityDB?</a>

<a name="what" />
## What is EntityDB?

EntityDB manages the storing of entities (persons, places, and things) for purposes of querying, analysis, and archival.

<a name="problem" />
## What problems does EntityDB solve?

Building a system to store entities requires a data store, a way to ingest entities, an index for fast queries, a means of querying the entities, and security and audit controls. EntityDB provides these capabilities. Its REST API allows for ingesting and querying entities, and the underlying data store provides entity persistence. The [Entity Query Language (EQL)](https://github.com/mtnfog/entitydb/wiki/EQL) provides a unified language for querying stored entities no matter what database is used for the data store.

<a name="principles" />
## What design principles underlie EntityDB?

We want EntityDB to meet its goals and give users choices for deployment flexibility. Each component of EntityDB can be swapped for a different implementation. The selectable components are:

* Queue - Entities are queued during ingest before being persisted to prevent entities from being lost. You can choose to use an AWS SQS queue or an ActiveMQ queue. A memory-based internal queue is available for development and testing purposes.
* Data Store - Entities are persisted into an underlying database. You can choose to use MySQL, DynamoDB, MongoDB, or Cassandra as the database. Each database provides different advantages based on your use-case. A memory-based internal data store is available for development and testing.
* Search Index - As entities are ingested they are indexed in a search engine. Currently the only choice for the search engine is Elasticsearch but additional implementations can be created by implementing the `SearchIndex` interface.

The architecture of EntityDB showing these components is below.

[[images/architecture.png]]
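
The flow through the three selectable components can be sketched with in-memory stand-ins. This is an illustration only, not EntityDB's implementation: every class and method name below is hypothetical, and the real components are configured through EntityDB itself.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for the queue component."""

    def __init__(self):
        self._items = deque()

    def put(self, entity):
        self._items.append(entity)

    def drain(self):
        # Yield queued entities in arrival order until the queue is empty.
        while self._items:
            yield self._items.popleft()


class InMemoryStore:
    """Hypothetical stand-in for the data store component."""

    def __init__(self):
        self.entities = []

    def save(self, entity):
        self.entities.append(entity)


class InMemoryIndex:
    """Hypothetical stand-in for the search index component."""

    def __init__(self):
        self.by_text = {}

    def index(self, entity):
        self.by_text.setdefault(entity["text"], []).append(entity)


class Pipeline:
    """Wires the three components together; any of them can be swapped out."""

    def __init__(self, queue, store, search_index):
        self.queue = queue
        self.store = store
        self.search_index = search_index

    def ingest(self, entity):
        # Ingest only queues the entity; nothing is persisted yet.
        self.queue.put(entity)

    def process(self):
        # Persist each queued entity, then index it.
        for entity in self.queue.drain():
            self.store.save(entity)
            self.search_index.index(entity)


pipeline = Pipeline(InMemoryQueue(), InMemoryStore(), InMemoryIndex())
pipeline.ingest({"text": "George Washington", "type": "person"})
stored_before_processing = len(pipeline.store.entities)
pipeline.process()
```

Because `Pipeline` depends only on the methods it calls, replacing one stand-in with another implementation does not affect the other two.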

<a name="contribute" />
## How can I contribute to EntityDB?

Some ways you can contribute to EntityDB are by:

* Making code changes and additions.
* Editing this wiki and documentation.
* Submitting issues you encounter when running EntityDB.
* Testing EntityDB and providing your results.

We welcome EntityDB-related discussions on our [group](https://groups.google.com/forum/#!forum/entitydb) or through [GitHub Issues](https://github.com/mtnfog/entitydb/issues).
97 changes: 97 additions & 0 deletions entitydb.wiki/Quick-Start-Guide.md
@@ -0,0 +1,97 @@
This guide shows how to get EntityDB up and running quickly.

## Requirements

You need JDK 1.8 and Maven 3. (See the [System Requirements](https://github.com/mtnfog/entitydb/wiki/System-Requirements) for the requirements of a production system.)

## Steps

### Building EntityDB

First, clone the repository:

`git clone https://github.com/mtnfog/entitydb.git`

Change to the cloned directory:

`cd entitydb`

Now build EntityDB:

`mvn clean install`

If you're in a hurry and want to skip tests:

`mvn clean install -DskipTests=true`

### Running EntityDB

When the build is complete, change the directory:

`cd entitydb-app/target`

Run EntityDB:

`java -jar entitydb.jar`

EntityDB will start up. By default, EntityDB's REST API listens on port 8080. You can now connect to EntityDB through one of the open source drivers, `cURL`, or your own client implementation. Edit `entitydb.properties` to change components such as the database, queue, or search engine. By default, internal implementations of each component are used.

### Ingesting an Entity

You can ingest an entity:

```
ACL="user:group:1"
curl -vvvv -X POST -H "Content-Type: application/json" -H "Authorization: asdf1234" --data @body.json "http://localhost:8080/api/entity?context=c&documentId=$
```

Where `body.json` is:

```
[{"text":"George Washington","confidence":0.7883777879039289,"span":"[3, 5)","type":"person","enrichments":{},"languageCode":"en"}]
```

When EntityDB receives the entity to ingest, it places the entity onto the ingest queue. When the queue is processed (in just a couple of seconds), the entity will be persisted to the entity store and subsequently indexed into the search engine.
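
If you prefer to generate `body.json` programmatically, the payload can be assembled as a JSON array of entity objects. This is a minimal sketch: the field names are taken from the sample body above, while the `make_entity` helper is hypothetical and not part of any EntityDB driver.

```python
import json


def make_entity(text, confidence, span, entity_type, language_code="en"):
    """Build one entity object with the fields used in the sample body.json."""
    return {
        "text": text,
        "confidence": confidence,
        "span": span,
        "type": entity_type,
        "enrichments": {},
        "languageCode": language_code,
    }


# The ingest body is a JSON array, even for a single entity.
body = json.dumps(
    [make_entity("George Washington", 0.7883777879039289, "[3, 5)", "person")]
)
```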

### Querying the Stored Entities

You can query the stored entities using [EQL](https://github.com/mtnfog/entitydb/wiki/EQL) (`select * from entities`):

`curl -vvvv -H "Authorization: asdf1234" "http://localhost:8080/api/eql?query=select+%2A+from+entities"`

This returns the response:

```
{
  "entities": [
    {
      "acl": {
        "groups": [
          "group"
        ],
        "users": [
          "user"
        ],
        "world": 1
      },
      "confidence": 0.7883777879039289,
      "enrichments": {},
      "entityId": "211b03b71c6b42e2495d6065ba9e6b2484b8b9a931b2dc424d50f565d8d8f6ca",
      "extractionDate": 1474675422964,
      "languageCode": "en",
      "text": "George Washington",
      "transactionId": 0,
      "type": "person"
    }
  ],
  "queryId": "57503fd0-dcec-4270-8d95-0964ca5a8b31"
}
```
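
The query string in the `cURL` command is the URL-encoded form of the EQL query. The sketch below shows one way to produce that encoding and to read entity texts out of a response, using only the Python standard library. The `eql_url` helper is hypothetical, the sample response is a trimmed copy of the one above, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def eql_url(query, base="http://localhost:8080/api/eql"):
    """Return the EQL endpoint URL with the query URL-encoded."""
    return f"{base}?{urlencode({'query': query})}"


url = eql_url("select * from entities")

# Trimmed copy of the sample response shown above.
response_text = """
{
  "entities": [
    {"text": "George Washington", "type": "person", "languageCode": "en"}
  ],
  "queryId": "57503fd0-dcec-4270-8d95-0964ca5a8b31"
}
"""

texts = [entity["text"] for entity in json.loads(response_text)["entities"]]
```

Here `urlencode` turns spaces into `+` and `*` into `%2A`, which matches the query string used in the `cURL` command.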

### Summary

Congratulations! You have just built EntityDB, started it, ingested an entity, and executed a query!

### More Sample Scripts

Additional sample `cURL` scripts are located under the project's [`scripts/`](https://github.com/mtnfog/entitydb/tree/master/scripts) directory.
15 changes: 15 additions & 0 deletions entitydb.wiki/System-Requirements.md
@@ -0,0 +1,15 @@
### System Requirements

For a testing environment:

* 2 GB RAM
* Java 8 (OpenJDK or Oracle JDK)

For a production environment:

* 8 GB RAM
* Java 8 (Oracle JDK)
* An underlying database (choose from Apache Cassandra, AWS DynamoDB, MongoDB, and MySQL)
* A cache (choose from local cache or remote Memcached)
* A search engine (currently only supports Elasticsearch)
* A queue (choose from AWS Simple Queue Service (SQS) and Apache ActiveMQ)